Don’t buy the elixir of youth: Machine learning is not magic

Tue, 18th Apr 2017

By Ondrej Kubovič, Jakub Debski and Peter Kosinar

If someone told you they had a magic elixir that would heal all your illnesses or injuries, and make you young again, would you believe them? No matter how medieval this marketing trick might sound, it is still in use, even in the data-driven 21st century.

Nowadays, it's not street vendors selling the elixir of youth anymore. They have been replaced by an array of “post-truth” cybersecurity companies offering mysterious artificial intelligence (AI) and machine learning (ML).

These technologies, you are told, will keep your business safe from any malware and other threats, regardless of whether they have been seen before or are completely new. But, of course, these techniques are way too complicated to explain or properly understand. They're almost magical.

Back here on Earth, we can report that there is no magic behind AI or machine learning. The former term has been around for more than 60 years and represents the ideal of a generally intelligent machine that can learn and make decisions independently, based only on inputs from its environment – all without human supervision.

A step back from this as-yet unachievable AI dream is machine learning, a field of computer science that gives computers the ability to find patterns in huge amounts of data by sorting it and acting on the findings. The concept may be a little newer, but it has still been present in cybersecurity since the 1990s.

If that sounds abstract, think of the time Facebook found your face in that party photo. That was machine learning. Or the time Netflix suggested a great movie? Also ML.

In cybersecurity, machine learning mostly refers to one of the technologies built into a solution that has been fed large amounts of correctly labeled clean and malicious samples, and has learned the difference. Thanks to this training – also known as supervised machine learning – it is able to analyze and identify most of the potential threats to users and act proactively to mitigate them.
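
To make the idea of learning from labeled samples concrete, here is a minimal sketch in Python. It is not how any real security product works: the two features (byte entropy and a count of suspicious API imports), the sample values and the nearest-centroid method are all assumptions chosen to keep the example short.

```python
from math import dist

# Hypothetical, hand-made training set. Each sample is
# (byte entropy, number of suspicious API imports) plus a human-assigned label.
TRAINING = [
    ((4.1, 0.0), "clean"),
    ((4.8, 1.0), "clean"),
    ((5.0, 2.0), "clean"),
    ((7.2, 9.0), "malicious"),
    ((7.6, 12.0), "malicious"),
    ((6.9, 8.0), "malicious"),
]


def train(samples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Return the label whose centroid lies closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
# Two previously unseen (and equally hypothetical) samples.
verdicts = [classify(centroids, sample) for sample in [(4.5, 1.0), (7.4, 10.0)]]
print(verdicts)
```

The labels do the heavy lifting here: without the "clean" and "malicious" tags supplied up front, the program would have nothing to learn the difference from.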

Automation of this process makes the security solution faster and helps human experts handle the exponential growth in the number of samples appearing every day.

Algorithms that lack this training – which fall into the category of unsupervised machine learning – are almost useless for cybersecurity.

The reason is that they sort the data into their own categories, which don't necessarily distinguish between clean items and malware and are instead better suited to finding similarities or anomalies in the dataset invisible to the human eye.
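
A toy illustration of that point, again in Python and again under loud assumptions: the samples, the two features (file size in KB and a count of suspicious API imports) and the choice of a bare-bones k-means are all made up for this sketch. The true labels are kept aside and only used afterwards to inspect what the algorithm came up with.

```python
from math import dist

# Hypothetical samples: (file size in KB, number of suspicious API imports).
SAMPLES = [
    (120.0, 1.0), (150.0, 9.0), (110.0, 0.0),
    (5200.0, 2.0), (4800.0, 11.0), (5100.0, 1.0),
]
# Ground truth, never shown to the clustering step below.
TRUE_LABELS = ["clean", "malicious", "clean", "clean", "malicious", "clean"]


def k_means(points, centroids, rounds=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    assignment = []
    for _ in range(rounds):
        assignment = [
            min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points
        ]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                updated.append(tuple(sum(c) / len(c) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return assignment


# Deterministic start: use the first and last sample as initial centroids.
clusters = k_means(SAMPLES, [SAMPLES[0], SAMPLES[-1]])

# Compare the discovered groups with the labels the algorithm never saw.
labels_per_cluster = {
    c: {label for label, a in zip(TRUE_LABELS, clusters) if a == c}
    for c in set(clusters)
}
print(clusters, labels_per_cluster)
```

In this made-up dataset the algorithm splits the files into "small" and "large", because file size dominates the distance, and each group ends up containing both clean and malicious items. Different features or scaling would produce different groups, but nothing in the method itself steers it toward the clean-versus-malware distinction.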

At ESET – an established cybersecurity vendor with almost three decades of experience – we have been applying supervised machine learning for years. We call it “automated detection”.

To keep our detection rates high and false positives low, a team of experienced human supervisors evaluates items that diverge too much from other samples and are therefore hard for ML to label. This approach allows us to avoid the pitfalls of false positives or misses that might occur on the way to a fine-tuned algorithm that works well with other protective technologies under the hood of our solutions.
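
The general idea of handing divergent items to people can be sketched in a few lines of Python. This is not a description of ESET's actual pipeline: the centroids, the feature meaning (byte entropy, suspicious API imports) and the `REVIEW_DISTANCE` threshold are hypothetical values picked for illustration.

```python
from math import dist

# Hypothetical "typical" clean and malicious samples, as a trained model might hold them.
CENTROIDS = {"clean": (4.6, 1.0), "malicious": (7.2, 9.7)}
# Hypothetical cut-off: anything farther than this from every centroid is
# considered too unusual for an automatic verdict.
REVIEW_DISTANCE = 3.0


def triage(features):
    """Label a sample automatically, or defer to a human if it is an outlier."""
    nearest = min(CENTROIDS, key=lambda name: dist(CENTROIDS[name], features))
    if dist(CENTROIDS[nearest], features) > REVIEW_DISTANCE:
        return "human review"
    return nearest


results = [triage(sample) for sample in [(4.4, 1.5), (7.0, 9.0), (6.0, 5.0)]]
print(results)
```

The third sample sits between the two groups, so instead of guessing, the sketch routes it to an analyst, whose verdict could then be added to the training data.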

So, to wrap up: there is no magic in machine learning. It is a well-established technology which, under human supervision, learns how to extract features and find specific patterns in huge quantities of malicious and clean data, and which has been helping us protect millions of ESET users worldwide for years.