Using data science to improve threat prevention
It’s becoming a non-negotiable fact that businesses need to use more data to strengthen their cybersecurity measures, which is why a data science approach to security could help organisations detect even the most subtle malicious activity.
According to security firm Palo Alto Networks, there are benefits and challenges to using a data science approach, but organisations will still be able to leverage data to outsmart cybercriminals.
Palo Alto Networks A/NZ systems engineer manager Mauricio Sabena explains: “Traditional security tools can’t keep up with the volume and pace of attacks today. With a large amount of good quality data and strong algorithms, companies can develop highly effective protective measures.”
“Data science and automation remove the need for IT security professionals to manually investigate every piece of suspicious activity by culling a high number of alerts down to a manageable number of quality alerts that can be actioned in a timely fashion.”
There are four key requirements for a data science approach to cybersecurity, according to Palo Alto Networks.
1. The right amount of quality data. Applying machine learning to data to automate decision-making is an ideal way to combat threats but, if the data isn’t accurate, up to date, or comprehensive enough, the machine won’t learn effectively and the approach won’t work. Volume creates its own problem: security information and event management (SIEM) platforms aren’t built with the massive computing power that’s required for big data analysis. Running algorithms on big data lakes becomes difficult and costly, and it’s harder for businesses to manage these projects in-house.
Cloud-based solutions can address this challenge because it’s easier to manage resources effectively and elastically in the cloud. Furthermore, customers can rely on security vendors that already hold huge amounts of high-quality data and that let customers run their algorithms on that data. Most security teams only have access to a few weeks of historical data; a vendor-enabled approach will overcome this challenge.
2. Sophisticated algorithms. Data science and machine learning rely on human-made algorithms. These algorithms need to be strong to deliver desirable outcomes. It’s important to put the data in context by looking at all apps, users, and content. This leads to the best quality data. It’s impossible to identify every malicious activity in isolation. Leveraging large amounts of good quality data teaches the machine what’s normal and abnormal. This makes it easier to detect malicious attackers in the network even if they’re exceptionally stealthy.
3. An open mind to false positives. Tuning processes to stop every threat often results in a high number of false positives that must be investigated, leading to unnecessarily high workloads. Conversely, reducing the number of false positives may result in some attacks getting through. But, with the right data and algorithms, it is possible to lower the number of false positives and get more accurate alerts.
Some attackers hide their activity by connecting via a non-suspicious method, using valid credentials and taking information that’s on the server they have access to. In isolation, those activities wouldn’t trigger an alert. But, by leveraging large amounts of data and understanding what’s normal for that specific user, businesses can see what activity is abnormal. By using a data science approach, the machine would identify that activity as suspicious so it could be investigated.
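To make the idea of "what's normal for that specific user" concrete, here is a minimal sketch in Python. It is not Palo Alto Networks' method; it assumes a simple z-score against each user's own history, and the function name `abnormal_events`, the sample figures, and the threshold of three standard deviations are all hypothetical.

```python
from statistics import mean, stdev

def abnormal_events(history, new_events, threshold=3.0):
    """Flag events that deviate strongly from each user's own baseline.

    history:    dict mapping user -> list of past daily values
                (e.g. megabytes downloaded per day)
    new_events: list of (user, value) pairs to score
    threshold:  hypothetical cut-off, in standard deviations
    """
    flagged = []
    for user, value in new_events:
        past = history.get(user, [])
        if len(past) < 2:
            # Too little history to say what is normal for this user.
            continue
        mu = mean(past)
        sigma = stdev(past)
        if sigma == 0:
            # A perfectly flat history gives no scale to measure against.
            continue
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((user, value))
    return flagged

# Made-up sample data: the same download size is routine for one
# user and highly unusual for another.
history = {
    "alice": [100, 110, 95, 105, 90],
    "bob": [900, 1000, 950, 1100, 1050],
}
new_events = [("alice", 1000), ("bob", 1000)]
flagged = abnormal_events(history, new_events)
print(flagged)
```

In this made-up data, a 1,000 MB download is flagged for the user who normally moves about 100 MB a day but not for the user who routinely moves about 1,000 MB, even though both used valid credentials.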
“With data science, security professionals no longer have to deal with a massive number of alerts, many of which could be false positives,” says Sabena.
“Instead, those alerts can mostly be dealt with by the automated processes, distilling the alerts that need to be investigated into a manageable number. This avoids the issue that can arise when there are so many alerts that customers don’t know where to focus first.”
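The trade-off between false positives and missed attacks can be illustrated with a small Python sketch. It assumes each alert already carries a suspicion score between 0 and 1 and a known outcome; the scores, the labels, and the function name `triage_counts` are invented for illustration and do not come from any real product.

```python
def triage_counts(scored_alerts, threshold):
    """Count false positives and missed attacks at a given threshold.

    scored_alerts: list of (score, is_malicious) pairs
    threshold:     alerts scoring at or above this value go to an analyst
    """
    false_positives = sum(
        1 for score, bad in scored_alerts if score >= threshold and not bad
    )
    missed_attacks = sum(
        1 for score, bad in scored_alerts if score < threshold and bad
    )
    return false_positives, missed_attacks

# Hypothetical scored alerts: (suspicion score, was it really malicious?)
alerts = [
    (0.95, True), (0.80, False), (0.70, True), (0.40, False),
    (0.30, False), (0.20, True), (0.10, False),
]

for threshold in (0.05, 0.5, 0.9):
    fp, missed = triage_counts(alerts, threshold)
    print(f"threshold={threshold}: {fp} false positives, {missed} missed attacks")
```

Lowering the threshold catches every attack in this sample at the cost of more benign alerts to investigate; raising it does the reverse. Better data and algorithms matter because they separate the scores of malicious and benign activity more cleanly, so fewer alerts land on the wrong side of any given threshold.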
4. Historical records. When it comes to applying data science, historical information is essential. In general, most businesses keep a few weeks’ worth of alert logs, especially if they receive thousands of alerts every day. However, it would be more useful to retain six or seven weeks of data to provide enough of a baseline to determine what activity is normal and what isn’t. Then, when an alert is generated, it can be actioned quickly and the security team won’t be overburdened with alerts.
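As a rough illustration of how the retention window shapes the baseline, the Python sketch below averages alerts per day over a configurable number of weeks. The function name `daily_baseline`, the timestamps, and the choice of a plain daily average are assumptions made for this example only.

```python
from collections import Counter
from datetime import datetime, timedelta

def daily_baseline(alert_times, now, weeks=6):
    """Average alerts per active day within the retention window.

    alert_times: list of datetime objects, one per alert
    now:         the reference time for the window
    weeks:       how many weeks of history are retained
    """
    cutoff = now - timedelta(weeks=weeks)
    # Only alerts inside the retention window contribute to the baseline.
    recent = [t for t in alert_times if cutoff <= t <= now]
    per_day = Counter(t.date() for t in recent)
    if not per_day:
        return None  # no history retained, so no baseline
    return sum(per_day.values()) / len(per_day)

# Made-up alert log: two busy recent days and one older alert.
now = datetime(2024, 3, 1)
alert_times = (
    [datetime(2024, 2, 28, 9)] * 2
    + [datetime(2024, 2, 20, 14)] * 4
    + [datetime(2024, 1, 1, 8)]
)

print(daily_baseline(alert_times, now, weeks=6))
print(daily_baseline(alert_times, now, weeks=12))
```

The two calls give different baselines from the same log, because the longer window includes an older, quieter day that the shorter window has already discarded.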
“Organisations shouldn’t overlook the value of a data science approach to security because it can dramatically reduce the workload involved in keeping an organisation secure. For the best chance of success, organisations should look to partner with an expert in using data science, to leverage vast amounts of data for the best cybersecurity outcomes,” Sabena concludes.