Interview: Cloudera talks security, open source machine learning and big data
Cloudera is a big data platform that supports machine learning and analytics for all areas of a business.
We sat down for a Q&A with Rocky DeStefano, Cloudera's subject matter expert for cybersecurity. He works with customers and partners on best-practice cybersecurity strategy.
"Any time we have intelligence being created, whether by a machine or by a human, there's value in that intelligence. Anything that drives us to be smarter as humans. The same thing happens in machines.
"Because it's innovation and it's valuable, it needs to be protected. Cybersecurity will always be part of that. Any strategy that goes forward without cybersecurity as a core concept is one that is going to need to be reevaluated quickly."
We've had major cyber attacks that have hit the headlines – do you think that organisations started paying attention when attacks such as WannaCry became such a big deal? Did they wake up and realise 'hey, we really need to do something about cyber attacks because they can happen to anyone'?
"I've been in this space since the 1990s. There's always an event that makes the news and makes people aware that it's still a problem. With the leak of hacking tools, it gets easier to conduct more advanced attacks on people.
"More people are aware that this is a problem, but it has been a problem for a very long time. It's good that they take this seriously and reinvigorate their efforts."
Do you find that smaller organisations ignore cybersecurity until a big event happens – or are they on top of things, aware that attacks can happen and proactively forming strategies?
"The smaller organisations are focused on their core business of adding value to their customers. Larger organisations understand that part of doing business is protecting customers, data, employees and partners. In order to do that, they have a built-in security strategy.
"Smaller companies can take advantage of the services that cybersecurity companies offer to help them mitigate those risks. They're becoming more aware that they have to do that."
We have digital transformation, which in part means organisations moving to public, private and hybrid cloud while still holding on-premises data. Do you believe the security requirements for each one differ, and in what ways?
"What becomes different is that in order to gain cost efficiency from the clouds, you wind up with data that's co-mingled. You have to make sure that the data is categorised properly, that authentication, authorisation and access are enabled properly, that the data is encrypted and that the encryption is managed properly.
"Those are aspects that in enterprises are often overlooked. You can look at the evidence of any credit card breaches that have occurred – there's database access and the ability to extract unencrypted information out of those databases. In the move to cloud, that information has to be protected because it's going to be of such great value. But the impact is far greater across the entire business.
"Providers are aware of this and they're making it easier. That's one of the things that Cloudera offers – 'let's make this secure from the starting point.'"
How does that apply to big data? What implications does big data have for security when huge influxes of raw data are coming into organisations?
"We look at big data and big data platforms as the future of cybersecurity. The only way we can effectively manage cybersecurity analysis across all of the data in enterprise feeds (endpoints, networks, mobile devices, software-defined computing and storage) is by applying machine learning and advanced analytics techniques against it. This is where big data platforms excel.
"Part of what Cloudera is doing is working with special resource projects like one called ApacheSpot to create open data models that multiple vendors can access and then write applications and analytics against. This can make it easier to deploy analytics against large volumes of data."
You're using open source technologies such as ApacheSpot for machine learning. Given that it is a community-built project, what made you take this approach instead of developing something in a private lab?
"We understand that cyber threats are fast and varied. They evolve moment to moment. There's no way any single vendor can keep pace with them. We keep learning from changes that occur day-to-day. What we said was 'let's build a foundation where everyone can collaborate'. Within that collaboration, we can innovate faster as a community. We can innovate on access to the data, and on providing machine learning that everyone can benefit from.
"In the ApacheSpot community we provide methods so that everyone has a starting point for machine learning in cybersecurity. They can enhance and continually evolve it along the way. Instead of sharing threat intelligence by itself, we can actually share techniques."
The GDPR is coming in and Australia has data breach notification laws – how will they affect the ways organisations store and manage customer data?
"These regulations put a lot of focus on privacy. In order for an organisation to comply with privacy requirements and still secure its data, it needs to fully comprehend the data it is storing and managing. I think it's going to force a lot of maturity in organisations to deal with changes that historically, they haven't had to deal with.
"It will ultimately raise the baseline for security and privacy. But in the short term, there are a lot of roadblocks to reaching that level of maturity. Smaller organisations are going to struggle; larger organisations have to move faster than they're accustomed to. GDPR is already in force. Enforcement starts in May 2018, but compliance with GDPR is ongoing.
"This is a current problem, not a future problem. It's going to require better data classification, better access controls, improved encryption and lineage tracking for all stored data."