Big data and Hadoop-based approaches are now widely recognized but are still considered by many to be new technologies. The potential benefit of these approaches is already clear, but can they deliver practical value now?
To answer such a question, it's always useful to hear real-world experiences that take us beyond the theoretical. That's what came to light in an interview with Erwan Le Doeuff, VP of Information Technology, Risk and Security for Morgan Stanley, at a recent Big Data Everywhere conference in New York City. And the answer was a resounding "yes."
What is Morgan Stanley's big data and Hadoop story?
In addition to heading the security product team responsible for custom development and integration, Erwan Le Doeuff works with the security data repository at Morgan Stanley that runs on a Hadoop platform. The group started out by using Hadoop as an archiving system so they could retain a large volume of structured and unstructured data efficiently.
Not only was there a need to persist large amounts of data such as event logs in this archive, but the team also needed convenient access to the data, so the system was built with efficient search capability. And given the sensitive nature of the data, strong security controls were in place from the start, including audit capabilities, tamper detection, and awareness of who was looking at what.
Once the data repository was successfully established, more groups began to see ways their work could benefit from using it. Over time the uses for this system expanded to include more sophisticated computations and analytics including extracting and analyzing data for reports. Now machine learning approaches are also being applied.
Figure: Hadoop use case evolution. When you first begin to store data from a variety of sources on the Hadoop platform, you may not yet know all the ways in which different groups will want to use it. The flexibility of these systems lets you expand uses easily.
Big wins with Hadoop platform and ecosystem tools
Archive more data for longer term
Immediately there was a win for Morgan Stanley based on Hadoop’s ability to scale, particularly in a cost-effective way. They not only needed to collect large amounts of data but also to save it for longer times. This approach set them up to handle their current projects and to be ready for future needs. One reason is the advantage that big data and Hadoop offer in fighting cyber threats.
Working with a large volume of data from many sources, and with the option to look back over a period of 3–5 years, it is possible to identify known and new cyber threats through sophisticated anomaly detection models. This use case is not unique to Morgan Stanley. In a recent eSecurity Planet article, "Using Hadoop to Reduce Security Risks," MapR Chief Application Architect Ted Dunning explained how Hadoop technologies improve an organization's ability to detect attacks and reduce risk.
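The value of a multi-year archive is that each user or system accumulates its own behavioral baseline, and anomalies stand out against it. As a minimal sketch of the idea (not Morgan Stanley's actual models, and with illustrative field names), the following flags users whose latest daily event count deviates sharply from their own history:

```python
# A minimal sketch of baseline-based anomaly detection: flag users whose
# latest daily event count is far above their own historical average.
# The data shapes and threshold here are illustrative assumptions.
from collections import defaultdict
from statistics import mean, stdev

def anomalous_users(events, threshold=3.0):
    """events: iterable of (user, day, count) tuples.
    Returns users whose most recent day's count is more than
    `threshold` standard deviations above their historical mean."""
    history = defaultdict(list)
    for user, day, count in sorted(events, key=lambda e: e[1]):
        history[user].append(count)
    flagged = []
    for user, counts in history.items():
        if len(counts) < 3:
            continue  # not enough history to estimate a baseline
        baseline, latest = counts[:-1], counts[-1]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (latest - mu) / sigma > threshold:
            flagged.append(user)
    return flagged

# Example: "eve" suddenly generates far more events than usual.
events = [("alice", d, 10 + (d % 2)) for d in range(1, 8)]
events += [("eve", d, 4 + (d % 3)) for d in range(1, 7)] + [("eve", 7, 500)]
print(anomalous_users(events))  # → ['eve']
```

Real systems use far richer models, but the principle is the same: the longer the retained history, the more reliable the baseline.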
The benefit of efficient scaling is growing rapidly, both as the amount of security-related data grows and as new kinds of cyber attacks appear. Erwan reported that the amount of data of interest may increase by as much as ten times in just two years. It's important to be able to scale out easily and quickly.
Multiple data centers
In addition to scalability, it's also important for the data platform to support reliable operation with high availability across multiple data centers, particularly when working with highly valuable data. This ensures that the data remains available even if something happens to one center. The Hadoop distribution was well designed to meet these requirements.
Flexibility
One of the wins that Erwan reported with their big data system is the flexibility that Hadoop offers. This benefit shows up in several ways. In terms of sizing the cluster, it is not necessary to know in advance exactly what size system you will need; it's easy to scale as the need arises. Erwan explained that they started with a four-node cluster and expanded to twenty-four nodes for the security projects he leads.
Flexibility is also a benefit in not having to know exactly what you will do with data in future applications. At Morgan Stanley they collected security data that needed very little data modeling before it was stored in a reliable way on the Hadoop platform. With good options for search and extraction, the data could subsequently be used in a variety of ways.
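This "store first, model later" pattern is often called schema on read: raw records are kept as-is, and structure is imposed only when a question is asked. A minimal sketch of the idea (the field names and records here are illustrative, not Morgan Stanley's schema):

```python
# A minimal sketch of "schema on read": raw log lines are stored verbatim,
# and fields are parsed only at query time. Field names are illustrative.
import json

raw_logs = [
    '{"ts": "2015-03-01T10:00:00", "user": "alice", "action": "login"}',
    '{"ts": "2015-03-01T10:05:00", "user": "bob", "action": "login"}',
    '{"ts": "2015-03-01T10:06:00", "user": "alice", "action": "download"}',
]

def query(logs, **filters):
    """Parse each stored record at read time; keep those matching all filters."""
    for line in logs:
        record = json.loads(line)
        if all(record.get(k) == v for k, v in filters.items()):
            yield record

# A question nobody anticipated when the logs were first archived:
logins = list(query(raw_logs, action="login"))
print([r["user"] for r in logins])  # → ['alice', 'bob']
```

Because no schema was fixed up front, new questions can be asked of old data without re-ingesting it.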
This flexibility also paid off because their work is a mix of known road-map projects and new situations that cannot be defined in advance – a natural state of affairs in security and threat prevention. You don't always know ahead of time what the bad guys are planning, so you need to be able to respond to threats quickly and in creative ways.
When is the right time to start?
The Morgan Stanley story reflects the experience of many other organizations that started early with Hadoop. Their ways of using the new technology evolve, but as Erwan said, “…you have to start somewhere”. There is an advantage to getting started – you begin to build experience and to build your repository of critical data. That starts the clock on giving you the years of insight you may need for some situations.
What is the Morgan Stanley team likely to encounter in the near future? The prediction is that they will be working with petabytes of data in real time, using a variety of ecosystem tools.
For a free ebook on how other people are using Hadoop across many use cases: Download Real World Hadoop by Ted Dunning and Ellen Friedman.