Big Data All Stars:

Real-World Stories and Wisdom from the Best in Big Data

Introduction

Those of us looking to take a significant step towards creating a data-driven business sometimes need a little inspiration from those who have traveled the path we are looking to tread. This book presents a series of real-world stories from those on the big data frontier who have moved beyond experimentation to create sustainable, successful big data solutions within their organizations. Read these stories to get an inside look at “big data all-stars” who have been recognized by MapR and Datanami as having achieved great success in the expanding field of big data.

Use the examples in this guide to help you develop your own methods, approaches, and best practices for creating big data solutions within your organization. Whether you are a business analyst, data scientist, enterprise architect, IT administrator, or developer, you'll gain key insights from these big data luminaries – insights that will help you tackle the big data challenges you face in your own company.

Table of Contents

How comScore Uses Hadoop and MapR to Build its Business
Michael Brown, CTO at comScore
comScore uses MapR to manage and scale their Hadoop cluster of 450 servers, create more files, process more data faster, and produce better streaming and random I/O results. MapR allows comScore to easily access data in the cluster and just as easily store it in a variety of warehouse environments.

Making Good Things Happen at Wells Fargo
Paul Cao, Director of Data Services for Wells Fargo’s Capital Markets business
Wells Fargo uses MapR to serve the company’s data needs across the entire banking business, which involve a variety of data types including reference data, market data, and structured and unstructured data, all under the same umbrella. Using NoSQL and Hadoop, their solution requires the utmost in security, ease of ingest, ability to scale, high performance, and – particularly important for Wells Fargo – multi-tenancy.

Coping with Big Data at Experian – “Don’t Wait, Don’t Stop”
Tom Thomas, Director of IT at Experian
Experian uses MapR to store in-bound source data. The files are then available for analysts to query with SQL via Hive, without the need to build and load a structured database. Experian is now able to achieve significantly more processing power and storage space, and clients have access to deeper data.

Trevor Mason and Big Data: Doing What Comes Naturally
Trevor Mason, Vice President of Technology Research at IRI
IRI uses MapR to maximize file system performance, facilitate the use of a large number of smaller files, and send files via FTP from the mainframe directly to the cluster. With Hadoop, they have been able to speed up data processing while reducing mainframe load, saving more than $1.5 million.

Leveraging Big Data to Economically Fuel Growth
Kevin McClowry, Director of Analytics Application Development at TransUnion
TransUnion uses a hybrid architecture made of commercial databases and Hadoop so that their analysts can work with data in a way that was previously out of reach. The company is introducing the analytics architecture worldwide and sizing it to fit the needs and resources of each country’s operation.

Making Big Data Work for a Major Oil & Gas Equipment Manufacturer
Warren Sharp, Big Data Engineer at National Oilwell Varco (NOV)
NOV created a data platform for time-series data from sensors and control systems to support deep analytics and machine learning. The organization is now able to build, test, and deliver complicated condition-based maintenance models and applications.

The NIH Pushes the Boundaries of Health Research with Data Analytics
Chuck Lynch, Chief Knowledge Officer at National Institutes of Health
The National Institutes of Health created a five-server cluster that enables the office to effectively apply analytical tools to newly-shared data. NIH can now do things with health science data it couldn’t do before, and in the process, advance medicine.

Keeping an Eye on the Analytic End Game at UnitedHealthcare
Alex Barclay, Vice President of Advanced Analytics at UnitedHealthcare
UnitedHealthcare uses Hadoop as a basic data framework and built a single platform equipped with the tools needed to analyze information generated by claims, prescriptions, plan participants, care providers, and claim review outcomes. They can now identify mispaid claims in a systematic, consistent way.

Creating Flexible Big Data Solutions for Drug Discovery
David Tester, Application Architect at Novartis Institutes for Biomedical Research
Novartis Institutes for Biomedical Research built a workflow system that uses Hadoop for performance and robustness. Bioinformaticians use their familiar tools and metadata to write complex workflows, and researchers can take advantage of the tens of thousands of experiments that public organizations have conducted.