Hadoop in 5 Minutes or Less

Five minutes is easily squandered without much thought; with Hadoop, however, five minutes can make a big impact. John Schroeder, MapR CEO and founder, recently used a five-minute keynote address to illustrate this point.

Following is an edited transcript of John's message. 

We have a short period of time, so we thought what might be interesting is to talk about what you can do with Hadoop in 5 minutes or less. We're going to start out with a few use cases. We looked to our customer base, and the first one we'll talk about, in the first minute, is completing 4.73 million authentications. That's part of the Aadhaar project over in India, where Aadhaar is providing a unique identifier for every resident of India, so that's 1.2 billion residents.

The idea there is to make it easier to open a bank account and to improve the mobility of the population. It can also decrease the embezzlement of government subsidies by as much as 1.3 billion dollars. It's a great project!

It's a biometric database, so it includes an iris scan, digital fingerprints, a digital photo, and then text-based data for every resident. We've got somewhere around 435 million residents in there, and there's a storage component for storing all that data. There are analytics doing things like comparing every digit to every other digit to make sure there's not a fraudulent ID being added to the system. In this case, we frequently have to do as many as 4.73 million authentications in a single minute, because if somebody's going to an ATM, you need to be able to respond to that in 200 milliseconds or less.
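As a rough back-of-the-envelope check on those two figures, the per-minute peak can be converted to a per-second rate and combined with the 200-millisecond ceiling using Little's law (average concurrency equals arrival rate times time in the system). This is only a sketch: the two inputs come from the talk, while the function names and the assumption that every request takes the full 200 milliseconds are illustrative and not part of the keynote.

```python
# Back-of-the-envelope figures; only the two inputs come from the talk.
AUTHS_PER_MINUTE = 4_730_000  # peak authentications in one minute (from the talk)
LATENCY_BUDGET_S = 0.200      # response-time ceiling in seconds (from the talk)


def per_second(per_minute: float) -> float:
    """Convert a per-minute count into a per-second rate."""
    return per_minute / 60.0


def in_flight(rate_per_s: float, latency_s: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return rate_per_s * latency_s


rate = per_second(AUTHS_PER_MINUTE)
# Assumption: every request uses the whole latency budget (worst case).
concurrent = in_flight(rate, LATENCY_BUDGET_S)

print(f"{rate:,.0f} authentications per second")
print(f"up to about {concurrent:,.0f} requests in flight at once")
```

Under that worst-case assumption, the system would be holding on the order of fifteen thousand authentications open at any instant; faster responses would lower that number proportionally.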

What we'll do in the second minute here is really around health care. If you look at doctors, especially oncologists, they've got a very difficult job understanding a patient's genetics, understanding their symptoms, and what other treatments are being done for that particular patient. In the area of preventative medicine, a lot of the work is around genetic sequencing. Again, we've got a customer who's up and running, and they can do 422.2 thousand genetic sequences in the second minute here.

In the third minute, you heard Jeffrey Moore talk about Google really revolutionizing advertising. And with regard to the Rubicon Project, they get it. They're being wildly successful doing this.

I saw Jan, who's one of the leaders at Rubicon, last night at our cocktail party, and he was very happy that they filed their S-1 a couple of weeks ago to IPO. They're running 90 billion ad auctions, and they've got 500 of the premier publishers and over 100,000 of the top brands in the world. In a short period of time, in this third minute, they can run 63 million ad auctions. Rubicon reaches about 96% of the US audience, and the way we know that is through comScore. comScore is basically monitoring every single web interaction. comScore can process 39 million events in a minute.

This leads us to our next one. If you watched the Super Bowl, you saw ads for Dr. Dre's Beats Music, a new music service in a very competitive marketplace. You've got Spotify, Pandora, and of course iTunes.

The way Dr. Dre is going to win in that marketplace is by providing you with much better personalized music, by looking at and analyzing over 20 million songs. Today, even with the early launch of Beats, they're doing 129 recommendations to music lovers in this fourth minute.

If you swiped your credit card to use Beats, your credit card provider, in this case a large credit card company with over 100 million cardmembers, is most likely protecting you from fraud. They also know a lot about you because of the credit card transactions you're completing, and they may have served you a coupon or an offer for a discount on Beats.

In the final minute here, let's sort data. MinuteSort is a benchmark that's been used to measure the performance of technology. It's a technology-agnostic benchmark, meaning you can throw any sort of hardware or software at it that you'd like. We set a record here with a 1.65-terabyte MinuteSort, meaning that's the amount of data you can sort in a minute.

To give you a feel for how that works, the previous record holders were Yahoo! and Microsoft. One of our customers installed us on 298 computers, and that's represented by the blue dots at the bottom. They beat that record, raising it from 1.6 to 1.65 terabytes. You can see the larger hardware footprint above us.
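To put the record in per-machine terms, the arithmetic can be sketched as follows. The 1.65 terabytes, 298 computers, and one-minute window come from the talk; treating a terabyte as 10^12 bytes and assuming the data is spread evenly across the machines are simplifying assumptions made here for illustration.

```python
# Inputs from the talk; the decimal-terabyte convention is an assumption.
TOTAL_BYTES = 1.65e12  # 1.65 TB, assuming 1 TB = 10**12 bytes
NODES = 298            # computers in the record-setting cluster
WINDOW_S = 60          # MinuteSort measures what can be sorted in one minute

# Assumption: data is spread evenly across all machines.
per_node_bytes = TOTAL_BYTES / NODES
per_node_gb = per_node_bytes / 1e9

# Sorted-output rate each machine must sustain, in megabytes per second.
per_node_mb_s = per_node_bytes / WINDOW_S / 1e6

# Aggregate sorted-output rate for the whole cluster, in gigabytes per second.
cluster_gb_s = TOTAL_BYTES / WINDOW_S / 1e9

print(f"{per_node_gb:.2f} GB sorted per machine in the minute")
print(f"{per_node_mb_s:.1f} MB/s sustained per machine")
print(f"{cluster_gb_s:.1f} GB/s across the cluster")
```

These are output rates only. A real sort also has to read the input and move data between machines, so the actual disk and network traffic per machine would be higher than this figure suggests.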

Thanks for taking a bit of time to hear what you can do with Hadoop in 5 minutes or less. Visit our booth and take a look at the MapR Sandbox. Thank you very much, and have a great conference!

 