The MapR CTO/Co-Founder on MapR-DB and Project Kudu – Whiteboard Walkthrough

Editor's Note: In this week's Whiteboard Walkthrough, M.C. Srivas, MapR CTO and Co-Founder, explains the innovation and vision behind MapR-DB and how Project Kudu stacks up against the MapR Data Platform.

Here's the transcription:

Hello, everyone. I'm M.C. Srivas, CTO and Co-Founder of MapR. I'm here to talk about MapR-DB and the vision we had when we went and built this entire platform at MapR. If you look at big data, big data is really two things. There's big storage and then there's big processing. When you say big storage, what are we really looking at? Right? Hadoop has now become the de facto standard for big data, but let's look one level inside that big Hadoop picture.

What we have from MapR is MapR-DB, which is a fantastic system for managing tables and files in the same platform. On top of MapR-DB you can run Spark, Hadoop, SQL, and your own custom applications. This is really what we were trying to build, or what we have built, I should say. What was the approach we took? The approach we took was to innovate while keeping the best ideas that others have come up with before us. We want extreme scale. Clearly this is about big data, so extreme scale, with trillions of rows, millions of tables, and thousands and thousands of columns, is very important.

Beyond columns, the document model is the right way to do things. From a developer's perspective, the document model is much easier to work with. It's more natural and it's the way people think. Rows and columns are just an approximation of the document model. With MapR-DB, we've introduced JSON to Hadoop for the first time. When you talk about data consistency and so on, ACID properties, which are atomic, consistent, isolated, and durable, are extremely important, and MapR-DB provides ACID features.

Very importantly, today at this IoT scale of things you don't work in a single data center anymore. Everything is replicated globally, and MapR-DB naturally fits there. You can run MapR-DB tables worldwide with full multi-master replication. More importantly, like I said, we innovate while keeping the good ideas, and we borrowed a great idea from the cellphone industry, which is zero management.

You just start MapR-DB. Start using it and it just works. There's nothing to tune. There's nothing to configure. It just works. It runs on commodity hardware. You don't have to go and make a decision about what hardware you want right now. You can upgrade continuously because MapR-DB supports heterogeneous hardware. You can keep your old hardware and make it work with your new hardware. You don't require every node in the cluster to be identical.

While doing all this, we give you 10x the performance of the newest competitor that's out there. Then there are the things that we left behind. When we started building this, we said, "Hey, what are some ideas which we didn't want to take along with us?" If you're doing databases, you're probably used to everything being a transaction. Right? Everything being a transaction is what causes performance problems with Oracle and other databases. With MapR-DB, you choose when you want to have a transaction or not have a transaction.

You know, everything doesn't have to be ACID. Rigid schemas are really a thing of the past. I mean, today there is an enormous amount of semi-structured data around. What I mean by semi-structured is loose schemas. For example, email has a schema where you have a from and a to and a date and a subject. The body is loose. The subject is loose. The list of recipients is loose. It's not really a rigid schema. We don't have foreign key constraints and things like that. There's no concept of an inconsistent email. Emails are always consistent even if they don't have rigid schemas.
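The email example above can be sketched as JSON-style documents. This is a minimal illustration in plain Python, not MapR-DB code: the sample emails, the `CORE_FIELDS` tuple, and the helper functions are all hypothetical names chosen for this sketch. It shows two documents that share a few core fields but otherwise differ in shape, which is the loose-schema idea described in the talk.

```python
import json

# Hypothetical sample data: two emails with different shapes.
emails = [
    {
        "from": "alice@example.com",
        "to": ["bob@example.com"],
        "date": "2015-10-01",
        "subject": "Lunch?",
        "body": "Are you free at noon?",
    },
    {
        "from": "carol@example.com",
        "to": ["dave@example.com", "erin@example.com"],
        "cc": ["frank@example.com"],  # optional field, absent in the first email
        "date": "2015-10-02",
        "subject": "",  # an empty subject is still a valid email
        "body": {"text": "See attached.", "attachments": ["q3.pdf"]},  # nested body
    },
]

# Assumption for this sketch: these are the only fields every email shares.
CORE_FIELDS = ("from", "to", "date", "subject")


def has_core_fields(doc):
    """Return True if the document carries the few fields every email shares."""
    return all(field in doc for field in CORE_FIELDS)


def recipient_count(doc):
    """Count recipients, tolerating the optional 'cc' list being absent."""
    return len(doc.get("to", [])) + len(doc.get("cc", []))


for doc in emails:
    # Each document serializes to JSON as-is, with no shared rigid schema.
    print(json.dumps(doc, sort_keys=True))
```

Both documents pass `has_core_fields` even though one has a nested body and an extra `cc` list, so neither is "inconsistent" despite the differing shapes.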

Trying to equate rigid schemas with consistency has been a mistake in the past, and we didn't take that forward. Very, very importantly, we've gone with commodity hardware. The appliance approach, in the belief that it's going to give better performance, is actually a fallacy. What we have found is that when you're buying an appliance, you're really buying yesterday's hardware at today's prices.

With MapR, since we run on commodity hardware, we can give you the latest. You can take advantage of hardware as it improves and not worry about having to do a full upgrade every time. Having talked about MapR-DB, I was asked to actually compare MapR-DB with Cloudera's Kudu, which was recently released. If you look at this diagram here, this is a diagram that Cloudera has put up, a graph showing slow and fast on random IO and then slow and fast on streaming IO. They placed Kudu in the middle between HBase and HDFS. MapR-FS and MapR-DB have already existed now for the ...

FS has been around for six years and DB has been around for four years. If you look at some of the performance numbers we've published in the last few years, we're about 20x faster than HDFS. In fact, Samsung recently published performance numbers where HDFS does about 500 megabytes per second while MapR-FS has a world-record-setting 16 gigabytes per second, the fastest on the planet today. Similarly, comparing HBase with MapR-DB, MapR-DB's random IO performance is almost 20x faster. I hope you try out MapR-DB. It's free, and it's available now in our M3 distribution. Give it a try and see. You'll really love it. Thank you very much.


