The first day of the 2014 Hadoop Summit was filled with announcements and interviews. MapR announced our first Apache Hadoop App Gallery, as well as our exciting partnership with Syncsort. Jack Norris, MapR CMO, had a chance to talk about this news on theCUBE with Wikibon’s Jeffrey Kelly and SiliconANGLE’s John Furrier. Below are some of the highlights of the interview, which focused on MapR and its place in the Hadoop market.
The entire interview with Jack Norris is available on theCUBE.
John: Security and enterprise-grade Hadoop: that’s the buzz of the show. Hortonworks announced a big acquisition, and Cloudera followed suit with its own news. How do you view that news? You guys already have the security stuff nailed.
Jack: If you look at the Hadoop market, it’s definitely moving from a test/experimental phase into a production phase. We have big customers across verticals that are doing some really interesting production use cases. We recognized very early on that in order to meet the needs of customers, we needed to make some architectural innovations. By combining the Hadoop ecosystem of packages with innovations underneath, we are able to deliver key features such as high availability, data protection, and disaster recovery. Security is part of that, but if you can’t protect the data, and if you can’t have multi-tenancy and separate workflows across the cluster, then it doesn’t matter how secure the data is.
John: You guys have been very successful, yet people look at MapR as the quiet leader. Explain your business model, and specifically talk about the traction, because you have paying customers.
Jack: We have over 500 paying customers. We have at least one “million-dollar customer” in seven different verticals, so we’ve got breadth and depth. Our business model is simple: we’re an enterprise software company that’s focused on providing the best of open source as well as innovations underneath.
John: You provide the most open distribution of Hadoop, but you add your own value on top of it separately. So it’s not that you’re proprietary at all, right? Can you clarify that?
Jack: If you look at this exciting ecosystem, Hadoop is fairly early in its life cycle. When a technology reaches a commoditization phase, like Linux, or like relational databases with MySQL, open source encompasses the whole technology. Here at the beginning of the Hadoop life cycle, some architectural innovations are really required. If you look at Hadoop, HDFS is an append-only file system that relies on the underlying Linux file system, and that really limits the types of operations and use cases you can support. What MapR has done is provide deep architectural innovations, including a complete read/write file system that integrates data protection with snapshots, mirroring, etc. So there’s a whole host of capabilities that enable easy integration, enterprise security, and better scalability.
Jeff: Do you feel like you were early to the market? We’re at a tipping point where we see more and more deployments, and security is becoming increasingly important.
Jack: I think our timing has been spectacular, because we came out at a time when some customers were really serious about Hadoop. We were able to work closely with them and prove our technology. Now, as the market is starting to ramp up, we’re here with all of the features they need. The issue is that you can’t deliver those key features through incremental improvements if the underlying architecture isn’t there. It’s hard to provide online, real-time capabilities on an underlying platform that’s append-only. So the HDFS layer, written in Java and relying on the Linux file system, is the “weak underbelly” of the ecosystem. There are a lot of important developments happening, such as YARN, and a lot of other exciting projects that we’re actively participating in, such as Apache Drill. Combined with a complete read/write file system and an integrated Hadoop database, MapR just makes it all come to life.
Jeff: When we asked the Wikibon community to name the things that were holding them back from deploying Hadoop in production, the biggest things that they cited were high availability, backup and recovery, maintaining performance, and scalability. That’s what MapR has been focused on since day one.
Jack: Yes, that’s true. We have a major retailer customer that has 2,000 nodes on MapR, with 50 unique applications running on a single cluster, running 10,000 jobs on top of that every day. Another customer, the Rubicon Project, has 100 billion ad auctions a day running on the MapR platform. Beats Music is using MapR to scale and personalize their music service. So there are a lot of proof points in terms of how quickly we scale, the enterprise-grade features that we provide, and the blending of deep, predictive analytics in a batch environment with online capabilities.
John: Can you give us an update on your relationship with HP?
Jack: It’s excellent. In fact, we just launched our App Gallery, making it very easy for administrators, developers, and analysts to get access to apps and understand what’s available in our ecosystem. One of the featured applications in the gallery is an integration of HP Vertica with the MapR Sandbox, so you can get early access, try it out, and get the best of enterprise-grade SQL on top of MapR.
Jeff: There’s some confusion about the different methods for applying SQL on Hadoop. I know that MapR takes an open approach: you support projects like Impala, etc. Talk about that approach from the MapR perspective.
Jack: Our perspective is “unbiased open source.” We don’t try to pick the “right” open source project based on our own participation or community involvement. The reality is that with multiple applications running on the platform, there are different use cases, and whether it’s a Hive solution, Drill, or HP Vertica, people have the choice. It’s part of a broad range of capabilities that you want to be able to run on the platform for your workflows, whether it’s SQL access, MapReduce, Spark, Shark, etc.
Jeff: Do you think we’ll have this many options a year or two from now?
Jack: I think the major difference will be how these ecosystem projects deal with the new data formats. Can they deal with self-describing data sources? Can they leverage a JSON file? Do they require centralized metadata? Those are some of the advantages Apache Drill has: it expands the data sets that are possible, and it enables data exploration without depending on an IT administrator to define that metadata.
Jeff: Moving workloads from existing systems to Hadoop is one of the ways people get started with Hadoop. Talk a little about your partnership with Syncsort, and why it makes sense for MapR and your customers.
Jack: It’s a great proof point. We announced that partnership around mainframe offload capabilities, and we talked about comScore and Experian in our press release. Moving a mainframe workload to Hadoop seems like an oxymoron. But with MapR’s capabilities, including the full high availability and data protection needed to serve as a system of record, we’re now a real option for offloading data from a mainframe. We’re able to provide a cost-effective, scalable alternative. We have customers that had tried to offload from a mainframe multiple times in the past without success, but have done it successfully with MapR.
John: Talk about some of the success you’ve had with customers, specifically where you’re winning.
Jack: There’s a whole class of applications that Hadoop is enabling, which combine operations and analytics. They take high arrival-rate, machine-generated data, analyze it as it happens, and then impact the business. So whether it’s fraud detection, recommendation engines, or supply chain applications using sensor data, it’s happening very, very quickly. So a system that can accept streaming data sources, supports real-time operations, and is highly available 24/7 is what really moves the needle. Those features are present in the examples I mentioned, such as the Rubicon Project, cable TV, and telco.
John: What are the primary outcomes that your customers want with your product? Is there an outcome that’s consistent among all of your wins?
Jack: Looking at the big picture, some of them are focused on optimizing revenue, some on reducing costs, and some on mitigating risk. If there’s anything they have in common, it’s that as they move from testing to production, they want to ensure that the key capabilities they have in their enterprise systems today are also in Hadoop. These are capabilities such as SLAs, data protection policies, and disaster recovery procedures. They expect the same level of capabilities in Hadoop that they have today in those other systems.