This morning we were extremely pleased to announce a $110 million financing round led by Google Capital.
WWGD – What Would Google Do? – is an acronym that has gained popularity over the past 15 years, particularly with respect to the direction of big data. In 1997, many companies were competing for Internet search supremacy. Do you recall doing an AltaVista search? Google pioneered a new approach to big data and subsequently ran the table on the search market. Google took the lead but didn't stop innovating. It processed unstructured data at unprecedented scale, then developed BigTable, which added structure without sacrificing agility: data lived in tables, but without the centralized schema management that had bottlenecked access to new or changing data sources. Over the years, Google has faced many technology scaling challenges and answered the call with innovations in big data, as well as big data center compute, big networking and big storage.
The industry has traditionally seen innovation driven by national defense, aerospace and financial services. Over the past decade that has changed: the bellwether for the next generation of technology has increasingly been Web 2.0 companies, especially Google, which has a stellar technology reputation and track record. Today, when faced with a technology challenge, the question to ask is, "WWGD?"
Google engineers Jeffrey Dean and Sanjay Ghemawat co-authored the 2004 paper "MapReduce: Simplified Data Processing on Large Clusters," chronicling the core MapReduce algorithm. Technologies starting with the Google File System had been built to provide the scalable platform needed to support algorithms like MapReduce. In 2005, the Hadoop project, inspired by the paper and the work done at Google, was started.
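The model the paper describes is simple: a map function emits key/value pairs, and a reduce function merges all the values that share a key. Here is a minimal, in-memory Python sketch of the paper's word-count example – illustrating just the two phases, with none of the distribution, fault tolerance, or scale that made the real system notable:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: for each word in each document, emit the pair (word, 1).
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(map_phase(docs))
# counts["the"] == 2
```

In the real system, the map and reduce phases each run in parallel across thousands of machines, with the framework handling the shuffle between them.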
The innovations continued at Google with technologies like BigTable, described in "Bigtable: A Distributed Storage System for Structured Data," a paper authored by nine Google engineers including Dean and Ghemawat. Dremel provides Google with interactive SQL, described in "Dremel: Interactive Analysis of Web-Scale Datasets." Spanner distributes database processing: "Spanner: Google's Globally-Distributed Database." Another innovation is TrueTime, the API that lets this globally distributed system agree on what time it is. Cloud Dataflow, announced at Google I/O last week, is a framework that ingests, transforms, and analyzes large volumes of data.
The Hadoop community has successfully adopted many of these innovations to create enterprise software products. For example, our MapR engineering team draws on these Google innovations in building Apache Drill for interactive database processing, the MapR file system for unstructured data processing, and, last year, the MapR in-Hadoop database as a distributed storage system for structured data. Hadoop consists of a compute layer and a storage layer; the compute layer includes the application APIs. MapR contributes its innovations that implement or introduce application APIs as Apache open source projects, ensuring those APIs remain freely and easily available to all Hadoop distributions. For example, Apache Drill extends the popular SQL interface to self-describing data interchange formats like JSON.
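The appeal of a self-describing format is that the schema travels with the data, so nothing must be registered centrally before a query can run. A small Python sketch (illustrative only – this is not Drill itself, and the records are made up) of discovering and filtering a JSON record's structure at read time:

```python
import json

# Each JSON record carries its own structure; no schema was declared ahead of time,
# and fields may vary from record to record.
records = [
    json.loads('{"name": "alice", "age": 34, "tags": ["admin"]}'),
    json.loads('{"name": "bob", "age": 25}'),
]

# Equivalent in spirit to: SELECT name FROM records WHERE age > 30
names = [r["name"] for r in records if r.get("age", 0) > 30]
# names == ["alice"]
```

Drill applies the same idea at scale, inferring the schema from the data as it reads, so new or changed fields are queryable immediately.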
Our MapR engineering team also draws on experience at companies like Oracle, VMware, IBM and Informatica to build in enterprise capabilities, striving for the five-9s availability and lights-out datacenter attributes required by companies that want to focus their innovation investments on applications that drive their business, rather than building or maintaining the underlying platform infrastructure.
MapR is proud to be selected by Google Capital. As you consider your direction in big data, ask yourself, “WWGD”?