Apache Hadoop Blog Posts

Posted on December 2, 2016 by Ellen Friedman

Considering big data techniques, Hadoop-based approaches were among the first to be widely recognized and widely used, but Hadoop is just a part of modern big data solutions. Evolving technologies offer a wide range of capabilities that include distributed file storage, NoSQL databases, data stream transport and stream processing, search, SQL-on-big-data, machine learning, and more.

Posted on November 21, 2016 by Sean O’Dowd

The last decade has ushered in a perfect storm of disruption for the financial services sector – arguably the most data-intensive sector of the global economy. As a result, companies in this sector are caught in a vice.

Posted on October 27, 2016 by James Sun

MapR has worked closely with Azure to develop sandboxes that enable users to do a proof of concept with the MapR Converged Data Platform. These sandboxes, which are pre-loaded and preconfigured with the MapR software and the required supporting operating system, can be launched on the Azure Marketplace portal.

Posted on October 6, 2016 by Jack Norris

This week I attended my sixth Strata + Hadoop World. Actually, in the beginning they were two separate shows, but the evolution since has been more than the combining, or convergence if you will, of the two shows. The first shows were attended almost exclusively by technologists looking to learn and understand new big data technologies, especially Hadoop.

Posted on October 3, 2016 by Kirk Borne

One of the most significant characteristics of the evolving digital age is the convergence of technologies. That includes information management (structured and unstructured databases: e.g., NoSQL), data collection (big data), data storage (cloud and distributed data: e.g., Hadoop), data applications (analytics), knowledge discovery (data science), algorithms (machine learning), transparency (open data), computation (distributed data processing: e.g., MapReduce and Spark), sensors (Internet of Things: IoT), and API services (microservices, containerization).

Posted on September 26, 2016 by Tugdual Grall

Get an introduction to streaming analytics, which gives you real-time insight from captured events and big data. There are applications across industries, from finance to wine making, though there are two primary challenges to be addressed.

Posted on September 15, 2016 by George Demarest

This blog post is the first in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer. In his book, Davi outlines the rationale and key challenges of securing big data systems and applications. He does so using some great anecdotes and with good humor, making the book a good read whether you’re a white/grey/black hat, cyber superhero, or even if you’re not a security expert at all.

Posted on August 16, 2016 by Ellen Friedman

It’s not just a concern when ordering coffee. Something similar can happen as we investigate new and innovative big data technologies and techniques. I used the cappuccino example in a talk I presented recently at the Strata + Hadoop World Conference in London. The talk, titled “Building Better Cross Team Communication,” highlighted the importance of identifying and addressing the difference in how each side thinks the world works when two groups that have different experience and skills come together.

Posted on August 10, 2016 by Dale Kim

Dale Kim, Sr. Director of Industry Solutions at MapR, describes the monitoring capabilities of the MapR Converged Data Platform, which easily give you a single view of all cluster operations. Leveraging popular open source technologies, the monitoring system is customizable and extensible to address the challenges of your big data deployment requirements.

Posted on August 9, 2016 by Yvonne Chen

With the increasing amount of information that we use daily, technology is only becoming more and more important in everything we do. And businesses are seeing this at much greater scale than we do as consumers. There are many great examples of this in just about every industry.

Posted on August 1, 2016 by Sameer Nori

Apache Spark is becoming very popular and widely used in the big data community. There are several reasons for Spark getting such rapid traction. These include its in-memory processing capabilities, support for a wide range of engines for various use cases such as streaming, machine learning, and SQL, and the ability to develop in multiple languages such as Python and Scala.

Posted on July 26, 2016 by Manny Puentes

“Big Data” is no longer a buzzword. Businesses big and small that don’t invest now in big data technologies risk getting left behind as the marketplace becomes more and more data-driven. In fact, a recent McKinsey and Company report suggested that companies that invest in big data and analytics consistently outperform their peers in both productivity and revenue.

Posted on July 25, 2016 by Jim Scott

Within this post you will see mention of message-driven architectures. This is, in short, a subset of service-oriented architecture (SOA). It has been around for many years and is a very popular model. What you will find going through this post is that the foundational message-driven architecture is more of a competitor to the concept of the enterprise service bus (ESB).

Posted on July 19, 2016 by Ellen Friedman

In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now we’ve reached the mid-and-a-bit-more year, so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.

Posted on July 8, 2016 by Ankur Desai

I was at the annual Hadoop Summit in San Jose last week. As usual, the MapR booth was buzzing with big data enthusiasts and experts alike. We showcased demos that spanned multiple topics including multi-cluster Hadoop monitoring using Grafana and Kibana (as part of our new Spyglass Initiative), IoT stream analysis using MapR Streams and Spark Streaming, and self-service big data analytics using Apache Drill.

Posted on June 30, 2016 by Prashant Rathi

Today we are proud to announce the Spyglass Initiative focused on easy management, deep visibility and full control. With this first release, MapR Monitoring empowers administrators with cluster monitoring capabilities, including metric and log collection from nodes, services and jobs, and dashboards.

Posted on June 20, 2016 by Dale Kim

Is there a case to be made for big data for security analytics? The answer is an unqualified “yes.” In fact CSO Magazine called cyber security “the killer app” for big data analytics.

Posted on June 14, 2016 by Kirk Borne

In the beginning was data. How do we know this? Because many (if not all) creation stories from all cultures were essentially developed as an explanation of the world as observed by humans.

Posted on June 7, 2016 by Carol McDonald

Standards and incentives for the digitizing and sharing of healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in health care with the goal of better care at lower cost.

Posted on June 6, 2016 by Balaji Mohanam

Apache Spark, a powerful general purpose engine for processing large amounts of data, has seen a rapid increase in its adoption since its release. Recognizing its impact very early on, MapR has supported and invested in Spark as part of our Hadoop distribution to enable enterprises to build applications with Spark and deploy it in production in a reliable manner.

Posted on May 18, 2016 by Jim Scott

Perhaps you’re old enough to remember when the library was the place we went to learn. We foraged through card catalogs, encyclopedias and the Reader's Guide to Periodical Literature in hopes that we’d be able to understand what was going on in other people’s minds when they decided what went where.

Posted on May 9, 2016 by Jim Scott

With all the talk about Big Data, most organizations are barely out of the starting blocks when it comes to exploiting it for business benefit. Gartner estimates that 85% of Fortune 500 companies are yet unable to exploit Big Data for competitive advantage.

Posted on May 2, 2016 by Jim Scott

In some circles today there is a sort of ‘Hadoop vs. RDBMS’ debate ongoing. Often the discussion casts Hadoop as the obvious heir apparent in the data processing world, with RDBMS cast as your father’s Oldsmobile.

Posted on April 25, 2016 by Ellen Friedman

Organizations embracing big data are ready to put data to work, including looking for ways to effectively analyze data from a variety of sources in real time or near real time.

Posted on April 7, 2016 by Jack Norris

There are substantial advantages to being able to make decisions at the speed required to respond to events in the moment. In fact, real time is at the foundation of many transformational applications. Let’s take a closer look at what real time really means, and why real time is required across the entire process.

Posted on March 29, 2016 by John Schroeder

What’s clear to me is that we are in the midst of the biggest change in enterprise computing in decades: a shift in how data is stored, analyzed and processed is changing the way businesses operate and compete in the marketplace.

Posted on March 21, 2016 by Steve Wooledge

The number of organizations that are thinking about using Hadoop has grown astronomically over the past year. How do you know whether you’re ready to implement Hadoop, and what are the best practices?

Posted on March 15, 2016 by Steve Wooledge

In the world of data warehouses and data marts, OLAP analysis has existed for many years. Concepts like drill down, drill across and roll ups have allowed business analysts and users to easily access and analyze data across a variety of dimensions such as product, customers and regions.

Posted on March 7, 2016 by Michele Nemschoff

We are excited to share with you that Gartner has named MapR a Visionary in the Gartner 2016 Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Gartner evaluated 21 software vendors on 15 criteria for the quadrant.

Posted on March 4, 2016 by Carl Olofson

Hadoop is a key data technology for Big Data, as everyone knows. But the question becomes, how can Big Data help make me more competitive, more efficient, and better able to detect fraud, security breaches, and other abuses?

Posted on January 14, 2016 by Michele Nemschoff

As we look back at 2015, the most popular blogs on our site are a good reflection of the 2015 trends and developments in the big data space.

Posted on January 11, 2016 by Kirk Borne

Are people in your data analytics organization contemplating the impending data avalanche from the internet of things and thus asking this question: “Spark or Hadoop?” That’s the wrong question!

Posted on January 5, 2016 by Ellen Friedman

It’s the start of a new year -- we’re on the threshold of something new -- so let’s look forward to what you’re likely to be doing in 2016.

Posted on January 4, 2016 by Ellen Friedman

Banks are among the many businesses taking advantage of big data and IoT opportunities, including for mobile payments, online banking, and smart kiosks, but the huge quantities of personally sensitive data from these activities must be protected at all stages.

Posted on November 30, 2015 by Jim Scott

The faster questions can be asked, the faster you can get answers. Waiting for data to be shipped off of servers to a central processing platform can take time, and most businesses these days want to get as close to real time as possible.

Posted on November 24, 2015 by Jim Scott

Streaming data enables businesses to respond to customers as close to real time as possible. There are many different ways to leverage a streaming platform, and utilizing Spark in your streaming architecture is easier than you might think.

Posted on November 6, 2015 by Jim Scott

As technology advances at breakneck speed, our lives are becoming increasingly digitized. From Twitter feeds to sensor data to medical devices, companies are drowning in big data yet starving for actionable information.

Posted on October 30, 2015 by Ellen Friedman

There’s good news in the world of NoSQL databases that will put a smile on the face of developers – and that should also make business leaders happy because it means shorter time-to-value. You can now enjoy the ease and flexibility of a document-style database with the power of extreme scalability and performance.

Posted on October 22, 2015 by Ellen Friedman

Walmart is an industry leader in global e-commerce and brick-and-mortar retail, and they’re also a leader in the use of Hadoop-based technologies to implement their new data-driven approach to business.

Posted on October 13, 2015 by Nitin Bandugula

The recent Attunity and MapR webinar “Give your Enterprise a Spark: How to Deploy Hadoop with Spark in Production” proved to be highly interactive and engaging. As promised, Nitin and Rodan have provided follow-up answers to your questions.

Posted on October 12, 2015 by Jack Norris

The cost of waste, fraud and abuse in the healthcare industry is a key contributor to spiraling health care costs in the United States. In 2012, healthcare waste and abuse accounted for nearly $60 billion.

Posted on September 23, 2015 by Jim Scott

In this blog post, I’ll talk about the relationship between Spark and Hadoop, what Hadoop gives Spark, and what Spark gives Hadoop.

Posted on September 22, 2015 by Michele Nemschoff

Australian shoppers are some of the most digitally influenced in the world; a majority of Australians go online to research a product before buying it, according to a 2015 report by Deloitte.

Posted on September 21, 2015 by Jim Scott

Recently, a new name has entered many of the conversations about big data. Some people see the popular newcomer Apache Spark™ as a more accessible and more powerful replacement for Hadoop, the original technology of choice for big data.

Posted on September 15, 2015 by Michele Nemschoff

It's an exciting time for those in pharmaceutical research these days, given that research organizations can now leverage big data to improve their business.

Posted on September 8, 2015 by Michele Nemschoff

The explosion of data from new devices and technologies has forced the telecommunications industry to completely change the way they handle big data. Their traditional storage and analytics solutions cannot adequately manage the expanding, diverse volume of data generated today.

Posted on August 21, 2015 by Jim Scott

Apache Hadoop is revolutionizing big data in more than one way. While the Hadoop platform introduced reliable distributed storage and processing, various packages such as Spark on top of Hadoop make it possible to build applications and analyze data much faster. Here are some cool ways the Hadoop stack is being used right now.

Posted on August 19, 2015 by Mark Muncy

Organizations have struggled with critical performance and scalability shortcomings of conventional data integration for years, leading many to push heavy data integration workloads down to the data warehouse. As a result, core data integration experienced a shift from extract, transform, and load (ETL) to extract, load, and transform (ELT).

Posted on August 4, 2015 by Dale Kim

Every business has some kind of database. It’s right up there with word processors and spreadsheets as essential business software. Relational databases are one of the most popular types—but as useful as they are, they’re not necessarily the best for every situation. NoSQL databases are becoming popular because they can handle different types of data at scale more efficiently.

Posted on June 18, 2015 by Dale Kim

Hadoop has been a phenomenon for big data and operational workloads. It has transformed from its batch-oriented roots into an interactive platform by incorporating a number of components, including technologies that provide SQL and distributed in-memory capabilities.

Posted on June 11, 2015 by Sameer Nori

Reducing operating costs and increasing efficiency are, and will always be, priorities for any business, but they become imperative when an industry is facing cyclical challenges. Given the current volatility in the oil market, the oil and gas industry is looking for solutions that can proactively address inefficiencies through better asset tracking and predictive maintenance.

Posted on June 10, 2015 by Jabari Norton

Today, a significant number of our customers are deploying the MapR Big Data platform in the cloud. This is not only true for development and test use cases, but also for many more production ones, especially those in the Internet of Everything (IoE) space.

Posted on June 9, 2015 by Will Ochandarena

Lately I’ve talked to lots of people who are just getting their heads wrapped around the value of big data software such as Apache Hadoop, but are getting stuck figuring out the details. What kind of servers do I need to buy? What services do I need to install to make a “data lake”? How do I make sure I install the services in a way that makes them highly available, while being optimized for performance?

Posted on June 2, 2015 by Dale Kim

As you probably know, Apache Hadoop was inspired by Google’s MapReduce and Google File System papers and cultivated at Yahoo! It started as a large-scale distributed batch processing infrastructure, and was designed to meet the need for an affordable, scalable and flexible data structure that could be used for working with very large data sets.

Posted on May 29, 2015 by Matthew Lescohier

The MapR Distribution including Hadoop is now available in a private IT sandbox environment on the Amazon Web Services (AWS) Test Drive. We’ve partnered with AWS to create this lab environment so that you can gain hands-on experience with Hadoop.

Posted on May 15, 2015 by Steve Wooledge

Gartner just released a comprehensive research report based on a survey highlighting the adoption trends around Hadoop, which sheds some light on where and how customers are getting value from Hadoop. Some of the key take-aways in the report include:

Posted on May 15, 2015 by Arsalan Tavakoli-Shiraji

Apache Spark recently celebrated its five-year anniversary as an open source project. While we are always humbled and excited by the open source success of Spark, it gives us far greater pleasure to know that more and more organizations this year are deploying Spark into production business applications.

Posted on May 8, 2015 by Jabari Norton

For the past several years, organizations have been struggling to figure out how to deal with all of the new data that is streaming in all around them. From smartphones to production line sensors, everything is generating data.

Posted on April 17, 2015 by Nitin Bandugula

So you’ve researched the general capabilities of Hadoop, and have worked with your colleagues to identify the first set of big data use cases to tackle. Now, you’re ready to take the plunge and select the right Hadoop solution to invest in. After thoroughly doing your homework, you might think the final selection step would be simple, knowing that open source packages appear to be the same across the different Hadoop distributions out there. Unfortunately, that is not the case.

Posted on April 8, 2015 by Christian Prokopp

I recently wrote an article about how to get a Big Data initiative back on track. In the comments section, a user challenged how such an initiative is different to a traditional analytics project. That’s a good question, and the answer is not immediately obvious to most. The key differentiators are not Hadoop, NoSQL, large datasets or any of the usual suspects. The difference is that it is impacting all organizational datasets, the overall flow of data in the long-term, as well as data processing and storage.

Posted on March 27, 2015 by Sameer Nori

This is a tremendously exciting time for those who work in clinical genomics. The demand for cutting-edge technologies that deliver fast and accurate genome information has exploded. In 2013, close to 2000 genome sequencers were in operation. These genome sequencers produced a whopping 15 petabytes of sequence data, which included the sequencing of 300k human genomes.

Posted on March 18, 2015 by Dale Kim

In this week's Whiteboard Walkthrough, Dale Kim, Director of Industry Solutions at MapR, gets you up to speed on Apache Hadoop and NoSQL. He talks about the similarities and differences between the two, but most importantly how both technologies should be a requirement for any true big data environment.

Posted on March 13, 2015 by Nitesh Kumar

Fraud represents the biggest loss for banks, accounting for upward of $1.744 billion in losses annually. The banking industry spends millions each year on technologies aimed at reducing fraud and retaining customers, but the spend does little to protect banks. Let’s focus on why the current fraud detection approaches don't work as well as they should and how machine learning on big data can help.

Posted on March 12, 2015 by Carole Murphy

Securing data is a problem, not for a subset of people, but in fact for everyone dealing with sensitive data—from executives and business stakeholders to data scientists and developers.

Posted on March 10, 2015 by Steve Wooledge

Companies everywhere are excited about harnessing big data and putting it to work. Adopting a Hadoop distribution is a critical decision that has far-reaching ramifications for your organization. CITO Research recognizes this in its white paper, “Five Questions to Ask Before Choosing a Hadoop Distribution.”

Posted on March 5, 2015 by Jim Scott

To get value out of today’s big and fast data, organizations must evolve beyond traditional analytic cycles that are heavy with data transformation and schema management. The Hadoop revolution is about merging business analytics and production operations to create the ‘as-it-happens’ business.

Posted on March 4, 2015 by Ellen Friedman

Big data challenges and opportunities are rapidly spreading across a huge number of organizations, large and small, in a wide range of verticals. Not surprisingly, people are turning to scalable solutions such as Apache Hadoop and NoSQL-based technologies to meet these challenges. The choice of an excellent data platform along with smart selections from among the many Hadoop ecosystem tools are of course important decisions to set yourself up for success, but there are also some other fundamental choices that can make a big difference in meeting and exceeding your goals, regardless of the particular project or tool involved.

Posted on March 3, 2015 by Dale Kim

Gigaom Research released a new report recently, titled “Extending Hadoop Towards the Data Lake.” According to the report, early data lake adopters are integrating Hadoop into organizational workflows, and are addressing challenges regarding the cleanliness, validity, and protection of their data. Their research resulted in some key findings.

Posted on February 23, 2015 by Karen Whipple

Big data made a huge leap forward in 2014 in terms of the increased adoption of big data technologies among enterprises. Where is big data heading in 2015? At the beginning of this year, MapR CEO John Schroeder made several predictions about key aspects of big data and Hadoop, from data agility and processing data platforms to self-service and market consolidation. Check out our Big Data Trends in Action infographic below to learn about the five key trends that will reshape the way enterprises operate internally and connect with consumers.

Posted on February 18, 2015 by Sameer Nori

Today, MapR introduced Quick Start Solutions, a powerful package of services, software and training/certification to help you jump-start your deployments of enterprise data hub, security and marketing applications. These solutions address commonly implemented and high-value Hadoop use cases for Data Warehouse Optimization and Analytics, Security Log Analytics and Recommendation Engines.

Posted on February 17, 2015 by Ellen Friedman

One of the best ways to figure out how to succeed with your own large-scale projects is to see what others are doing – what has worked for them and what has not.

Posted on February 13, 2015 by Michele Nemschoff

Dr. Pramod Varma, Chief Architect and Technology Advisor to Unique Identification Authority of India (UIDAI), gave an informative talk titled “Architecting World's Largest Biometric Identity System - Aadhaar Experience”. He began by explaining why the Aadhaar project was created. In India, the inability to prove one’s identity is one of the biggest barriers that prevents the poor from accessing benefits and subsidies. India is a country with 1.2 billion residents in over 640,000 villages. The Indian government spends $50 billion on direct subsidies (food coupons for rice, cooking gas, etc.) every year. Both public and private agencies in India require proof of identity before providing services or benefits to those living in India.

Posted on February 11, 2015 by Sean Kandel

The following is a guest blog post from Sean Kandel, CTO & Co-founder of MapR Partner, Trifacta. It’s no secret that enterprises are increasingly adopting Hadoop for a variety of analytic purposes. The Hadoop software stack introduces entirely new economics for storing and processing data at scale. It also allows organizations unparalleled flexibility in how they’re able to leverage data of all shapes and sizes to uncover insights about their business.

Posted on February 5, 2015 by Steve Wooledge

MapR announced today that our SQL-on-Hadoop solution earned the highest score for Hadoop/data warehouse interoperability. MapR was among six vendors invited to participate in Gigaom Research’s January 2015 report, “Sector Roadmap: Hadoop/Data Warehouse Interoperability.” One of the key factors for our top placement in this competitive evaluation was the integration powers of Apache Drill’s technology included in the MapR Distribution. This report validates Apache Drill as a major advancement in data exploration given its schema flexibility, which makes it possible for you to immediately query complex data in native formats, such as schema-less data, nested data, and data with rapidly-evolving schemas, with minimal IT involvement.

Posted on February 4, 2015 by Steve Wooledge

Did you know that not all Hadoop distributions are the same? As Hadoop deployments grow, the architectural differences between Hadoop distributions begin to show dramatic cost differences. These differences can save you 20-50% in terms of total cost of ownership, as we detailed in a previous post. To make it easier for you to compare distributions and understand the true costs for deploying and running Hadoop, we’ve developed the for Hadoop, a simple self-service tool that uses your own data to show how the Hadoop distributions’ costs stack up.

Posted on January 27, 2015 by Anu Yamunan

Today MapR announced the availability of free Hadoop On-Demand Training for developers, analysts and administrators. Hadoop On-Demand Training offers full-length courses on a range of Hadoop technologies for developers, data analysts and administrators. Designed in a format that meets your convenience, availability and flexibility needs, these courses will lead you on the path to becoming a certified Hadoop professional.

Posted on January 23, 2015 by Igor Izotov

There are resources and there are Resources. The old statement is still valid: Your project's success is entirely dependent on the people you hire. Hadoop initiatives are no exception to this; instead, these initiatives are particularly demanding. Here are a few reasons why: 1) Hadoop is an emerging technology: there is and there will be much hype around it; the temptation to try this technology no matter what is coming from specialists and companies, large and small. Emerging is synonymous with evolving when it comes to Hadoop; funny enough, the book on Hadoop that was just released by O'Reilly and Pentaho (freely available to download here) will become outdated in a few months' time. The team members you require must be a particular, fond-of-learning kind to be able to stay on top of the Hadoop evolution, choose which innovations to adopt and, at the end of the day, deliver well.

Posted on January 22, 2015 by Dan Woods

Information wants to be free. Open source is free. Moore’s law is making computing free. Free, Free, Free. Enough already with the free. In the real world, computing costs money. Making great products costs money. More efficient computing saves money. If you’re running a serious big data infrastructure, you must first focus on getting value, but once that’s done, you must make sure you are not bleeding money in a variety of ways. Is your cluster too big? Are you wasting storage? Do you spend too much time on admin? Are you building up technical debt? How much is downtime wasting? If you aren’t asking these questions, you are surely wasting money.

Posted on November 20, 2014 by Anil Gadre

I recently joined MapR to lead the Product Management group. Since I used to consult with MapR four years ago (before the company was launched), it is natural that everyone wants to know, "What's different four years later?" The big answer is that the promise of Hadoop has turned into a reality for a wide array of situations, and the impact is really meaningful. Here are some first impressions after working at MapR for a few weeks. Note: I have to say up front that this is not meant to be a dispassionate research piece; it is just top-of-mind impressions.

Posted on November 10, 2014 by Kirk Borne

Big data is a universal phenomenon. Every business sector and aspect of society is being touched by the expanding flood of information from sensors, social networks, and streaming data sources. The financial sector is riding this wave as well. We examine here some of the features and benefits of Hadoop (and its family of tools and services) that enable large-scale data processing in finance (and consequently in nearly every other sector).

Posted on November 7, 2014 by Jim Scott

Some people say I am biased toward certain technologies. That is a completely true statement! Granted, it does depend on the specific technology. But just because I may be biased toward certain technologies doesn’t mean I’m not objective or fair. When it comes to Hadoop, Apache Hadoop really is free–as in beer. But, in reality, unless you are a huge company with a massive team of engineers, you very likely are NOT going to be patching and building your own distribution of Hadoop to run internally. No matter what anyone says, if you are paying for support for your distribution of Hadoop in any way then it is not truly free.

Posted on October 15, 2014 by Dale Kim

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Previous posts in this series provided insights about nine good reasons why customers choose MapR: from security to performance and TCO to ease of integration. Here’s reason #1: High availability.

Posted on October 14, 2014 by Dale Kim

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #2: MapR provides world record performance for Hadoop.

Posted on October 13, 2014 by Ulf Andreasson

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #3: MapR provides ease of data integration through industry standard interfaces for data access and movement such as read-write NFS, ODBC, REST APIs and LDAP. Read-write NFS access is one of the key reasons customers choose MapR.

Posted on October 11, 2014 by Bruce Penn

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #5: MapR provides complete data protection and disaster recovery with real snapshots and mirroring.

Posted on October 7, 2014 by Jim Scott

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #9: MapR provides a read-write file system for real-time Hadoop.

Posted on October 6, 2014 by Anoop Dawar

To get real with Hadoop, you need a real enterprise-ready platform. Join us as we begin the countdown of the 10 top reasons our customers choose MapR. Reason #10....Enterprise-grade security. Security has always been important with enterprise customers, but today it’s non-negotiable.

Posted on August 28, 2014 by Michele Nemschoff

Our daily commute may not feel like such a high tech experience, but whether you feel it or not – it is. Big data and Hadoop have revolutionized the transportation industry over the past several years. Whether in a car, a train, a plane or a delivery truck, we all use big data throughout our travels. Let’s go through a few specific use cases to spotlight transportation businesses that are using big data in a big way.

Posted on August 20, 2014 by Michele Nemschoff

Five minutes is easily squandered without much thought; however, with Hadoop, five minutes can make a big impact. John Schroeder, MapR CEO and Founder, recently used a five-minute keynote address to illustrate this point.

Following is an edited transcript of John's message. 

Posted on June 4, 2014 by Karen Whipple

Capping off the first day of 2014 Hadoop Summit, the Open Air, Open Source Celebration was a lively event bringing friends of MapR together to toast the Apache Drill and Apache Spark projects. Rather than words, we will let the photos tell the story.

Conversations and ideas flowed in the fresh air. Tomer Shiran, VP of Product Development, and Ted Dunning, Chief Application Architect, enjoyed talking Hadoop in the relaxed outdoor venue.

Posted on May 7, 2014 by Jon Posnik

SQL-on-Hadoop just got easier this morning.  Working together with the HP Vertica team, we are excited to announce general availability of the HP Vertica Analytics Platform running on the MapR Distribution for Apache Hadoop.

Posted on April 22, 2014 by Kirk Borne

A while back, I presented a Big Data Glossary: A to ZZ. In separate articles, I discussed some of the different entries in the glossary. Here, I focus on H (Hadoop), which is the evolving but increasingly standardized big data computing platform.

Posted on April 10, 2014 by Arsalan Tavakoli-Shiraji

Today, MapR announced that it will distribute and support the Apache Spark platform as part of the MapR Distribution for Hadoop in partnership with Databricks. We’re thrilled to start on this journey with MapR for a multitude of reasons.

Posted on April 3, 2014 by Kirk Borne

The explosive growth in data and in big data technologies (that process and transform the data into knowledge) corresponds to a new industrial revolution.  The raw materials and the machinery are different from past revolutions, but the fundamental features are not so different – new markets, new opportunities, new tools, and new wealth are being created at a remarkable pace.

Posted on March 7, 2014 by Kirk Borne

I am not sure about other data scientists’ experiences in trying to explain Big Data and Data Science to family members, but I find that there is a lot of interest coupled with a lot of confusion. That’s not too surprising since there are many experts in the field who are also confounded by the mixed messages, the hype, the uncertain meanings of terminology, and where all of this is going.

Posted on March 6, 2014 by Michele Nemschoff

MapR announced at the end of last week that we were among the select companies that Forrester Research Inc. invited to participate in its report entitled “The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014.”  In this evaluation, MapR was cited as a Leader and achieved the highest score for Current Offering among all reviewed vendors.

Posted on March 5, 2014 by Steve Wooledge

Forrester Research principal analyst Mike Gualtieri, along with MapR CMO Jack Norris, joined us for a webinar titled “3 Things You Didn’t Know You Could Do With Hadoop.”

Posted on February 10, 2014 by Ali Hussain

We, at Flux7 Labs, a solutions company, help customers maximize performance/$. To help our customers make the right decisions we constantly research and evaluate the solutions available to customers and thereby build and strengthen our internal knowledge. As part of this research process, we evaluated the most common Hadoop distributions on various metrics. The distributions we tested were from Intel, Cloudera, Hortonworks, and MapR. This testing was done independently on all the distributions.

Posted on December 18, 2013 by Steve Wooledge

While you race around checking off items from your holiday lists, banks are just as busy with their fraud prevention efforts. According to a report by the Association of Certified Fraud Examiners, the typical organization loses 5% of its revenues to fraud each year, which translates to a projected annual fraud loss of over $3.5 trillion. Banks and other financial services companies are particularly vulnerable, due to the massive amount of financial data generated every day.

Posted on December 10, 2013 by Jason Altekruse

This past summer I had an amazing opportunity to work on the up and coming open source project Apache Drill. Working at MapR Technologies alongside a great group of engineers and with an expanding open source community has been an invaluable learning experience. SQL in Hadoop has been a very hot topic over the past year, and while there are a lot of different approaches being pursued, I believe Drill is making one of the strongest cases for providing speed, extensibility and longevity.

Posted on November 15, 2013 by Jack Norris
As I discussed in my keynote presentation at Strata + Hadoop World in New York, there are a lot of myths and misconceptions when it comes to Hadoop. Let’s take a closer look at the architecture and customer use cases that highlight the power of Hadoop and separate the myths from reality.

The Hadoop Space is Incredibly Competitive—a Knock-Down, Drag-Out Fight across the Major Distributions.

Myth or Reality?

Posted on November 14, 2013 by Karen Whipple

Big Data is fueling responsible global citizenship.

Our friends at The Climate Corporation and their approach to managing the economic impact of extreme weather are featured in The New Yorker article “Climate By Numbers”.

Posted on October 1, 2013 by Karen Whipple
MapR will be at a wide range of events this month where we will participate on panels, deliver keynotes and present tutorials.

M.C. Srivas, CTO will participate on the Big Data Ecosystem partner panel at Splunk .conf2013.

Posted on September 13, 2013 by Tomer Shiran

Posted on August 27, 2013 by Karen Whipple
It's always interesting to compare products side by side, and Gartner Research Director Svetlana Sicular has done just that in her blog series, The Charging Elephant. Sicular posed the same set of questions to the participants of the recent Hadoop panel at the Gartner Catalyst conference. Jack Norris, MapR CMO, participated; here's his take on the first question.

Q: How specifically are you addressing variety, not merely volume and velocity?

Posted on August 2, 2013 by Jack Norris
I often get asked, “Who in an organization buys Hadoop?” While there are buyers with many different titles and functions that spearhead the adoption of Hadoop, I’d like to single out one fast growing buyer: the Chief Risk Officer.

Posted on July 24, 2013 by Karen Whipple
The researchers at McKinsey & Company identified five catalysts that could deliver a substantial boost to GDP by 2020. Not surprisingly, “big data analytics as a productivity tool” made the list.

Posted on July 23, 2013 by Karen Whipple
The MapR partner ecosystem continues to expand. Univa, the Data Center Automation company, today announced their partnership with MapR to integrate the Univa Grid Engine with the MapR Enterprise Big Data platform. This partnership allows customers to create a shared infrastructure for Hadoop that fosters rapid enterprise-wide deployment and ensures a big return on your investment.

Posted on July 19, 2013 by Rob Rosen

Back in 2011, Paul Yang from Facebook’s engineering team published a fascinating blog post that detailed how they migrated a 30TB Hadoop cluster from one server base to another.   As one of the world’s largest Hadoop deployments, Facebook has deployed several Hadoop clusters collectively numbering over 5,000 nodes, and has amassed an impressive roster of hundreds of H

Posted on July 8, 2013 by Karen Whipple

Mike Gualtieri, Principal Analyst with Forrester Research joined us for a webinar titled Productionizing Hadoop: Seven Architectural Best Practices. Following the webinar, Mike answered a number of questions from participants including one about how to get started with Big Data.

Posted on June 26, 2013 by Jack Norris
There has been a lot of news on the Hadoop front lately, and MapR took it to a higher level today by letting Hadoop industry leaders do the talking. My favorite quote is “Scaling with MapR is now a simple Wash. Rinse. Repeat,” from Greg Stam, Senior VP of Global Software Engineering, Cision. Here is the full quote:

Posted on June 14, 2013 by Jack Norris
There’s been a lot in the news lately about the NSA and Verizon call detail records… how much data are they talking about?

Posted on May 23, 2013 by Jabari Norton

MapR was proud to participate in the Developer Sandbox at Google I/O, Google’s exclusive annual developer conference held in San Francisco last week. The conference featured speakers from various industries as well as code labs and developer demos.

Posted on May 8, 2013 by Jack Norris
An article I read on ReadWrite Enterprise, “One Hadoop to Rule Them All,” offers a take on the exploding Hadoop market. It mentions the entrance of relative newcomers such as EMC, IBM, and Intel, which reinforces the burgeoning enterprise demand for Hadoop and Big Data analytics. These established enterprise vendors know that Hadoop is on everyone’s IT radar.

Posted on May 1, 2013 by Jack Norris

Expanding Big Data solutions to yet another level, MapR made two announcements today: MapR M7, which provides an enterprise-grade NoSQL and Hadoop solution to our customers, is now available; and LucidWorks Search will be distributed with the MapR Big Data Platform for Apache Hadoop, including with the new MapR M7 Edition.

M7 Now Available

Posted on April 26, 2013 by Jack Norris
Demand for both Apache Hadoop and NoSQL is extremely strong and this can mean only one thing: companies are hiring, and there is stiff competition for employees with experience in these technologies. This was brought home last month in an article in Data Informed and the numbers have only gone up since that time. Indeed.com shows growth of 150,000 percent in Hadoop job postings:

Posted on March 29, 2013 by Michele Nemschoff
MapR Technologies has released its Big Data and Hadoop conference lineup for the month of April.  MapR will present sessions about Big Data and Hadoop at leading industry conferences in April, including Cloud Connect Conference, Big Data TechCon, Big Data Innovation Summit & Hadoop Innovation Summit and the Open Source Business Conference (OSBC).

Look for MapR at these events:

Posted on March 28, 2013 by Jack Norris
MapR made two significant announcements today regarding our efforts to support the Hadoop ecosystem and provide an open enterprise-grade platform to Big Data users.

Posted on March 6, 2013 by Brad Anderson

Hadoop users were excited to see the real-time Hadoop analytics demonstration at the Strata Conference in Santa Clara.  By streaming the #strataconf twitter hashtag directly into a cluster during the conference, MapR displayed two real-time tag clouds showing a word bubble with the most frequently used words in conference tweets and a user name cloud of top tweeters.  Watching the information change proved mesmerizing for some.

How did we do this?   By bringing MapR and Storm together to capitalize on their strengths.

Posted on February 26, 2013 by M.C. Srivas

Gone in 60 seconds! Breaking the MinuteSort Record

Yuliya Feldman, Amit Hadke, Gera Shegalov, M. C. Srivas

Introduction

The MinuteSort test measures how much data a system can sort in 1 minute. The test requires that a random sequence of 100-byte records, each consisting of a 10-byte key and 90 bytes of payload, be arranged in either ascending or descending order. The earlier record had sorted 14 billion records totaling 1400 gigabytes in 60 seconds. The web site sortbenchmark.org keeps track of all such records.
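As a rough illustration of the record layout and the data-volume arithmetic described above, here is a minimal Python sketch. It is not the benchmark's official tooling or MapR's implementation: the function names (`make_records`, `sort_records`, `total_gigabytes`) are hypothetical, random bytes stand in for real benchmark input, and a gigabyte is assumed to mean 10^9 bytes.

```python
import os

RECORD_SIZE = 100  # bytes per record, per the benchmark description
KEY_SIZE = 10      # the first 10 bytes of each record form the sort key


def make_records(n):
    """Generate n random 100-byte records (illustrative stand-in for benchmark input)."""
    return [os.urandom(RECORD_SIZE) for _ in range(n)]


def sort_records(records, descending=False):
    """Order records by their 10-byte key; the 90-byte payload travels with the key."""
    return sorted(records, key=lambda rec: rec[:KEY_SIZE], reverse=descending)


def total_gigabytes(num_records):
    """Total data volume in decimal gigabytes (assumes 1 GB = 10**9 bytes)."""
    return num_records * RECORD_SIZE / 10**9


if __name__ == "__main__":
    ordered = sort_records(make_records(1000))
    # Every key must be <= the key that follows it in ascending order.
    assert all(a[:KEY_SIZE] <= b[:KEY_SIZE] for a, b in zip(ordered, ordered[1:]))
    # 14 billion records x 100 bytes each = 1400 GB, matching the earlier record.
    print(total_gigabytes(14_000_000_000))
```

Under these assumptions, 14 billion records at 100 bytes each works out to 1400 gigabytes, consistent with the figure quoted for the earlier record. A single-machine `sorted` call only shows what "sorted" means here; the benchmark itself is about doing this across a cluster within 60 seconds.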

Posted on January 23, 2013 by Ellen Friedman
This week in Tokyo at the Hadoop Conference Japan 2013 Winter, Ted Dunning presented “The Power of Hadoop to Transform Business.” He talked about what he called “the new future of Hadoop” and how the future that is emerging isn't the future we thought – it’s better.

Posted on November 23, 2012 by Jack Norris
In case you missed it, Forbes recently ran an interesting piece about Hadoop titled Can Hadoop Survive its Weird Beginning? It is worth a read for anyone in the Big Data/Hadoop space.

In it, author Dan Woods makes a couple of interesting observations:

Posted on October 24, 2012 by M.C. Srivas
When MapR was given access to Google Compute Engine's limited preview, we ran a lot of mini tests on the virtualized hardware to figure out how to tune our software. The results of those tests were surprisingly good. The speed and consistent performance provided by Google Compute Engine even at the 99th percentile was very impressive. So we decided to run a TeraSort benchmark just to see how well we could do in a virtualized, shared environment where we didn't control the hardware, nor control the other non-MapR tenants.

An Earlier Result

Posted on August 13, 2012 by Jack Norris

In a new briefing, the 451 Group details MapR’s exciting past few months. MapR has continued to expand our product to meet our customers’ needs in whatever computing environment they may have – in the cloud, on-premise, or both. The Amazon and Google announcements serve to establish MapR as the emerging de facto standard for Hadoop.

Posted on August 7, 2012 by Rob Rosen

Recently, Ryan Rawson over at Drawn to Scale wrote on their decision to base their innovative product Spire on MapR’s distribution for Hadoop. I was excited to read Ryan’s posting for a couple of reasons:

Posted on April 23, 2012 by Peter Conrad
I thought for sure we'd have flying cars and jet packs by the 21st Century. It turns out that the new tools for data analysis are far more important.

Posted on April 11, 2012 by Jack Norris
What does it mean to be “Lights Out Data Center Ready”? It means that any failures, whether hardware, software, or user errors, do not require immediate administrator action. On a scheduled basis, administrators can visit the data center and perform maintenance that is now routine, not an emergency. Picture an administrator with a shopping cart full of disk drives casually moving through the aisle.

Posted on February 22, 2012 by Peter Conrad
In the fall of last year, we launched the MapR Academy, the first free online training resource for Hadoop and MapReduce. Hadoop itself is so new that there is still a huge knowledge gap. Sure, there are a few experts out there, but most people still don't know what Hadoop is, or why they should be interested in the first place.

Posted on February 7, 2012 by Jack Norris
The Hadoop market is a fast growing, expanding and exciting ecosystem, but this can also be accompanied by confusion. I thought I’d take a stab at addressing some of the Big Misconceptions about Big Data. I. First of all, the term Big Data is approaching Cloud in its utter lack of descriptiveness. That said, Big Data is not simply about massive amounts of data – petabytes and beyond. Big Data represents a paradigm shift. It’s about new, unstructured, data sources. It’s about avoiding schema definitions and transformations. There’s no need to structure data before you can derive benefits.

Posted on January 17, 2012 by Jack Norris
Our CEO, John Schroeder was recently interviewed in the press and asked about his predictions for Hadoop in 2012. Simply put, he sees a Big year for Big Data. It’s not just the scale of data growth. John shared his view that the ability to process and analyze Big Data is changing the game for companies and it’s changing the game in every aspect of their business.

Posted on December 6, 2011 by Tomer Shiran
Today we announced version 1.2 of the MapR Distribution for Apache Hadoop. With this release, MapR continues to push the envelope by making Hadoop more accessible to more users, more languages, and more platforms. This release includes numerous features and capabilities including:

Posted on November 19, 2011 by Jack Norris
At Hadoop World last week we announced the MapR Academy. This is our free training resource with videos and documents to help administrators, developers and business users get the information they need to be effective and get the most out of their Big Data. We had several MapR Virtual Trainers attend the conference with iPads of our training videos and access to the website to show the full complement of our training resources. The response was fantastic. Our Virtual Trainers had a great time interacting with other attendees.

Posted on November 6, 2011 by M.C. Srivas
Recently a world record was claimed for a Hadoop benchmark. MapR has run numerous benchmarks where MapR performs 2 to 5 times faster than other distributions and has published these results. So a world record was quite a claim. We were surprised to see that this world record was for a TeraSort benchmark on 100GB of data.

Posted on October 5, 2011 by Jack Norris
Oracle announced an Oracle Big Data Appliance (BDA) including Hadoop at Oracle OpenWorld this week. Oracle is packaging the Apache code on the BDA appliance. The announcement didn’t include any important 3rd party partnerships or any important innovations for Hadoop.

Posted on July 18, 2011 by Jack Norris
A lot of solutions talk about how they handle Big Data, but Hadoop is unique. Hadoop does things that aren’t possible with other solutions -- or if they are possible, they are prohibitively expensive or complex. Hadoop is not an open source alternative to something that’s been on the market for years. That’s why there’s such a premium on key infrastructure innovations like those from MapR.

Posted on June 1, 2011 by Ted Dunning
We just announced our signing of the corporate contributor license agreement. This move is part of our exiting from stealth status and means that we will be able to contribute more openly to Apache, other open source communities, and the general ecosystem that is forming around Apache Hadoop-related products.

Posted on May 25, 2011 by John Schroeder
Today is a big day for MapR Technologies. We are coming out of stealth mode to talk about the innovations that we’ve developed for Apache Hadoop. These innovations build upon the great work already completed by the Apache Hadoop community. The community is also being enriched by a growing number of companies that are building out the complete technology stack and services to further Hadoop market adoption.
