Open Source Software Blog Posts

Posted on February 10, 2017 by Ronald van Loon

Businesses today need to do more than merely acknowledge big data. They need to embrace data and analytics and make them an integral part of their company. Of course, this will require building a quality team of data scientists to handle the data and analytics for the company.

Posted on December 6, 2016 by George Demarest

This blog post is the second in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer (Read Part 1). In his book, Davi outlines the rationale and key challenges of securing big data systems and applications, and he’s included some terrific anecdotes to make the entire book a quick and insightful read.

Posted on December 5, 2016 by Ronald van Loon

Business owners and executives today know the power of social media, mobile technology, cloud computing, and analytics. If you pay attention, however, you will notice that truly mature and successful digital businesses do not jump at every new technological tool or platform.

Posted on November 21, 2016 by Sean O’Dowd

The last decade has ushered in a perfect storm of disruption for the financial services sector – arguably the most data-intensive sector of the global economy. As a result, companies in this sector are caught in a vice.

Posted on November 18, 2016 by Ronald van Loon

There is no denying it – we live in The Age of the Customer. Consumers all over the world are now digitally empowered, and they have the means to decide which businesses will succeed and grow, and which ones will fail. As a result, most savvy businesses now understand that they must be customer-obsessed to succeed.

Posted on November 15, 2016 by Ronald van Loon

The field of data science is one of the youngest and most exciting fields in the technology sector. In no other industry or field can you combine statistics, data analysis, research, and marketing to do jobs that help businesses make the digital transformation and come to full digital maturity.

Posted on November 9, 2016 by Jack Norris

As I discussed in my presentation at the Gartner Symposium/ITxpo in Florida, digital transformation is a key topic for business leaders today. While the impact of digital transformation is easily understood, what is less clear are the steps to effectively pursue a digital transformation -- and the three keys to ensuring a successful one.

Posted on October 26, 2016 by Sameer Nori

Siri, Alexa, Cortana and Google Now are just the beginning. When it comes to getting things done, machines are increasingly edging humans out of the equation. Big data and analytics are at the core of what some people are calling the “bot revolution.”

Posted on October 25, 2016 by Sameer Nori

Business intelligence (BI), which is one of the oldest concepts in data processing, is undergoing a radical reinvention. The concept has already evolved considerably since it first gained popularity in the early 1990s (and particularly since its first mention in the Cyclopædia of Commercial and Business Anecdotes in 1865!).

Posted on September 26, 2016 by Tugdual Grall

Get an introduction to streaming analytics, which gives you real-time insight from captured events and big data. There are applications across industries, from finance to wine making, though there are two primary challenges to be addressed.

Posted on September 15, 2016 by George Demarest

This blog post is the first in a series based on the ebook The Six Elements of Securing Big Data by security expert and thought leader Davi Ottenheimer. In his book, Davi outlines the rationale and key challenges of securing big data systems and applications. He does so using some great anecdotes and with good humor, making the book a good read whether you’re a white/grey/black hat, cyber superhero, or even if you’re not a security expert at all.

Posted on May 23, 2016 by Charu Madan

In today’s world of immense competition and customer churn, Telecom Providers are reinventing and transforming to be able to provide their customers with the best possible customer care and satisfaction.

Posted on May 2, 2016 by Jim Scott

In some circles today there is a sort of ‘Hadoop vs. RDBMS’ debate ongoing. Often the discussion casts Hadoop as the obvious heir apparent in the data processing world, with RDBMS cast as your father’s Oldsmobile.

Posted on March 30, 2016 by David Cross

For almost seven years, MapR has been committed to advancing the understanding and application of open-source technology to solve big data challenges. Last year we delivered on the promise of Hadoop with the industry's only enterprise-grade Converged Data Platform that supports a broad set of mission-critical and real-time production uses.

Posted on February 17, 2016 by Neeraja Rentachintala

Today we at MapR would like to congratulate Apache Arrow, a cross-system data layer to speed up big data analytics and a brand-new addition to the Apache Open Source Software community, on its announcement as a Top-Level Project.

Posted on February 2, 2016 by Will Ochandarena

Two blogs came out recently that share some very interesting perspectives on the blurring lines between architectures and implementation of different data services, ranging from file systems to databases to publish/subscribe streaming services.

Posted on January 14, 2016 by Michele Nemschoff

As we look back at 2015, the most popular blogs on our site are a good reflection of the 2015 trends and developments in the big data space.

Posted on January 13, 2016 by Balaji Mohanam

In this week's Whiteboard Walkthrough, Balaji Mohanam, Product Manager at MapR, explains the difference between Apache Spark and Apache Flink and how to decide which one to use.

Posted on January 5, 2016 by Ellen Friedman

It’s the start of a new year -- we’re on the threshold of something new -- so let’s look forward to what you’re likely to be doing in 2016.

Posted on December 15, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, discusses a business use case that leverages the power of MapR Streams.

Posted on December 11, 2015 by Will Ochandarena

In this blog post, I’ll share how we see Myriad delivering value to customers, and how it fits in with the MapR platform.

Posted on December 7, 2015 by Jim Scott

If you’re thinking about working with big data, you might be wondering which tools you should use. If you are trying to enable SQL-on-Hadoop, you might be considering Apache Spark or Apache Drill.

Posted on December 4, 2015 by Jim Scott

Google has set the standard for most of the world when it comes to running systems at scale. It has created a number of different technologies to benefit its business.

Posted on November 13, 2015 by Amol Kekre

Apache Apex is the industry’s first-ever YARN-native engine that fulfills the disruptive promise of big data. In this post we go into more detail about what Apex is and why it matters.

Posted on June 22, 2015 by Michele Nemschoff

At the Strata + Hadoop World 2015 conference held in San Jose, Ted Dunning, Chief Application Architect for MapR, gave an exciting talk titled “YARN vs. MESOS: Can’t We All Just Get Along?” where he showcased how YARN and MESOS can work together to seamlessly share datacenter resources.

Posted on June 2, 2015 by Dale Kim

As you probably know, Apache Hadoop was inspired by Google’s MapReduce and Google File System papers and cultivated at Yahoo! It started as a large-scale distributed batch processing infrastructure, and was designed to meet the need for an affordable, scalable and flexible data structure that could be used for working with very large data sets.

Posted on April 2, 2015 by Kim Whitehall

The Global Data Competition 2015: Collaborate to Change Climate Change is an initiative that appeals to all walks of life through its “swarm offensive” approach to the global challenge of climate change. The “swarm offensive” approach, coined by the filmmakers of “The Coalition of The Willing” (released in 2010), refers to harnessing technologies, innovations and adaptation strategies from the collective genius of the world through open source infrastructures, thus promoting bottom-up “grassroots” efforts to tackle the climate change challenge as opposed to top-down “establishment” conventional approaches.

Posted on March 9, 2015 by Anu Yamunan

Today, we are announcing the availability of a course on HBase, the in-Hadoop NoSQL database. The course is titled “HBase Data Model and Architecture” and is aimed at data analysts, data architects and application developers.

Posted on December 31, 2014 by Karen Whipple

As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.

Posted on December 12, 2014 by Nitin Bandugula

So, we did it again! Another rapidly growing open source project is now formally supported and packaged in the MapR Distribution including Apache Hadoop. This time the project is Apache Storm. I must say, the Storm project is special, given that we were the first ones to champion this project two years ago. Our own Ted Dunning mentored the Storm community on its way to its recent Apache Top Level Project status. Furthermore, Storm is associated with real-time processing (one of the core strengths of the MapR platform), with features such as a random read-write file system and the option to use an NFS-based spout. Not surprisingly, we already have customers using Storm on MapR in production.

Posted on October 12, 2014 by Anoop Dawar

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #4: MapR provides true multi-tenancy with job isolation, volumes, quotas, data and job placement control, including for YARN.

Posted on October 9, 2014 by Dale Kim

In this blog series, we’re showcasing the top 10 reasons customers are turning to MapR in order to create new insights and optimize their data-driven strategies. Here’s reason #7: MapR provides the top-ranked NoSQL key-value database for current offering.

Posted on October 8, 2014 by Nitin Bandugula

In this blog series, we’re showcasing the top 10 reasons customers choose the MapR Distribution for Hadoop to optimize their data-driven strategies. Reason #8: MapR provides Unbiased Open Source supporting 20+ OSS projects.

Posted on September 2, 2014 by Jim Scott

“Google is living a few years in the future and sends the rest of us messages,” Doug Cutting, Hadoop founder

Posted on August 12, 2014 by Pat Farrel

There are big changes happening in Apache Mahout. For years it’s been the go-to machine learning library for Hadoop. It contained most of the best-in-class algorithms for scalable machine learning, which means clustering, classification, and recommendation. But it was written for Hadoop and MapReduce. Today a number of new parallel execution engines (Spark, H2O, Flink) show great promise in speeding calculations by as much as 10-100x. That means instead of buying 10 computers for a cluster, a single one may do. That should get your manager’s attention.

Posted on June 4, 2014 by Michele Nemschoff

The first day of the 2014 Hadoop Summit was filled with announcements and interviews. MapR announced our first Apache Hadoop App Gallery, as well as our exciting partnership with Syncsort. Jack Norris, MapR CMO, had a chance to talk about this news on theCUBE with Wikibon’s Jeffrey Kelly and SiliconANGLE’s John Furrier.

Posted on April 22, 2014 by Kirk Borne

A while back, I presented a Big Data Glossary: A to ZZ. In separate articles, I discussed some of the different entries in the glossary. Here, I focus on H (Hadoop), which is the evolving but increasingly standardized big data computing platform.

Posted on January 23, 2014 by Anoop Dawar

As part of our ongoing certification process, MapR has updated the following ecosystem projects: HBase, Oozie, HTTPFS, Flume, Hive and Sqoop. These ecosystem projects are certified with Release 3.1.0. They are available on:

For details on these updates, please refer to the related release notes:

Posted on September 24, 2013 by Sridhar Reddy

MapR M7 provides ease of use, dependability and performance advantages for NoSQL and Apache Hadoop™ applications. Apache HBase is a key-value based NoSQL database solution that is built on top of Hadoop. MapR M7’s architecture is specifically designed to optimize the storage and processing of files as well as tables within a unified platform.
