NoSQL Blog Posts

Posted on October 31, 2016 by Ian Downard

One of the challenges when working with streams is the transitory nature of their data. Many applications require data to be persisted far beyond the point at which said data has any practical value to streaming analytics.

Posted on May 27, 2016 by Jimit Shah

The ability to store and retrieve JSON documents using the OJAI standard has introduced a very powerful way to work with data in your MapR cluster.

Posted on April 22, 2016 by Carol McDonald

This post will show how to integrate Apache Spark Streaming, MapR-DB, and MapR Streams for fast, event-driven applications.

Posted on March 8, 2016 by Anoop Dawar

In 2015, MapR shipped three significant core releases: 4.0.2 in January, 4.1 in April, and 5.0 plus the GA version of Apache Drill in July. While all this was happening, many of my colleagues in engineering (who’ve demonstrated a whole new level of ingenuity and multitasking) were also working on one of the biggest releases in the history of MapR: the converged data platform release (AKA MapR 5.1).

Posted on January 25, 2016 by Ranjit Lingaiah

In the wide column data model of MapR-DB, all rows are stored by a row key, column family, column qualifier, value, and timestamp. In the current version, the row key is the only field that is indexed, which fits the common pattern of queries based on the row key.

Posted on November 10, 2015 by Nick Amato

This blog describes how to get an instance of the MapR-DB Document Database Developer Preview image running on AWS using one of the preconfigured AMIs supplied by MapR. With this AMI, you can start writing JSON-based applications on MapR-DB using the open source Open JSON Application Interface, or OJAI.

Posted on October 23, 2015 by Aditya Kishore

In this week's Whiteboard Walkthrough, Aditya Kishore, engineer on the MapR-DB team, explains how to use the OJAI API to insert, search, and update the document database.

Posted on September 29, 2015 by Bharat Baddepudi

MapR developed OJAI (the Open JSON Application Interface), which provides native integration of JSON-like document processing in Hadoop-style scale-out clusters.

Posted on September 29, 2015 by M.C. Srivas

In this week's Whiteboard Walkthrough, MC Srivas, MapR CTO and Co-Founder, explains the innovation and vision behind MapR-DB and how project Kudu stacks up to the MapR Data Platform.

Posted on September 25, 2015 by Anurag Choudhary

In this week's Whiteboard Walkthrough, Anurag Choudhary, Engineer on the MapR-DB team, explains how horizontal scaling in MapR-DB works and how hot spotting is automatically avoided.

Posted on September 4, 2015 by Carol McDonald

This post will help you get started using Apache Spark Streaming with HBase on the MapR Sandbox. Spark Streaming is an extension of the core Spark API that enables continuous data stream processing.

Posted on August 7, 2015 by Carol McDonald

In this blog post, I’ll give you an in-depth look at the HBase architecture and its main benefits over other NoSQL data store solutions. Be sure to read the first blog post in this series, titled “HBase and MapR-DB: Designed for Distribution, Scale, and Speed.”

Posted on August 6, 2015 by Carol McDonald

In this blog post, I’ll discuss how HBase schema is different from traditional relational schema modeling, and I’ll also provide you with some guidelines for proper HBase schema design.

Posted on July 6, 2015 by Jim Scott

In part one of this series, “Drilling into Healthy Choices,” we explored using Drill to create Parquet tables as well as configuring Drill to read nonstandard data formats. In part two, we are going to use this same database to think beyond traditional database design.

Posted on June 26, 2015 by Carol McDonald

Apache HBase is a database that runs on a Hadoop cluster. HBase is not a traditional RDBMS, as it relaxes the ACID (Atomicity, Consistency, Isolation, and Durability) properties of traditional RDBMS systems in order to achieve much greater scalability. Data stored in HBase also does not need to fit into a rigid schema as it does with an RDBMS, making it ideal for storing unstructured or semi-structured data.

Posted on May 11, 2015 by Nick Amato

In this post, I’ll show you how to build a simple real-time dashboard using Spark on MapR.

Posted on April 22, 2015 by Terry He

The following program illustrates a table load tool, a utility program that can be used for batching puts into an HBase/MapR-DB table. The program creates a simple HBase table with a single column within a column family, and inserts 100,000 rows in batches.
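The batching idea can be sketched without a cluster. The snippet below is a minimal illustration in Python, not the program from the post: `InMemoryTable`, `put_batch`, the `cf:col` column name, and the batch size of 1,000 are all hypothetical stand-ins chosen for the example, and a real loader would call the HBase or MapR-DB client instead.

```python
from itertools import islice


class InMemoryTable:
    """Hypothetical stand-in for an HBase/MapR-DB table (not a real client API)."""

    def __init__(self):
        self.rows = {}
        self.batch_calls = 0

    def put_batch(self, puts):
        # One call stands in for one round trip that carries many puts.
        self.batch_calls += 1
        for row_key, column, value in puts:
            self.rows.setdefault(row_key, {})[column] = value


def batched(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_table(table, total_rows=100_000, batch_size=1_000):
    # A single column ("col") inside a single column family ("cf"); names are assumed.
    puts = ((f"row-{i:06d}", "cf:col", f"value-{i}") for i in range(total_rows))
    for chunk in batched(puts, batch_size):
        table.put_batch(chunk)
    return table


table = load_table(InMemoryTable())
print(len(table.rows), table.batch_calls)
```

With these assumed sizes, 100,000 rows go out in 100 batch calls instead of 100,000 individual puts, which is the point of loading in batches.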

Posted on April 14, 2015 by Kirk Borne

Data across the enterprise are typically stored in silos belonging to different business divisions and even to different projects within the same division. These silos may be further segmented by services/products and functions. Silos (which stifle data-sharing and innovation) are often identified as a primary impediment (both practically and culturally) to business progress and thus they may be the cause of numerous difficulties.

Posted on January 21, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, walks you through HBase key design with OpenTSDB. 

One of the important things to keep in mind with HBase is that it is a linearly scaling, column-oriented key-value store. Now, in order to get linearly scalable functionality out of HBase, you have to be very cognizant of the key design. This means you don't want to create what are called hot spots, and you want to prevent things like sequential writes from occurring. So I've pre-drawn this diagram to show you that if you were to write the keys sequentially, what happens in HBase is that when you're writing keys one through five, they're all going to land on the first server.
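A small simulation makes the hot-spot effect concrete. This is a sketch under stated assumptions, not the design from the walkthrough: the split points, the four-region layout, and the `salted` helper are hypothetical, and hash-prefix salting is shown only as one commonly used mitigation.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

# Hypothetical split points: each region owns a contiguous range of sorted keys.
# Four regions: [.."4"), ["4".."8"), ["8".."c"), ["c"..)
SPLIT_POINTS = ["4", "8", "c"]


def region_for(key):
    """Return the index of the region whose key range contains `key`."""
    return bisect_right(SPLIT_POINTS, key)


def salted(key):
    """Prefix the key with one hex digit of its hash (an assumed salting scheme)."""
    prefix = hashlib.md5(key.encode()).hexdigest()[0]
    return f"{prefix}-{key}"


# Keys one through five, written sequentially: all land on the first region.
sequential = [f"{i:04d}" for i in range(1, 6)]
print([region_for(k) for k in sequential])

# The same style of keys with a hash prefix: writes spread across the regions.
many = [f"{i:04d}" for i in range(1, 1001)]
print(Counter(region_for(salted(k)) for k in many))
```

The trade-off is that salted keys no longer sort in their original order, so a range scan over the original keys has to fan out across every prefix.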

Posted on January 6, 2015 by Carol McDonald

SQL will become one of the most prolific use cases in the Hadoop ecosystem, according to Forrester Research. Apache Drill is an open source SQL query engine for big data exploration. REST services and clients have emerged as popular technologies on the Internet. Apache HBase is a hugely popular Hadoop NoSQL database. In this blog post, I will discuss combining all of these technologies: SQL, Hadoop, Drill, REST with JSON, NoSQL, and HBase, by showing how to use the Drill REST API to query HBase and Hive. I will also share a simple jQuery client that uses the Drill REST API, with JSON as the data exchange, to provide a basic user interface.

Posted on September 15, 2014 by Dale Kim

At the Big Data Everywhere conference held in Israel, Atzmon Hen-Tov, Vice President of R&D of Pontis, and Lior Schachter, Director of Cloud Technology and Platform Group Manager of Pontis, gave an informative talk titled “Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform.” The five-phase, two-year migration of their operational and analytical functions to MapR resulted in a true, real-time operational analytics environment on Hadoop.

Posted on August 12, 2014 by Neeraja Rentachintala

Congratulations to the Apache Drill community on reaching a big milestone. Apache Drill 0.4.0—a developer preview—has just been released. This is the first in a series of monthly builds the project team will deliver as it drives towards Beta and GA milestones.

Let’s take a brief look at why Apache Drill matters and its key features.

Posted on May 12, 2014 by Neeraja Rentachintala

This was originally posted on The HIVE on May 12, 2014.

Recently I happened to observe martial arts agility training at my son’s Taekwondo school. The ability to move quickly, change direction and still be coordinated enough to throw an effective strike or kick is the key to many martial arts, including Taekwondo.

Posted on February 11, 2014 by Anoop Dawar

It gives me immense pleasure to write this blog on behalf of all of us here at MapR to announce the release of Hadoop 2.x, including YARN, on MapR. Much has been written about Hadoop 2.x and YARN and how it promises to expand Hadoop beyond MapReduce. I will give a quick summary before highlighting some of the unique benefits of Hadoop 2.x and YARN in the MapR Distribution for Hadoop.

Posted on February 11, 2014 by Neeraja Rentachintala

Today we are very excited to announce early access of the new HP Vertica Analytics Platform on MapR at the O’Reilly Strata Conference: Making Data Work. This solution tightly integrates HP Vertica’s high-performance analytic platform directly on the MapR Enterprise-Grade Distribution for Hadoop with no “connectors” required. We wanted to provide some additional details on this integration and why this is important for customers.

Posted on May 9, 2013 by Nitin Bandugula

NoSQL databases are becoming increasingly popular for analyzing big data. There are very few NoSQL solutions, however, that provide the combination of scalability, reliability and data consistency required in a mission-critical application.

Posted on November 13, 2012 by Nitin Bandugula

Apache HBase is a NoSQL database solution for large key-value based data sets that provides scale and strong consistency, combined with MapReduce functionality over Hadoop. About half of Hadoop users today deploy Apache HBase for their NoSQL operations.

Posted on October 18, 2012 by Aditya Kishore

Running a large HBase™ cluster smoothly with minimum downtime is a skill that requires a deep understanding of how HBase™ works. When a disaster strikes, you find yourself digging into HBase™ code and/or mailing lists to understand what went wrong, determine how to recover from the current mess and, most importantly, figure out what can be done to prevent the same thing from happening again. Apart from the inconvenience of downtime, a service crash can also lead to inconsistencies in HBase™ meta tables.
