One of the challenges when working with streams is the transitory nature of their data. Many applications require data to be persisted far beyond the point at which said data has any practical value to streaming analytics.
NoSQL Blog Posts
This post will show how to integrate Apache Spark Streaming, MapR-DB, and MapR Streams for fast, event-driven applications.
In 2015, MapR shipped three significant core releases : 4.0.2 in January, 4.1 in April, 5.0 and the GA version of Apache Drill in July. While all this was happening, many of my colleagues in engineering (who’ve demonstrated a whole new level of ingenuity and multitasking) were also working on one of the biggest releases in the history of MapR—the converged data platform release (AKA, MapR 5.1).
In the wide column data model of MapR-DB, all rows are stored by a row key, column family, column qualifier, value, and timestamps. In the current version, the row key is the only field that is indexed, which fits the common pattern of queries based on the row key.
This blog describes how to get an instance of the MapR-DB Document Database Developer Preview image running on Amazon AWS using one of the pre-configured AMI images supplied by MapR. With this AMI, you can start writing JSON-based applications on MapR-DB using the open source Open JSON Application Interface, or OJAI.
In this week's Whiteboard Walkthrough, Aditya Kishore, engineer on the MapR-DB team, explains how to use the OJAI API to insert, search, and update the document database.
MapR developed OJAI (the Open JSON Application Interface) which provides native integration of JSON-like document processing in Hadoop-style scale-out clusters.
In this week's Whiteboard Walkthrough, MC Srivas, MapR CTO and Co-Founder, explains the innovation and vision behind MapR-DB and how project Kudu stacks up to the MapR Data Platform.
In this week's Whiteboard Walkthrough, Anurag Choudhary, Engineer on the MapR-DB team, explains how horizontal scaling in MapR-DB works and how hot spotting is automatically avoided.
In this blog post, I’ll give you an in-depth look at the HBase architecture and its main benefits over NoSQL data store solutions. Be sure and read the first blog post in this series, titled “HBase and MapR-DB: Designed for Distribution, Scale, and Speed.”
In part one of this series, Drilling into Healthy Choices we explored using Drill to create Parquet tables as well as configuring Drill to read data formats that are not very standard. In part two of this series we are going to utilize this same database to think beyond traditional database design.
Apache HBase is a database that runs on a Hadoop cluster. HBase is not a traditional RDBMS, as it relaxes the ACID (Atomicity, Consistency, Isolation, and Durability) properties of traditional RDBMS systems in order to achieve much greater scalability. Data stored in HBase also does not need to fit into a rigid schema like with an RDBMS, making it ideal for storing unstructured or semi-structured data.
In this post, I’ll show you how to build a simple real-time dashboard using Spark on MapR.
The following program illustrates a table load tool, which is a great utility program that can be used for batching puts into a HBase/MapR-DB table. The program creates a simple HBase table with a single column within a column family, and inserts 100,000 rows in a batch fashion.
Data across the enterprise are typically stored in silos belonging to different business divisions and even to different projects within the same division. These silos may be further segmented by services/products and functions. Silos (which stifle data-sharing and innovation) are often identified as a primary impediment (both practically and culturally) to business progress and thus they may be the cause of numerous difficulties.
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, walks you through HBase key design with OpenTSDB.
One of the important things to keep in mind with HBase is that it is a linearly-scaling, column-oriented key value store. Now in order to get linearly-scalable functionality out of HBase, you have to be very cognizant of the key design. This means you don't want to create what's called hot spots, and you want to prevent things like sequential writes from occurring. So what I've done is I've pre-drawn this diagram for you to show you that if you were to write sequentially, the keys, what happens in HBase is that when you're writing keys one through five, they're all going to land on the first server.
SQL will become one of the most prolific use cases in the Hadoop ecosystem, according to Forrester Research. Apache Drill is an open source SQL query engine for big data exploration. REST services and clients have emerged as popular technologies on the Internet. Apache HBase is a hugely popular Hadoop NoSQL database. In this blog post, I will discuss combining all of these technologies: SQL, Hadoop, Drill, REST with JSON, NoSQL, and HBase, by showing how to use the Drill REST API to query HBase and Hive. I will also share a simple jQuery client that uses the Drill REST API, with JSON as the data exchange, to provide a basic user interface.
At the Big Data Everywhere conference held in Israel, Atzmon Hen-Tov, Vice President of R&D of Pontis, and Lior Schachter, Director of Cloud Technology and Platform Group Manager of Pontis, gave an informative talk titled “Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform.” The five phase, two-year migration of their operational and analytical functions to MapR resulted in a true, real-time operational analytics environment on Hadoop.
Congratulations to the Apache Drill community on reaching a big milestone. Apache Drill 0.4.0—a developer preview—has just been released. This is the first in a series of monthly builds the project team will deliver as it drives towards Beta and GA milestones.
Let’s take a brief look at why Apache Drill matters and its key features.
This is was origionally posted on The HIVE on May 12, 2014.
Recently I happened to observe martial arts agility training at my son’s Taekwondo school. The ability to move quickly, change direction and still be coordinated enough to throw an effective strike or kick is the key to many martial arts, including Taekwondo.
Today we are very excited to announce early access of the new HP Vertica Analytics Platform on MapR at the O’Reilly Strata Conference: Making Data Work. This solution tightly integrates HP Vertica’s high-performance analytic platform directly on the MapR Enterprise-Grade Distribution for Hadoop with no “connectors” required. We wanted to provide some additional details on this integration and why this is important for customers.
It gives me immense pleasure to write this blog on behalf of all of us here at MapR to announce the release of Hadoop 2.x, including YARN, on MapR. Much has been written about Hadoop 2.x and YARN and how it promises to expand Hadoop beyond MapReduce. I will give a quick summary before highlighting some of the unique benefits of Hadoop 2.x and YARN in the MapR Distribution for Hadoop.
Running a large HBase™ cluster smoothly with minimum downtime is a skill which requires a deep understanding of how HBase™ works. When a disaster strikes, you find yourself digging into HBase™ code and/or mailing lists to understand what went wrong, determine how to recover from the current mess and most importantly figure out what can be done to prevent the same thing from happening again. Apart from the inconvenience downtime, a service crash can also lead to inconsistencies in HBase™ meta tables.
Blog Sign Up
Sign up and get the top posts from each week delivered to your inbox every Friday!