Apache Hive Blog Posts

Posted on January 17, 2017 by Mathieu Dumoulin

Debugging a real-life distributed application can be a pretty daunting task. The most common Google searches often aren't very useful, at least at first. In this blog post, I will give a fairly detailed account of how we managed to speed up an Apache Kafka/Spark Streaming/Apache Ignite application by almost 10x and turn a development prototype into a useful, stable streaming application that eventually exceeded its performance goals.

Posted on March 24, 2016 by Arun Nallusamy

In a typical Hive installation with its metastore in MySQL, the database password is stored in clear text in a configuration file. This presents several risks: 1) Unauthorized access could destroy or modify Hive metadata and disrupt workflows. A malicious user could alter Hive permissions or damage metadata.

Posted on January 25, 2016 by Ranjit Lingaiah

In the wide-column data model of MapR-DB, data is stored as a row key, column family, column qualifier, value, and timestamp. In the current version, the row key is the only field that is indexed, which fits the common pattern of queries based on the row key.
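
The row-key-only indexing described above can be illustrated with a small conceptual model. The sketch below is hypothetical and is not the MapR-DB API: it keeps every cell sorted by (row key, column family, column qualifier, timestamp), so a lookup by row key is a binary search, while a lookup by value has to scan every cell.

```python
import bisect
from typing import Any, List, Tuple

# Conceptual model only (hypothetical, NOT the MapR-DB API).
# A cell is (row_key, column_family, column_qualifier, timestamp, value).
Cell = Tuple[str, str, str, int, Any]


class WideColumnTable:
    def __init__(self) -> None:
        # Sorted cell keys and their values, kept in parallel lists.
        self._keys: List[Tuple[str, str, str, int]] = []
        self._values: List[Any] = []

    def put(self, row_key: str, family: str, qualifier: str, timestamp: int, value: Any) -> None:
        # Negate the timestamp so newer versions of the same cell sort first.
        key = (row_key, family, qualifier, -timestamp)
        i = bisect.bisect_left(self._keys, key)
        self._keys.insert(i, key)
        self._values.insert(i, value)

    def get_row(self, row_key: str) -> List[Cell]:
        # Indexed access: binary-search to the row's first cell, then read contiguously.
        i = bisect.bisect_left(self._keys, (row_key,))
        cells: List[Cell] = []
        while i < len(self._keys) and self._keys[i][0] == row_key:
            r, f, q, neg_ts = self._keys[i]
            cells.append((r, f, q, -neg_ts, self._values[i]))
            i += 1
        return cells

    def scan_by_value(self, value: Any) -> List[str]:
        # Non-indexed access: every cell in the table must be examined.
        return sorted({k[0] for k, v in zip(self._keys, self._values) if v == value})
```

Reading a row touches only that row's contiguous cells, whereas `scan_by_value` grows with the size of the whole table, which is why queries that cannot use the row key tend to be expensive in this model.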

Posted on October 21, 2015 by Neeraja Rentachintala

In this blog post, I would like to briefly introduce the new analytics capabilities added to Drill, namely ANSI SQL-compliant analytic and window functions, and explain how to get started with them.
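
As a rough illustration of what these functions compute, the plain-Python sketch below emulates standard SQL semantics for `ROW_NUMBER()` and a running `SUM(...) OVER (PARTITION BY ... ORDER BY ...)`. It is not Drill code; the function and column names are made up for the example.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_window(rows: List[Row], partition_by: str, order_by: str, value: str) -> List[Row]:
    """Emulate (hypothetically, outside any SQL engine) these ANSI window expressions:
         ROW_NUMBER() OVER (PARTITION BY p ORDER BY o)
         SUM(v)       OVER (PARTITION BY p ORDER BY o)
    With ORDER BY and no explicit frame, the SQL default frame is
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the
    ORDER BY column ("peers") share one running total."""
    out: List[Row] = []
    # Sort by (partition, order) so each partition's rows are contiguous and ordered.
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        row_number = 0
        running_total = 0
        # Group peers (equal ORDER BY values) so they receive the same running sum.
        for _, peer_group in groupby(partition, key=itemgetter(order_by)):
            peers = list(peer_group)
            running_total += sum(r[value] for r in peers)
            for r in peers:
                row_number += 1
                out.append({**r, "row_number": row_number, "running_sum": running_total})
    return out
```

Unlike `GROUP BY`, a window function keeps every input row and attaches a computed value to it, which is why the output here has as many rows as the input.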

Posted on August 31, 2015 by Carol McDonald

In this blog post, we’ll take a look at the inner workings of Apache Drill, learn what services are involved, and find out what happens in Apache Drill when we submit a query.

Posted on July 28, 2015 by Tugdual Grall

Apache Drill allows users to explore any type of data using ANSI SQL. This is great, but Drill goes even further and allows you to create custom functions to extend the query engine. These custom functions have all the performance of Drill's primitive operations, but achieving that performance makes writing these functions a little trickier than you might expect.

Posted on July 20, 2015 by Hao Zhu

This article describes the new Hive transaction feature introduced in Hive 1.0. This feature adds initial row-level support for the four properties of database transactions: atomicity, consistency, isolation, and durability. With it, you can add new rows in Hive while another application reads rows from the same partition without interference.

Posted on July 6, 2015 by Jim Scott

In part one of this series, "Drilling into Healthy Choices," we explored using Drill to create Parquet tables as well as configuring Drill to read nonstandard data formats. In part two of this series, we are going to use this same database to think beyond traditional database design.

Posted on May 26, 2015 by Nitin Bandugula

This is the third and final entry in our three-part series focused on building basic skill sets for use in data analysis. The series is aimed at those who have some familiarity with using SQL to query data but limited or no experience with Apache Drill.

Posted on May 19, 2015 by Neeraja Rentachintala

Today, we are extremely excited and proud to announce the general availability (GA) of Apache Drill 1.0, as part of the MapR Distribution. Congratulations to the Drill community on this significant milestone and achievement!

Posted on May 18, 2015 by Nitin Bandugula

This is the second in our three-part series focused on building basic skill sets for use in data analysis. The material is intended for those who have no prior, or very limited, experience with Apache Drill, but do have some familiarity with running SQL queries.

Posted on May 13, 2015 by Tomer Shiran

In this week's Whiteboard Walkthrough, Tomer Shiran, PMC member and Apache Drill committer, walks you through the history of the non-relational datastore and why Apache Drill is so important for this type of technology.

Posted on May 4, 2015 by Neeraja Rentachintala

Today, the Apache Drill community announced the release of Drill 0.9, and MapR is very excited to package this release as part of the MapR Distribution including Hadoop.

Posted on April 14, 2015 by Kirk Borne

Data across the enterprise are typically stored in silos belonging to different business divisions and even to different projects within the same division. These silos may be further segmented by services/products and functions. Silos (which stifle data-sharing and innovation) are often identified as a primary impediment (both practically and culturally) to business progress and thus they may be the cause of numerous difficulties.

Posted on April 2, 2015 by Neeraja Rentachintala

Since its beta release in September 2014, Apache Drill, the most flexible SQL-on-Hadoop technology, has been making great strides in both product progress and community adoption. With four significant iterative releases (0.5, 0.6, 0.7, 0.8) in less than six months, thousands of downloads from the MapR website, nearly 1,500 message threads on the Apache Drill user mailing list, and an active open source community, Drill is well on its way to becoming generally available in the Q2 2015 time frame.

Posted on February 6, 2015 by Na Yang

Hive has been using ZooKeeper as a distributed lock manager to support concurrency in HiveServer2. The ZooKeeper-based lock manager works fine in a small-scale environment. However, as more and more users move from HiveServer to HiveServer2 and start to create large numbers of concurrent sessions, problems can arise. The major problem is that the number of open connections between HiveServer2 and ZooKeeper keeps rising until the ZooKeeper server's connection limit is hit. At that point, ZooKeeper starts rejecting new connections, and all ZooKeeper-dependent flows become unusable.
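
The failure mode described above can be modeled with a toy simulation. Everything here is hypothetical: `FakeZooKeeperServer` stands in for a ZooKeeper server whose connection cap plays the role of a per-host limit such as ZooKeeper's `maxClientCnxns` setting, and the two lock-manager classes contrast a per-session connection that is never closed with a single shared client. The shared client is assumed here as one common remedy, not as the fix described in the post.

```python
class ConnectionRejected(Exception):
    """Raised by the fake server when its connection cap is reached."""


class FakeZooKeeperServer:
    # Hypothetical stand-in for a ZooKeeper server with a connection limit.
    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self.open_connections = 0

    def connect(self) -> None:
        if self.open_connections >= self.max_connections:
            raise ConnectionRejected("connection limit reached")
        self.open_connections += 1


class PerSessionLockManager:
    # Leak pattern: every new session opens its own connection and never closes it.
    def __init__(self, server: FakeZooKeeperServer) -> None:
        self.server = server

    def open_session(self) -> None:
        self.server.connect()


class SharedClientLockManager:
    # One connection, opened lazily and reused by all sessions.
    def __init__(self, server: FakeZooKeeperServer) -> None:
        self.server = server
        self._connected = False

    def open_session(self) -> None:
        if not self._connected:
            self.server.connect()
            self._connected = True


def sessions_before_failure(manager, sessions: int) -> int:
    # Count how many sessions open successfully before the server rejects one.
    for n in range(sessions):
        try:
            manager.open_session()
        except ConnectionRejected:
            return n
    return sessions
```

With a cap of 60, the per-session manager fails on the 61st session, while the shared-client manager keeps a single connection open regardless of how many sessions are created.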

Posted on January 6, 2015 by Carol McDonald

SQL will become one of the most prolific use cases in the Hadoop ecosystem, according to Forrester Research. Apache Drill is an open source SQL query engine for big data exploration. REST services and clients have emerged as popular technologies on the Internet. Apache HBase is a hugely popular Hadoop NoSQL database. In this blog post, I will discuss combining all of these technologies: SQL, Hadoop, Drill, REST with JSON, NoSQL, and HBase, by showing how to use the Drill REST API to query HBase and Hive. I will also share a simple jQuery client that uses the Drill REST API, with JSON as the data exchange, to provide a basic user interface.
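
A minimal sketch of calling the Drill REST API from Python, assuming a Drillbit listening on the default web port 8047 and using only the standard library. The request posts a JSON body with `queryType` and `query` fields to `/query.json`; reading the result from a `rows` field of the response is an assumption to verify against your Drill version, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_drill_query_request(sql: str, host: str = "localhost", port: int = 8047) -> urllib.request.Request:
    # Drill's REST endpoint accepts a JSON body naming the query type and the SQL text.
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, host: str = "localhost", port: int = 8047,
                    timeout: float = 30.0) -> List[Dict[str, Any]]:
    # Sends the query and returns result rows (assumed to be under "rows").
    req = build_drill_query_request(sql, host, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload.get("rows", [])
```

Because the API is plain HTTP and JSON, the same request can be issued from a browser-side jQuery client, which is what makes a lightweight user interface over HBase and Hive possible.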

Posted on December 22, 2014 by Jim Bates

Over the last few releases, the options for how you store data in Hive have advanced in many ways. In this post, let's take a look at how to determine which Hive table storage format would be best for the data you are using. Starting with a basic table, we'll look at creating duplicate tables for each of the storage format options, and then comparing queries and data compression. Just keep in mind that the goal of this post is to talk about ways of comparing table formats and compression options, not to define the fastest Hive setup for all things data. After all, the fun is in figuring out the Hive table storage format for your own Hive project, not just reading about mine.
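
One low-effort way to start comparing compression options is to measure compression ratios on a sample of your own data before building tables. The sketch below uses Python's built-in `zlib`, `bz2`, and `lzma` modules as stand-ins; they are not necessarily the codecs your Hive tables are configured with, and `sample_rows` is a hypothetical generator of CSV-like rows.

```python
import bz2
import lzma
import zlib
from typing import Callable, Dict

# Standard-library codecs used as stand-ins for table compression options.
CODECS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compare_codecs(data: bytes) -> Dict[str, float]:
    """Return compressed size / original size for each codec (lower is better)."""
    if not data:
        raise ValueError("data must be non-empty")
    return {name: len(compress(data)) / len(data) for name, compress in CODECS.items()}


def sample_rows(n: int) -> bytes:
    # Hypothetical sample: repetitive CSV-like rows, similar to a fact table.
    return "".join(
        f"{i},store_{i % 10},2014-12-{(i % 28) + 1:02d},{i * 3 % 97}\n" for i in range(n)
    ).encode("utf-8")
```

Size is only half the comparison; as the post notes, query time against each table format matters as well, and that has to be measured in Hive itself.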

Posted on December 16, 2014 by Na Yang

Nearly one year ago the Hadoop community began to embrace Apache Spark as a powerful batch processing engine. Today, many organizations and projects are augmenting their Hadoop capabilities with Spark. As part of this trend, the Apache Hive community is working to add Spark as an execution engine for Hive. The Hive-on-Spark work is being tracked by HIVE-7292 which is one of the most popular JIRAs in the Hadoop ecosystem. Furthermore, three weeks ago, the Hive-on-Spark team offered the first demo of Hive on Spark.

Posted on December 11, 2014 by Jim Bates

There are many great examples out there for using the Hive shell, as well as examples of ways to automate many of the animals in our Hadoop zoo. However, if you’re just getting started, or need something fast that won’t stay around long, then all you need to do is throw a few lines of code together with some existing programs in order to avoid re-inventing the workflow. In this blog post, I’ll share a few quick tips on using the Hive shell inside scripts. We’ll take a look at a simple script that needs to pull an item or count, and then look at two ways to use the Hive shell to get an answer.
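
A minimal sketch of wrapping the Hive shell in a script to pull a single count, assuming the `hive` CLI is on the `PATH`. It uses the CLI's `-S` (silent) and `-e` (execute a query string) options; `hive_count` and `parse_single_value` are hypothetical helper names, and treating the last non-empty line of standard output as the result is an assumption.

```python
import subprocess
from typing import List


def hive_command(query: str) -> List[str]:
    # -S suppresses informational output; -e runs the given query and exits.
    return ["hive", "-S", "-e", query]


def parse_single_value(output: str) -> str:
    # Assumption: in silent mode, the result is the last non-empty stdout line.
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("Hive returned no output")
    return lines[-1]


def hive_count(table: str) -> int:
    # The table name is interpolated directly, so it must come from a trusted source.
    result = subprocess.run(
        hive_command(f"SELECT COUNT(*) FROM {table}"),
        capture_output=True,
        text=True,
        check=True,
    )
    return int(parse_single_value(result.stdout))
```

Keeping command construction and output parsing in separate functions makes it easy to test the script's logic without a running Hive installation.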

Posted on December 1, 2014 by Nitin Bandugula

The November release of the Apache open source packages in MapR was made available for customers earlier this month. We are excited to deliver some major upgrades to existing packages.

Here are the highlights:

Posted on November 13, 2014 by Neeraja Rentachintala

Apache Drill is one of the fastest-growing open source projects, with the community making rapid progress through monthly releases. The latest release, Drill 0.6, is another important milestone for the project and builds on the product with key enhancements, including the ability to run SQL queries directly on MongoDB (alongside the file system, HBase, and Hive sources that are already supported today), as well as a number of performance and SQL improvements.

Posted on September 28, 2014 by Karen Whipple

The recent MapR webinar titled “The Future of Hadoop Analytics: Total Data Warehouses and Self-Service Data Exploration” proved to be a highly informative, in-depth look at the future of data warehouses and how SQL-on-Hadoop technologies will play a pivotal role in those settings. Matt Aslett, Research Director for 451 Research, along with Apache Drill architect Jacques Nadeau, discussed what lies ahead for enterprise data warehouse architects and BI users in 2015 and beyond.

Posted on September 21, 2014 by Suhas Satish

While big data security analytics promises to deliver great insights in the battle against cyber threats, the concept and the tools are still maturing. In this blog, I’ll simplify the topic of adopting security in Hadoop by showing you how to encrypt traffic between Hue and Hive.

Posted on September 16, 2014 by Neeraja Rentachintala

Since Apache Drill 0.4 was released in August for experimentation on the MapR Distribution, there has been tremendous interest in the customer and partner community in the promise and potential of Drill to unlock new types of data in their Hadoop/NoSQL systems for interactive analysis throughout the organization. Today we're excited to announce Apache Drill 0.5.

Posted on September 15, 2014 by Dale Kim

At the Big Data Everywhere conference held in Israel, Atzmon Hen-Tov, Vice President of R&D of Pontis, and Lior Schachter, Director of Cloud Technology and Platform Group Manager of Pontis, gave an informative talk titled "Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform." The five-phase, two-year migration of their operational and analytical functions to MapR resulted in a true, real-time operational analytics environment on Hadoop.

Posted on September 8, 2014 by Kyle Porter

The MapR Distribution including Apache™ Hadoop® employs drivers from Simba Technologies to connect to client ODBC and JDBC applications allowing you to access data on MapR from tools like Tableau with ODBC or SQuirreL with JDBC. This post will walk you through the steps to set up and connect your Apache Hive instance to both an ODBC and JDBC application running on your laptop or other client machine. Although you may already have your own Hive cluster set up, this post focuses on the MapR Sandbox for Hadoop virtual machine (VM).

Posted on August 15, 2014 by Michele Nemschoff

Getting back to basics, MapR CTO and co-Founder M.C. Srivas provides a brief introduction to Hadoop, and explains where it fits on the “dumb data” to “very smart data” spectrum. After watching this video, you’ll have a better understanding of Hadoop, and how MapR has taken the best innovations from both ends of the data spectrum to develop the leading Hadoop technology for big data deployments. 

A few key points made in the video include:

Posted on July 6, 2014 by Michele Nemschoff

M.C. Srivas, CTO and co-founder of MapR Technologies, recently spoke at the Munich Hadoop User Group about the Apache Drill project. The following is a blog post from HUG Muenchen originally published on the comSysto blog.


A deep dive into Apache Drill - fast interactive SQL on Hadoop

Posted on June 20, 2014 by Nitin Bandugula

The latest monthly release of the Apache open source packages in MapR is now available for customers. The release includes updates to several OSS packages including Hive, HBase, Oozie, Hue and Sqoop. Here are some of the highlights of the release:

Posted on May 14, 2014 by Patrick Toole

With our recent announcement of HP Vertica’s deployment onto MapR, we have already been flooded with questions about the integration.

Use Cases

Posted on May 12, 2014 by Neeraja Rentachintala

This was originally posted on The HIVE on May 12, 2014.

Recently I happened to observe martial arts agility training at my son’s Taekwondo school. The ability to move quickly, change direction and still be coordinated enough to throw an effective strike or kick is the key to many martial arts, including Taekwondo.

Posted on May 7, 2014 by Jon Posnik

SQL-on-Hadoop just got easier this morning.  Working together with the HP Vertica team, we are excited to announce general availability of the HP Vertica Analytics Platform running on the MapR Distribution for Apache Hadoop.

Posted on April 11, 2014 by Anoop Dawar

On the heels of the recent Spark stack inclusion announcement, here is some more fresh powder (for non-skiers, that's fresh snow on a mountain).

MapR Distribution of Apache Hadoop: 4.0.0 Beta

Posted on April 4, 2014 by Karen Whipple

Amazon Elastic MapReduce (Amazon EMR) makes it easy to provision and manage Hadoop in the AWS Cloud. The latest webinar from the Amazon Web Services Partner webinar series, titled "Hadoop in the Cloud: Unlocking the Potential of Big Data on AWS," showed examples of how to use Amazon EMR with the MapR Distribution for Apache Hadoop, and outlined the advantages of using the cloud to increase flexibility and accelerate projects while lowering costs.

Posted on April 2, 2014 by Amit Anand

In this blog, I will show you how to set up authentication for HiveServer2 (HS2) using a pluggable authentication module (PAM). Once configured, all HS2 clients (JDBC and ODBC) will require a valid username and password to connect, and a validation error will be thrown if an invalid username or password is supplied. This authentication doesn't apply to the Hive CLI (command line interface), because it doesn't go through HS2. Please remember that HS2 authentication only controls connections to Hive, not access to the actual data.

Posted on February 11, 2014 by Neeraja Rentachintala

Today we are very excited to announce early access of the new HP Vertica Analytics Platform on MapR at the O’Reilly Strata Conference: Making Data Work. This solution tightly integrates HP Vertica’s high-performance analytic platform directly on the MapR Enterprise-Grade Distribution for Hadoop with no “connectors” required. We wanted to provide some additional details on this integration and why this is important for customers.

Posted on February 11, 2014 by Anoop Dawar

It gives me immense pleasure to write this blog on behalf of all of us here at MapR to announce the release of Hadoop 2.x, including YARN, on MapR. Much has been written about Hadoop 2.x and YARN and how it promises to expand Hadoop beyond MapReduce. I will give a quick summary before highlighting some of the unique benefits of Hadoop 2.x and YARN in the MapR Distribution for Hadoop.


Posted on January 6, 2014 by Michael Hausenblas

At the end of last year, my colleague Steve Wooledge discussed the options at your disposal for querying both schema-based and self-describing structured data sources with the MapR Big Data platform. Around that time, I also reviewed Open Source SQL-in-Hadoop Solutions over at InfoQ.

Posted on December 2, 2013 by Abhinav Chawade

This blog explains how to achieve replication in Hive when the metastore is in MySQL. MySQL has built-in replication, which can be used in conjunction with remote mirroring to replicate Hive tables. Although Hive itself does not have this replication capability, it can be achieved using mirror volumes in MapR.

Posted on September 3, 2013 by Yuliya Feldman

As the community releases ecosystem updates, MapR provides corresponding updates to the MapR ecosystem.

I would like to highlight our recent Hive update, which adds support for SSL and PAM authentication, along with ODBC driver user/password authentication over SSL.

Posted on August 8, 2012 by Jay Elliott

ODBC has been the flagship API for SQL ever since it was first developed by Microsoft and Simba Technologies in 1992. An acronym for Open DataBase Connectivity, ODBC is the standard API used by popular applications like Excel, Crystal Reports, MicroStrategy and Tableau to connect to SQL databases.
