Na Yang is a staff software engineer at MapR and an Apache Hive contributor. Prior to MapR, Na held numerous software development roles at information technology companies including Ariba, Quova, Merced Systems, and, most recently, PayPal, where Na was a staff software engineer on the Java Infrastructure team. Na received both an MS and a BS in Computer Science from Fudan University in China, and also holds an MS in Computer Engineering from San Jose State University.
Hive has been using ZooKeeper as a distributed lock manager to support concurrency in HiveServer2. The ZooKeeper-based lock manager works fine in a small-scale environment. However, as more and more users move from HiveServer to HiveServer2 and start to create a large number of concurrent sessions, problems can arise. The major problem is that the number of open connections between HiveServer2 and ZooKeeper keeps rising until the ZooKeeper server's connection limit is hit. At that point, ZooKeeper starts rejecting new connections, and all ZooKeeper-dependent flows become unusable...
Nearly one year ago, the Hadoop community began to embrace Apache Spark as a powerful batch processing engine. Today, many organizations and projects are augmenting their Hadoop capabilities with Spark. As part of this trend, the Apache Hive community is working to add Spark as an execution engine for Hive. The Hive-on-Spark work is being tracked by HIVE-7292, which is one of the most popular JIRAs in the Hadoop ecosystem. Furthermore, three weeks ago, the Hive-on-Spark team offered the first demo of Hive on Spark.