We’re pleased to announce the general release of the MapR Ecosystem Pack (MEP) version 2.0. This represents the second major release of a MapR Ecosystem Pack since the beginning of this new process of delivering ecosystem upgrades.
If you’re new to this process, MapR Ecosystem Packs (MEPs) are a way to deliver ecosystem upgrades decoupled from core upgrades, so you can upgrade your tooling independently of your Converged Data Platform.
Each MEP contains a subset of the greater supported Hadoop ecosystem, with every component certified to work with the other components in that release. This allows us to provide more nuanced support, such as bundling Spark JARs into Oozie.
Each MapR core release will be supported by multiple MEPs, giving you multiple opportunities to upgrade.
For more information, please see the Whiteboard Walkthrough here.
So, what’s in a MEP, and more specifically, MEP 2.0?
While MEP started as a process to deliver ecosystem projects, it’s been so popular with our customers that we’ve begun to bundle in common Hadoop connectors and APIs, like Kafka Connect for MapR Streams.
The full list of MEP 2.0 content can be found here, but we’ll highlight some of the key upgrades and additions below:
Apache Spark 2.0.1 GA for MapR
Key upgrades in this release are as follows:
- Structured APIs
  - Run database-like operations on the same engine as Spark SQL, or pass in custom code.
- Structured Streaming with Spark SQL Streaming
  - Provides the ability to perform interactive queries against live streaming data.
- Spark as a Compiler
  - Whole-stage code generation, provided by the second-generation Tungsten engine, flattens SQL queries into a single function evaluated as bytecode at runtime.
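To make the "Spark as a Compiler" idea concrete, here is a minimal conceptual sketch in plain Python. It is not Spark code and does not reflect how Tungsten is implemented; the function names (`run_volcano`, `compile_fused`) and the sample columns are hypothetical. It contrasts an operator-per-iterator pipeline with a single generated function that fuses the filter, projection, and aggregation into one loop.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]


# Operator-at-a-time execution: each operator is its own iterator, so every
# row is handed from one generator to the next.
def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def filter_op(child: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    for row in child:
        if predicate(row):
            yield row


def project_op(child: Iterator[Row], column: str) -> Iterator[int]:
    for row in child:
        yield row[column]


def run_volcano(rows: List[Row]) -> int:
    # SELECT SUM(price) WHERE qty > 1, built from three chained operators.
    plan = project_op(filter_op(scan(rows), lambda r: r["qty"] > 1), "price")
    return sum(plan)


# Fused execution: generate the source of ONE function for the whole query,
# compile it once, then run it as a single tight loop.
def compile_fused(filter_expr: str, column: str) -> Callable[[List[Row]], int]:
    source = (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {filter_expr}:\n"
        f"            total += row[{column!r}]\n"
        "    return total\n"
    )
    namespace: dict = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["fused"]
```

Both paths compute the same answer; the fused version simply avoids the per-row hand-offs between operators, which is the intuition behind whole-stage code generation.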
Apache Drill 1.9
Drill 1.9 is an iterative release from the Apache Drill community and is now available on MapR with the MEP 2.0 release. Drill has been making significant strides in terms of product and user adoption since its GA in May’16, and this release takes it to the next milestone with a variety of enhancements around performance, ease of use, and seamless SQL/BI tool integration.
The key highlights of this release include:
- Enhanced Parquet performance - Improved query performance for I/O-intensive analytic queries using an optimized Parquet reader, as well as significant performance boosts for targeted queries by reducing I/O via Parquet filter pushdown and Limit operator pushdown. These techniques complement Drill's other optimizations, including partition pruning and metadata caching, to further enhance performance.
- Flexible & dynamic UDFs - Enables data scientists, analysts, and developers to develop and deploy custom Drill SQL functions (UDFs) in a self-service fashion, without restarting Drill services in the cluster or involving IT. This is especially useful in large multi-tenant organizations, where restarting Drill services is disruptive to users, and it helps users get value from their data faster with Apache Drill.
- Seamless BI tool integration - This release introduces a variety of SQL improvements to enable optimal BI tool integration. These include support for a variety of join syntaxes generated by Tableau and other BI tools, as well as improvements to the number of metadata queries generated by BI tools, thereby improving the overall interactive user experience.
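The Parquet filter pushdown and Limit pushdown ideas mentioned above can be illustrated with a small, self-contained Python sketch. This is a conceptual model only, not Drill's reader: `RowGroup` and `scan_with_pushdown` are hypothetical names, and the min/max statistics stand in for the per-row-group metadata a Parquet file carries.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    """A chunk of column values plus the min/max statistics stored with it."""
    min_value: int
    max_value: int
    values: List[int]


def scan_with_pushdown(
    groups: List[RowGroup], lower_bound: int, limit: Optional[int] = None
) -> Tuple[List[int], int]:
    """Return values > lower_bound and the number of row groups actually read.

    Filter pushdown: a group whose max_value cannot satisfy the predicate is
    skipped without reading its values.
    Limit pushdown: scanning stops as soon as `limit` rows have been produced.
    """
    out: List[int] = []
    groups_read = 0
    for group in groups:
        if limit is not None and len(out) >= limit:
            break  # limit reached: remaining groups are never touched
        if group.max_value <= lower_bound:
            continue  # statistics prove no row here can match
        groups_read += 1
        for value in group.values:
            if value > lower_bound:
                out.append(value)
                if limit is not None and len(out) >= limit:
                    break
    return out, groups_read
```

With three row groups covering 1-10, 11-20, and 21-30, a predicate of `value > 12` reads only the last two groups, and adding a limit of 1 reads only one.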
More information on the release and features can be found here.
(New!) Kafka Connect and Kafka REST Proxy for MapR Streams
Kafka Connect for MapR Streams is a new way to easily connect common data systems with Kafka by providing prebuilt connectors for legacy and modern data stores.
Kafka REST Proxy for MapR Streams provides the ability for any device that can communicate using HTTP to easily publish/subscribe to Kafka topics.
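As a rough sketch of what publishing over HTTP can look like, the Python snippet below builds (but does not send) a POST request. It assumes the proxy accepts the Kafka REST Proxy v1 JSON format (a `records` envelope with the `application/vnd.kafka.json.v1+json` content type) and that a MapR Streams topic is addressed as a URL-encoded `/stream-path:topic` name; the host, port, stream path, and the helper name `build_publish_request` are hypothetical, so check the MapR documentation for the exact conventions.

```python
import json
from typing import Any, List
from urllib.parse import quote
from urllib.request import Request


def build_publish_request(base_url: str, topic: str, values: List[Any]) -> Request:
    """Build a POST request that would publish `values` as JSON records.

    Assumptions (not confirmed by the announcement): the v1 JSON embedded
    format and a URL-encoded topic name in the /topics/{name} path.
    """
    # Encode every reserved character, including "/" and ":" in the topic name.
    encoded_topic = quote(topic, safe="")
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return Request(
        url=f"{base_url}/topics/{encoded_topic}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v1+json"},
        method="POST",
    )


# Hypothetical proxy address and stream path, for illustration only.
request = build_publish_request(
    "http://localhost:8082", "/apps/sensors:readings", [{"temp": 21}]
)
```

Passing the resulting object to `urllib.request.urlopen` would send it; any device or language with an HTTP client can produce an equivalent request, which is the point of the proxy.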
(New!) MapR Installer Stanzas
MapR launched the MapR Installer last year to give cluster operators an intuitive way to set up a MapR cluster using a step-by-step wizard. To expand on this, within the Spyglass Initiative, we are launching MapR Installer Stanzas to enable API-driven installation. A “stanza” contains the layout and settings for a cluster installation and can be passed programmatically to the installer.
(New!) Teradata Connector for MapR (powered by Teradata Connector for Hadoop)
In partnership with Teradata, we’re introducing the Teradata Connector for MapR, a MapR implementation of the Teradata Connector for Hadoop (TDCH). This is a Sqoop wrapper, built into MapR Sqoop, that facilitates bulk data transfer between Hadoop and external data storage.
MEP 2.0 contains an upgrade to Hue with the following key improvements:
- Oozie Improvements
  - External Workflow Graph
  - Single Action Execution
  - New Ability: Dryrun Oozie job
- New SQL Query editor works over JDBC
  - Look for an upcoming MapR Community post on how to use this with Apache Drill!
- Directory and File-based Document Management
  - Users can create their own directories and subdirectories and drag and drop documents within the simple filebrowser interface.