Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against different data sets in various formats, such as JSON, CSV, Parquet, and HBase.
Partners Blog Posts
In my first post, I showed how you might quickly deploy a Drill-enabled cluster to the Azure cloud using the MapR template available in the Azure Marketplace. In my second post, I showed how you might get that Drill-enabled cluster to query an Azure Storage account as well as an Azure SQL Database. In this post, I want to focus on using this cluster as a data source with Power BI, a data discovery tool that’s popular with users of Microsoft technologies.
In my last post, I deployed a MapR cluster to the Azure cloud using the template available through the Azure Marketplace. My goal in doing this was to get a Drill-enabled cluster up and going in Azure as quickly as possible. My emphasis on Azure indicates that I am probably making use of the Microsoft cloud for a broader range of activities than just running this one cluster.
MapR has worked closely with Azure to develop sandboxes that enable users to do a proof of concept with the MapR Converged Data Platform. These sandboxes, which are pre-loaded and preconfigured with the MapR software and the required supporting operating system, can be launched on the Azure Marketplace portal.
If you’ve been keeping tabs on all the great product enhancements that have been coming out of MapR, you will know that the 5.2 version of the MapR Converged Data Platform went GA this summer. It takes a few cycles to make the platform available on the AWS Marketplace, largely due to the testing efforts required.
A very common use case for the MapR Converged Data Platform is collecting and analyzing data from a variety of sources, including traditional relational databases. Until recently, data engineers would build an ETL pipeline that periodically walks the relational database and loads the data into files on the MapR cluster, then perform batch analytics on that data.
In the wide-column data model of MapR-DB, all rows are stored by a row key, column family, column qualifier, value, and timestamp. In the current version, the row key is the only field that is indexed, which fits the common pattern of queries based on the row key.
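To make the data model concrete, here is a minimal in-memory sketch in Python. It is not MapR-DB code and does not use any MapR API; the class WideColumnTable and its methods are hypothetical names chosen for illustration. It only shows the idea that each stored cell carries a column family, qualifier, value, and timestamp under a row key, and that lookups by row key use the single index while any other lookup has to scan.

```python
import bisect
from collections import namedtuple

# One stored cell: everything except the row key, which is the index.
Cell = namedtuple("Cell", ["family", "qualifier", "value", "timestamp"])


class WideColumnTable:
    """Hypothetical toy table, for illustration only (not a MapR-DB API)."""

    def __init__(self):
        self._rows = {}         # row key -> list of Cell
        self._sorted_keys = []  # row keys kept in sorted order (the only index)

    def put(self, row_key, family, qualifier, value, timestamp):
        # Register a new row key in sorted position, then append the cell.
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = []
        self._rows[row_key].append(Cell(family, qualifier, value, timestamp))

    def get(self, row_key):
        # Indexed access: a direct lookup by row key.
        return list(self._rows.get(row_key, []))

    def scan_by_value(self, value):
        # No index on value, so every cell of every row must be visited.
        return [
            (key, cell)
            for key in self._sorted_keys
            for cell in self._rows[key]
            if cell.value == value
        ]
```

In this sketch, get touches only the requested row, while scan_by_value walks the whole table, which is why designing the row key around the expected queries matters when it is the only indexed field.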
This blog post describes how to get an instance of the MapR-DB Document Database Developer Preview image running on AWS using one of the pre-configured AMIs supplied by MapR. With this AMI, you can start writing JSON-based applications on MapR-DB using the open source Open JSON Application Interface, or OJAI.
With the advent of container technology like Docker and application resource management platforms such as Apache Mesos, enterprise customers are looking at these technologies very seriously as they promise much shorter development cycles and highly scalable product deployment.
Teradata Connector for Hadoop (TDCH) is a key component of Teradata’s Unified Data Architecture for moving data between Teradata and Hadoop. TDCH invokes a MapReduce job on the Hadoop cluster to push/pull data to/from Teradata databases, with each mapper moving a portion of the data, in parallel across all nodes, for very fast transfers.
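The general pattern of splitting a data set into portions and moving each portion in parallel can be sketched in a few lines of Python. This is an illustration of the idea only, not TDCH itself: the function names are hypothetical, threads stand in for mappers, and transfer_portion merely counts rows where a real mapper would read from or write to the database.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_portions(rows, num_mappers):
    # Round-robin split: every row lands in exactly one portion.
    return [rows[i::num_mappers] for i in range(num_mappers)]


def transfer_portion(portion):
    # Stand-in for one mapper's work (assumption: a real mapper would
    # move these rows to or from the database). Returns rows handled.
    return len(portion)


def parallel_transfer(rows, num_mappers):
    # Run one worker per portion and collect per-worker row counts.
    portions = split_into_portions(rows, num_mappers)
    with ThreadPoolExecutor(max_workers=num_mappers) as pool:
        return list(pool.map(transfer_portion, portions))
```

Because the portions do not overlap, the workers never contend for the same rows, so under these assumptions throughput can grow with the number of workers until the network or the database becomes the bottleneck.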
I’m very pleased to announce the release of a custom EMR bootstrap action to deploy Apache Drill on a MapR cluster. MapR is the only commercial Hadoop distribution available for Amazon’s Elastic MapReduce service (EMR), and this addition allows EMR users to easily deploy and evaluate the powerful Drill query engine.
Did you know you can run Apache Drill on your laptop? This is great news for business analysts who need to explore complex and semi-structured data. Let's look at a particular example.
The folks over at the Transaction Processing Performance Council (TPC) have been busy. The TPC benchmarks (such as TPC-C, TPC-D and TPC-H) are the industry standard for benchmarking transaction processing systems that touch upon a broad range of our daily lives, from tracking customer orders and optimizing inventory in warehouses, to supporting critical, real-time business decisions. These benchmarks are the standard by which these types of systems have been measured since their initial release in 1992, and they have been a key factor in research, innovation and performance improvements in relational database systems.
Nearly one year ago, the Hadoop community began to embrace Apache Spark as a powerful batch processing engine. Today, many organizations and projects are augmenting their Hadoop capabilities with Spark. As part of this trend, the Apache Hive community is working to add Spark as an execution engine for Hive. The Hive-on-Spark work is being tracked by HIVE-7292, which is one of the most popular JIRAs in the Hadoop ecosystem. Furthermore, three weeks ago, the Hive-on-Spark team offered the first demo of Hive on Spark.
I often get asked, “What is the easiest way to get hands-on experience with MapR?” The best way is to try the MapR Sandbox, a single-node MapR cluster that you can run on your laptop. However, Hadoop clusters are never built with just one server, and some MapR features require multiple nodes, or even multiple clusters. To get hands-on with a MapR installation that more closely resembles what you might deploy on hardware, I suggest you deploy a MapR cluster in the Amazon cloud, using the MapR Installer. This blog post will walk you through that process.
Fireworks from the July 4th holiday seem like a distant memory, but the virtual fireworks continue to spark (pun intended) within the MapR partner ecosystem. A new sandbox from Talend for the MapR Distribution, the successful launch of our App Gallery, and expanded support for Apache Spark with partners like Databricks show great momentum with our technology partners.
Last time you bought a smartphone, what factors did you consider? You probably first evaluated the phone itself, like how well the camera could capture your kid’s special moments, or if there is enough storage to hold the full Rolling Stones collection in lossless format. You then looked at whether the services you already use and trust are supported, like the Netflix app for binging on House of Cards, or your bank’s app for catching up on bills at the end of the month. In the end, you chose the phone that had both the features and compatibility you needed.