If you want to try out the MapR Converged Data Platform to see its unique big data capabilities but don’t have a cluster of hardware immediately available, you still have a few other options. For example, you can spin up a MapR cluster in the cloud using multiple node instances on one of our IaaS partners (Amazon, Azure, etc.).
There has been a lot of research in document image processing over the past 20 years, but not much research has been done in terms of parallel processing. Some of the solutions proposed for parallel processing have been to create threads of execution for each image, or to use GNU Parallel.
Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data. Its uniform and consistent format makes it accessible and relevant to end-users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against different data sets with various formats, e.g. JSON, CSV, Parquet, HBase, etc.
In my first post, I showed how you might quickly deploy a Drill-enabled cluster to the Azure cloud using the MapR template available in the Azure Marketplace. In my next post, I showed you how you might get that Drill-enabled cluster to query an Azure Storage account as well as an Azure SQL Database. In this post, I want to focus on using this cluster as a data source with Power BI, a data discovery tool that’s popular with users of Microsoft technologies.
In my last post, I deployed a MapR cluster to the Azure cloud using the template available through the Azure Marketplace. My goal in doing this was to get a Drill-enabled cluster up and going in Azure as quickly as possible. My emphasis on Azure indicates that I am probably making use of the Microsoft cloud for a broader range of activities than just running this one cluster.
At MapR, we have set up multiple clusters for several of our enterprise customers, and we have brought that knowledge and best practices to MapR Installer. Increasingly, these deployments have not only grown in number, but have also evolved based on the type, purpose, and lifetime for these clusters.
Automatic replication of MapR-DB data to Elasticsearch is useful for many environments, and I want to share information about a specific customer deployment I worked on recently. Their use case is related to log security analytics and is centered around using Drill for running interactive queries on aggregated data.
One of the challenges when working with streams is the transitory nature of their data. Many applications require data to be persisted far beyond the point at which said data has any practical value to streaming analytics.
Druid is a high-performance, column-oriented, distributed data store. Druid supports streaming data ingestion and offers insights on events immediately after they occur. Druid can ingest data from multiple data sources, including Apache Kafka.
This article will guide you into the steps to use Apache Flink with MapR Streams. MapR Streams is a distributed messaging system for streaming event data at scale, and it’s integrated into the MapR Converged Data Platform, based on the Apache Kafka API (0.9.0)