At MapR, we have set up multiple clusters for several of our enterprise customers, and we have brought that knowledge and best practices to MapR Installer. Increasingly, these deployments have not only grown in number, but have also evolved based on the type, purpose, and lifetime for these clusters.
Automatic replication of MapR-DB data to Elasticsearch is useful for many environments, and I want to share information about a specific customer deployment I worked on recently. Their use case is related to log security analytics and is centered around using Drill for running interactive queries on aggregated data.
One of the challenges when working with streams is the transitory nature of their data. Many applications require data to be persisted far beyond the point at which said data has any practical value to streaming analytics.
Druid is a high-performance, column-oriented, distributed data store. Druid supports streaming data ingestion and offers insights on events immediately after they occur. Druid can ingest data from multiple data sources, including Apache Kafka.
This article will guide you into the steps to use Apache Flink with MapR Streams. MapR Streams is a distributed messaging system for streaming event data at scale, and it’s integrated into the MapR Converged Data Platform, based on the Apache Kafka API (0.9.0)
Sooner or later, if you eyeball enough data sets, you will encounter some that look like a graph, or are best represented a graph. Whether it's social media, computer networks, or interactions between machines, graph representations are often a straightforward choice for representing relationships among one or more entities.
This post will help you get started using Apache Spark GraphX with Scala on the MapR Sandbox. GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of mathematics called graph theory. It is a distributed graph processing framework that sits on top of the Spark core.
What is a time series? A time series is a sequence of data points which are ordered in time. Time series data can come in multiple shapes, and can be used in many facets of everyday life, such as measuring rainfall, earthquake activity, or even stock prices. With the growth of the Internet of Things, the volume of time series data you can collect is staggering - reaching 100 million data points per second.
We previously discussed the “Top 8 Reasons that Characterization is Right for Your Data.” Here we move the discussion of characterization from the theoretical to the practical, by providing four simple examples of characterizations of data. In each of these cases, the set of characterizations that are generated can then be fed into different types of analytics algorithms for discovery from your data: predictive patterns, clusters (segments), associations, correlations, trends, and anomalies (outliers, surprises).