Druid is a high-performance, column-oriented, distributed data store. Druid supports streaming data ingestion and offers insights on events immediately after they occur. Druid can ingest data from multiple data sources, including Apache Kafka.
This article will guide you through the steps for using Apache Flink with MapR Streams. MapR Streams is a distributed messaging system for streaming event data at scale. It is integrated into the MapR Converged Data Platform and is based on the Apache Kafka API (0.9.0).
Sooner or later, if you eyeball enough data sets, you will encounter some that look like a graph, or are best represented as a graph. Whether it's social media, computer networks, or interactions between machines, graph representations are often a straightforward choice for representing relationships among one or more entities.
This post will help you get started using Apache Spark GraphX with Scala on the MapR Sandbox. GraphX is the Apache Spark component for graph-parallel computations, built upon a branch of mathematics called graph theory. It is a distributed graph processing framework that sits on top of the Spark core.
What is a time series? A time series is a sequence of data points that are ordered in time. Time series data can come in multiple shapes, and can be used in many facets of everyday life, such as measuring rainfall, earthquake activity, or even stock prices. With the growth of the Internet of Things, the volume of time series data you can collect is staggering, reaching 100 million data points per second.
We previously discussed the “Top 8 Reasons that Characterization is Right for Your Data.” Here we move the discussion of characterization from the theoretical to the practical, by providing four simple examples of characterizations of data. In each of these cases, the set of characterizations that are generated can then be fed into different types of analytics algorithms for discovery from your data: predictive patterns, clusters (segments), associations, correlations, trends, and anomalies (outliers, surprises).
Apache Spark is currently one of the most active projects in the Hadoop ecosystem, and there’s been plenty of hype about it in the past several months. In the latest webinar from the Data Science Central webinar series, titled “Let Spark Fly: Advantages and Use Cases for Spark on Hadoop,” we cut through the noise to uncover the practical advantages of having the full set of Spark technologies at your disposal.
The newly launched Developer Central is a place just for the developer community. Full of code samples and best practices, Developer Central will help you get started on Hadoop and manage your clusters efficiently. The three core content areas of Developer Central are Code, Architecture and Resources.