In this week’s Whiteboard Walkthrough, Dale Kim, Sr. Director of Industry Solutions at MapR, describes how MapR addresses the challenge of providing a persistence tier to containers in big data settings. Dale describes a new technology to support Docker containers, the MapR Persistent Application Client Container, or PACC. This lets you deploy containers anywhere, with security enabled, while also providing access to the MapR Converged Data Platform, which includes a NoSQL database and streaming message transport as well as files for the persistence layer.
The MapR Persistent Application Client Container (PACC) is a Docker-based container image that includes a container-optimized MapR client. The PACC provides secure access to MapR Converged Data Platform services, including MapR-FS, MapR-DB, and MapR Streams. The PACC makes it fast and easy to run containerized applications that access data in MapR.
In this blog we will discuss some patterns that are often used in microservices applications that need to scale: Event Stream, Event Sourcing, Polyglot Persistence, Memory Image, and Command Query Responsibility Separation.
In this blog I'm going to show you how to configure MapR Streams (which supports the Kafka API) on Oracle Data Integrator with Spark Streaming to create a true lambda architecture: a fast layer complementing the batch and serving layers. I've done this on MapR to kill two birds with one stone, showing you the steps for both MapR Streams and Kafka.
In this week’s Whiteboard Walkthrough, Ted Dunning, Chief Applications Architect at MapR, will talk about how you can use logs containing metrics and exceptions to detect anomalies in the behavior of a micro-service.
Recent reports suggest hackers are actively compromising insecure Hadoop deployments. These attacks appear to be targeting a service port (NameNode: 50070) that is not used by MapR, making instances of MapR not susceptible to this specific exploit. The underlying attack methodology, however, is simply to find and exploit internet-accessible ports that do not require authentication.
In this blog, we describe how to use the Kafka REST Proxy to publish and consume messages to/from MapR Streams. The REST Proxy is a great addition to the MapR Converged Data Platform, allowing any programming language to use MapR Streams. The Kafka REST Proxy, provided with the MapR Streams tools, can be used with MapR Streams (default) as well as Apache Kafka (in a hybrid mode). In this article, we will focus on MapR Streams.
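As a rough illustration of what publishing through the REST Proxy can look like, here is a minimal Python sketch. It assumes the proxy listens on `localhost:8082`, accepts the Kafka REST Proxy v1 JSON content type, and that a stream topic named `/apps/demo-stream:events` exists; all three are assumptions for illustration, not values from the article. The helper names `build_publish_request` and `send` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed values, for illustration only.
PROXY_URL = "http://localhost:8082"       # assumed REST Proxy host and port
TOPIC = "/apps/demo-stream:events"        # hypothetical topic: stream path + ":" + topic name


def build_publish_request(proxy_url, topic, values):
    """Return (url, headers, body) for publishing JSON values to a topic."""
    # A MapR Streams topic name contains "/" and ":", so it is
    # percent-encoded before being placed in the URL path.
    url = f"{proxy_url}/topics/{quote(topic, safe='')}"
    # Assumed content type (Kafka REST Proxy v1 JSON embedded format).
    headers = {"Content-Type": "application/vnd.kafka.json.v1+json"}
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return url, headers, body


def send(url, headers, body):
    """POST the request; only works against a reachable REST Proxy."""
    request = Request(url, data=body, headers=headers, method="POST")
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the request is plain HTTP with a JSON body, the same call can be made from any language with an HTTP client, which is the point of the proxy.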
If you want to try out the MapR Converged Data Platform to see its unique big data capabilities but don’t have a cluster of hardware immediately available, you still have a few other options. For example, you can spin up a MapR cluster in the cloud using multiple node instances on one of our IaaS partners (Amazon, Azure, etc.).
There has been a lot of research in document image processing over the past 20 years, but not much research has been done in terms of parallel processing. Some of the solutions proposed for parallel processing have been to create threads of execution for each image, or to use GNU Parallel.
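The thread-per-image idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each image is represented as a `bytes` object of grayscale pixel values, and `count_dark_pixels` is a hypothetical stand-in for a real document-processing step such as binarization or OCR.

```python
from concurrent.futures import ThreadPoolExecutor


def count_dark_pixels(pixels, threshold=128):
    """Hypothetical per-image step: count pixel values below the threshold."""
    return sum(1 for p in pixels if p < threshold)


def process_images(images, max_workers=4):
    """Run the per-image step on worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_dark_pixels, images))
```

In CPython, threads help most when the per-image work is I/O-bound or releases the global interpreter lock; for CPU-bound pure-Python work, `ProcessPoolExecutor` from the same module is a drop-in alternative.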
Pentaho Data Integration (PDI) provides the ETL capabilities that facilitate the process of capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users and IoT technologies. Apache Drill is a schema-free SQL-on-Hadoop engine that lets you run SQL queries against different data sets in various formats, such as JSON, CSV, Parquet, and HBase.