In this week’s Whiteboard Walkthrough, Rachel Silver, Ecosystem Product Manager at MapR, talks about MapR Ecosystem Packs (MEPs), which give you a convenient way to upgrade open source ecosystem components without having to upgrade the core MapR platform. The open source components in each MEP have been tested to be functionally interoperable, so you can spend more time processing and analyzing data and less time troubleshooting your stack.
Whiteboard Walkthrough Videos Blog Posts
In this week's Whiteboard Walkthrough video, Neeraja Rentachintala, Senior Director of Product Management at MapR Technologies, explains how Apache Drill optimization achieves interactive performance for low-latency SQL queries on very large data sets when working with familiar BI tools such as Tableau, MicroStrategy, or QlikView. The video also covers techniques used to optimize Drill successfully in production. Neeraja describes Drill optimization capabilities based on Apache Calcite, including projection pruning, filter pushdown, partition pruning, cost-based optimization, and metadata caching.
In this week's Whiteboard Walkthrough, Jorge Geronimo, Solutions Architect at MapR, explains how, with a single line of code, you can replicate a MapR data stream within the same cluster or to another cluster, even one in another part of the world. Jorge also describes multi-master replication for streaming data and how MapR Streams' unique capability for geo-distributed replication with preserved offsets offers advantages for working with streaming data.
In this Whiteboard Walkthrough, Parth Chandra, Chair of the PMC for the Apache Drill project and member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format and shares some best practices for getting maximum performance from Parquet.
In this week's Whiteboard Walkthrough, Nick Amato, Director of Technical Marketing at MapR, explains the advantages of a publish-subscribe model for real-time data streams.
In this week's Whiteboard Walkthrough, Santosh Marella, committer on the Apache Myriad project, explains how Apache Myriad enables fine-grained scaling in Mesos environments alongside YARN, the resource management framework for Apache Hadoop.
In this week's Whiteboard Walkthrough, Bharat Baddepudi, engineer on the MapR-DB team, explains how documents in MapR-DB are inserted and updated.
In this week's Whiteboard Walkthrough, MC Srivas, MapR CTO and Co-Founder, explains the innovation and vision behind MapR-DB and how project Kudu stacks up to the MapR Data Platform.
In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, talks about the architectural differences between HDFS and MapR-FS that boil down to three numbers.
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, gives you an introduction to the Zeta Architecture, a high-level enterprise architectural construct which enables simplified business processes and defines a scalable way to increase the speed of integrating data into the business.
In this week's Whiteboard Walkthrough, Anoop Dawar, Senior Product Director at MapR, shows you the basics of Apache Spark and how it is different from MapReduce.
In this week's Whiteboard Walkthrough, Tomer Shiran, PMC member and Apache Drill committer, walks you through the history of the non-relational datastore and why Apache Drill is so important for this type of technology.
In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, gets you up to speed on the t-digest, an algorithm you can add to any anomaly detector to set the number of alarms that you get as a percentage of the total samples.
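The idea of fixing the alarm rate as a percentage of samples can be illustrated with a quantile threshold. The sketch below is not the t-digest itself: it sorts all scores and reads off an exact quantile, which a t-digest would instead approximate from a stream in bounded memory. The function names and the sample scores are hypothetical.

```python
import math


def alarm_threshold(scores, alarm_fraction):
    """Return the score at or above which roughly `alarm_fraction` of samples fall.

    Assumption: an exact sort stands in for the t-digest, which would
    estimate the same quantile without keeping every sample.
    """
    ordered = sorted(scores)
    # Number of samples we are willing to flag (at least one).
    n_alarms = max(1, math.floor(len(ordered) * alarm_fraction))
    return ordered[-n_alarms]


def alarms(scores, alarm_fraction):
    """Return the scores that would raise an alarm at the chosen rate."""
    threshold = alarm_threshold(scores, alarm_fraction)
    return [s for s in scores if s >= threshold]


# Hypothetical anomaly scores: 1000 distinct values.
scores = list(range(1, 1001))
flagged = alarms(scores, 0.01)  # ask for about 1% of samples
```

With distinct scores, the number of alarms matches the requested fraction; tied scores at the threshold can push the count slightly higher.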
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, explains the differences between Apache Mesos and YARN, and why one may or may not be better in global resource management than the other.
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, talks about the implications of append-only file systems and the impact they have on downstream projects in the Hadoop ecosystem. He starts off by demonstrating this concept using HBase, showing how an append-only file system constrains what HBase can do as a real-time capable data store.
In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, walks you through HBase key design with OpenTSDB.
One of the important things to keep in mind with HBase is that it is a linearly scaling, column-oriented key-value store. Now, in order to get linearly scalable functionality out of HBase, you have to be very cognizant of the key design. This means you don't want to create what are called hot spots, and you want to prevent things like sequential writes from occurring. So I've pre-drawn this diagram to show you that if you were to write the keys sequentially, what happens in HBase is that when you're writing keys one through five, they're all going to land on the first server.
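The hot-spot effect described above can be sketched in a few lines. This is a toy model, not HBase code: the three-server cluster, the five-keys-per-server range split, and the hash-based salt prefix are all assumptions made for illustration, and salting is only one common way to spread sequential keys, not necessarily the design the video goes on to describe.

```python
import hashlib
from collections import Counter

NUM_SERVERS = 3      # hypothetical cluster size
KEYS_PER_SERVER = 5  # hypothetical range width per server


def server_for_sequential(key: int) -> int:
    """Range partitioning: consecutive keys share a server."""
    return ((key - 1) // KEYS_PER_SERVER) % NUM_SERVERS


def salted_key(key: int, buckets: int = NUM_SERVERS) -> str:
    """Prefix the key with a stable, hash-derived bucket number."""
    digest = hashlib.md5(str(key).encode()).digest()
    bucket = digest[0] % buckets
    return f"{bucket}-{key:08d}"


def server_for_salted(key: int) -> int:
    """With salting, the bucket prefix decides which server gets the row."""
    return int(salted_key(key).split("-")[0])


# Keys 1..5 written in order all hit the same server (a hot spot).
sequential_load = Counter(server_for_sequential(k) for k in range(1, 6))

# The same kind of sequential stream, salted, is spread across servers.
salted_load = Counter(server_for_salted(k) for k in range(1, 1001))
```

The trade-off is that salted keys are no longer stored in their natural order, so a range scan has to query every bucket and merge the results.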
In this week's Whiteboard Walkthrough, Abizer Adenwala, Technical Support Engineer at MapR, walks you through what a storage pool is, why disks are striped, why a disk might be marked as failed, what happens when a disk is marked as failed, what to watch out for before reformatting or re-adding a disk, and the best path to recover from a disk failure.
In this week's Whiteboard Walkthrough, James Casaletto walks you through how to configure the network for the MapR Hadoop Sandbox. Whether you use VirtualBox, VMware Fusion, VMware Player, or pretty much any hypervisor on your laptop to support your MapR Sandbox, you'll need to configure the network. There are essentially three different settings that you can use to configure the network for your Sandbox. One is NAT, one is host-only, and one is bridged.
Welcome to the MapR Whiteboard Walkthrough. My name is John and I'm the author of the cluster administration course that you'll find at training.mapr.com. I'm here to talk to you about the CLDB, or the container location database. Over the next few minutes, I'll give you a quick definition of the CLDB, an overview, and talk a little bit more about what's inside the CLDB. At the end of this video you should be able to define the function of the CLDB in a MapR cluster and also describe how it differs from the NameNode in standard Hadoop.
Hi, welcome to MapR Whiteboard Walkthrough sessions. My name is Abhinav and I'm one of the data engineers here at MapR, and the purpose of this video is to go through the comparison of Storm Trident and Spark Streaming. As you may be aware, Storm and Spark are very popular projects within the community. Storm is a stream processor that came out from Twitter in 2009, and Spark is a general purpose in-memory processing framework, both of which offer stream processing solutions.