Whiteboard Walkthrough Videos Blog Posts

Posted on November 30, 2016 by Rachel Silver

In this week's Whiteboard Walkthrough, Rachel Silver, Ecosystem Product Manager at MapR, talks about MapR Ecosystem Packs (MEPs), which give you a convenient way to upgrade open source ecosystem components without having to upgrade the core MapR platform. The open source components in each MEP have been tested for functional interoperability with one another, so you can spend more time processing and analyzing data and less time troubleshooting your stack.

Posted on November 2, 2016 by Neeraja Rentachintala

In this week's Whiteboard Walkthrough video, Neeraja Rentachintala, Senior Director of Product Management at MapR Technologies, explains how Apache Drill's query optimization delivers interactive performance for low-latency SQL queries on very large data sets when you work with familiar BI tools such as Tableau, MicroStrategy, or QlikView. She also covers techniques for successful optimization with Drill in production. Neeraja describes Drill's optimization capabilities, which are based on Apache Calcite and include projection pruning, filter pushdown, partition pruning, cost-based optimization, and metadata caching.

Posted on October 26, 2016 by Jorge Geronimo

In this week's Whiteboard Walkthrough, Jorge Geronimo, Solutions Architect at MapR, explains how, with a single line of code, you can create a replica of a MapR data stream within the same cluster or on another cluster, even one in another part of the world. Jorge also describes multi-master replication for streaming data and how MapR Streams' unique capability for geo-distributed replication with preserved offsets offers advantages when working with streaming data.

Posted on October 12, 2016 by Parth Chandra

In this Whiteboard Walkthrough, Parth Chandra, Chair of the Apache Drill PMC and a member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format, along with some best practices for getting maximum performance from Parquet.

Posted on October 6, 2016 by Nick Amato

In this week's Whiteboard Walkthrough, Nick Amato, Director of Technical Marketing at MapR, explains the advantages of a publish-subscribe model for real-time data streams.

Posted on December 3, 2015 by Santosh Marella

In this week's Whiteboard Walkthrough, Santosh Marella, committer on the Apache Myriad project, explains how Apache Myriad enables fine-grained scaling in Mesos environments alongside YARN, the resource management framework for Apache Hadoop.

Posted on October 14, 2015 by Bharat Baddepudi

In this week's Whiteboard Walkthrough, Bharat Baddepudi, engineer on the MapR-DB team, explains how documents in MapR-DB are inserted and updated.

Posted on September 29, 2015 by M.C. Srivas

In this week's Whiteboard Walkthrough, M.C. Srivas, MapR CTO and Co-Founder, explains the innovation and vision behind MapR-DB and how the Kudu project stacks up against the MapR Data Platform.

Posted on July 15, 2015 by Ted Dunning

In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, talks about the architectural differences between HDFS and MapR-FS that boil down to three numbers.

Posted on July 1, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, gives you an introduction to the Zeta Architecture, a high-level enterprise architectural construct which enables simplified business processes and defines a scalable way to increase the speed of integrating data into the business.

Posted on June 17, 2015 by Anoop Dawar

In this week's Whiteboard Walkthrough, Anoop Dawar, Senior Product Director at MapR, shows you the basics of Apache Spark and how it is different from MapReduce.

Posted on May 13, 2015 by Tomer Shiran

In this week's Whiteboard Walkthrough, Tomer Shiran, PMC member and Apache Drill committer, walks you through the history of non-relational datastores and explains why Apache Drill is so important for this type of technology.

Posted on April 29, 2015 by Ted Dunning

In this week's Whiteboard Walkthrough, Ted Dunning, Chief Application Architect at MapR, gets you up to speed on the t-digest, an algorithm you can add to any anomaly detector so that the number of alarms you get is a set percentage of the total samples.
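The idea in that summary can be sketched in a few lines: choose the alarm rate you can tolerate (say 1%), estimate the matching quantile of the anomaly scores, and alarm only at or above it. The sketch below swaps in an exact, sort-based quantile for the t-digest itself, so it only illustrates what a t-digest quantile query returns; the function names and the sample scores are hypothetical.

```python
def alarm_threshold(scores, alarm_fraction):
    """Score at or above which roughly `alarm_fraction` of the samples fall.

    Exact, sort-based stand-in for a t-digest quantile query: it keeps every
    score in memory, which is exactly what a t-digest avoids.
    """
    if not 0 < alarm_fraction < 1:
        raise ValueError("alarm_fraction must be between 0 and 1")
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("need at least one score")
    # Number of samples we are willing to alarm on (at least one).
    k = max(1, round(alarm_fraction * len(ordered)))
    # The k-th largest score becomes the threshold.
    return ordered[len(ordered) - k]


def alarms(scores, threshold):
    """Indices of samples whose score meets or exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    import random

    random.seed(42)
    # Hypothetical anomaly scores, e.g. absolute residuals from some model.
    scores = [abs(random.gauss(0, 1)) for _ in range(10_000)]
    t = alarm_threshold(scores, 0.01)
    print(f"threshold={t:.3f}, alarms={len(alarms(scores, t))}")
```

In a streaming detector, a t-digest would replace `sorted(scores)` with a compact, mergeable summary that answers the same quantile query approximately, without storing every sample. Ties at the threshold can push the alarm count slightly above the target.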

Posted on February 11, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, explains the differences between Apache Mesos and YARN, and whether one is better suited than the other for global resource management.

Posted on February 4, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, talks about the implications of append-only file systems and their impact on downstream projects in the Hadoop ecosystem. He starts by demonstrating this concept with HBase, showing how an append-only file system has forced HBase's design to work around these limitations in order to function as a real-time data store.

Posted on January 21, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, walks you through HBase key design with OpenTSDB. 

One of the important things to keep in mind with HBase is that it is a linearly scaling, column-oriented key-value store. Now, in order to get linear scalability out of HBase, you have to be very cognizant of the key design. This means you don't want to create what are called hot spots, and you want to prevent things like sequential writes from occurring. So I've pre-drawn this diagram to show you that if you were to write the keys sequentially, what happens in HBase is that when you write keys one through five, they all land on the first server.
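To make the hot spot concrete, here is a small Python simulation (not HBase client code) of a table pre-split into four regions by key range. Sequential keys all fall into one region, while prefixing each key with a hash-derived "salt" bucket, one common mitigation that is not necessarily the approach shown in the video, spreads the writes across regions. The split points, key format, bucket count, and function names are illustrative assumptions, and the sketch does not reproduce OpenTSDB's actual row-key layout.

```python
import bisect
import hashlib
from collections import Counter

# Hypothetical table pre-split into four regions by the first character of the row key:
# region 0: keys < "1", region 1: ["1", "2"), region 2: ["2", "3"), region 3: >= "3".
SPLIT_POINTS = ["1", "2", "3"]
NUM_BUCKETS = len(SPLIT_POINTS) + 1


def region_for(row_key: str) -> int:
    """Index of the region whose sorted key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)


def sequential_key(n: int) -> str:
    """Monotonically increasing key, like a zero-padded counter or timestamp."""
    return f"{n:010d}"


def salted_key(n: int) -> str:
    """Prefix the sequential key with a hash-derived bucket digit (0..NUM_BUCKETS-1)."""
    base = sequential_key(n)
    bucket = int(hashlib.md5(base.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{bucket}|{base}"


def region_load(keys) -> Counter:
    """Count how many writes each region receives."""
    return Counter(region_for(k) for k in keys)


if __name__ == "__main__":
    writes = range(1, 1001)
    print("sequential:", sorted(region_load(sequential_key(n) for n in writes).items()))
    print("salted:    ", sorted(region_load(salted_key(n) for n in writes).items()))
```

The trade-off shows up on the read side: a scan over a contiguous range of the original keys now has to be issued once per salt bucket and the results merged.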

Posted on December 17, 2014 by Abizer Adenwala

In this week's Whiteboard Walkthrough, Abizer Adenwala, Technical Support Engineer at MapR, walks you through what a storage pool is, why disks are striped, the reasons a disk might be marked as failed, what happens when a disk is marked as failed, what to watch out for before reformatting and re-adding a disk, and the best path to recover from a disk failure.

Posted on December 10, 2014 by James Casaletto

In this week's Whiteboard Walkthrough, James Casaletto walks you through how to configure the network for the MapR Hadoop Sandbox. Whether you use VirtualBox, VMware Fusion, VMware Player, or pretty much any other hypervisor on your laptop to run your MapR Sandbox, you'll need to configure the network. There are essentially three different settings you can use to configure the network for your Sandbox: NAT, host-only, and bridged.

Posted on November 26, 2014 by Jon Allen

Welcome to the MapR Whiteboard Walkthrough. My name is Jon, and I'm the author of the cluster administration course that you'll find at training.mapr.com. I'm here to talk to you about the CLDB, or container location database. Over the next few minutes, I'll give you a quick definition of the CLDB, an overview, and talk a little more about what's inside the CLDB. By the end of this video, you should be able to define the function of the CLDB in a MapR cluster and describe how it differs from the NameNode in standard Hadoop.

Posted on October 30, 2014 by Abhinav Chawade

Hi, welcome to the MapR Whiteboard Walkthrough sessions. My name is Abhinav, I'm one of the data engineers here at MapR, and the purpose of this video is to compare Storm Trident and Spark Streaming. As you may be aware, Storm and Spark are very popular projects within the community. Storm is a stream processor that Twitter open-sourced in 2011, and Spark is a general-purpose in-memory processing framework; both offer stream processing solutions.
