Quickstart My Spark: Kickstarting Your Spark-based Applications

Quickstart My Spark:

Always got IT
Coming after me
Custom built apps now easy for me
My Spark, my Spark
Quick start my Spark … (see full lyrics below)

For those of you who are rock fans and know the song “Kickstart My Heart” by Mötley Crüe, the above (modified) lyrics should resonate (and hopefully make you laugh).

We thought the Kickstart song by Mötley Crüe was appropriate, since everyone is excited about kickstarting their Spark-based applications these days. That’s our theme for the Quick Start Solutions we’re announcing today at Spark Summit West—you can kickstart your Spark efforts into high gear with our Spark Quick Starts. You’ll be able to develop at high speeds, use streaming data, and build applications faster.

Enter the Quick Starts

We've seen increasing interest in Spark-based applications on Hadoop across our customer base and in the Hadoop community. In response, we decided to combine the in-memory processing capabilities of the Spark engine with the enterprise-grade capabilities of the industry-leading Hadoop distribution. These solutions deliver value quickly: depending on the specifics of your use case, applications can be developed in a few weeks.

At the core of the Quick Start Solutions is the ability to simplify the process of developing a solution that leverages Spark in conjunction with Hadoop. Each Quick Start Solution includes software, services and training/certification in one powerful package. It comes with pre-built templates that bring together best practices accumulated by world-class data scientists and data engineers from several Spark and Hadoop deployments.

  • Software: You’ll get six nodes or more of any edition of the MapR Distribution including Apache Hadoop. Support for one year is included.
  • Quick Start Professional Services: Our Professional Services team will help you get started and customize the solution to the specifics of your use case and develop an architecture document that will enable a production rollout plan.
  • Hadoop Training and Certification: Each Quick Start Solution includes Hadoop certification.

The Specifics  

The solutions we’re announcing are geared towards development of real-time big data applications on a variety of log data for security analytics, device and sensor data for time-series analytics, as well as clinical applications on human genome data.  

While the Genome Sequencing solution is vertical-specific, it applies to sub-segments within Healthcare and Life Sciences such as sequencing technology, bioinformatics, pharmaceuticals and research hospitals, among others.

The real-time security log analytics solution is applicable to any enterprise in any vertical that’s looking to safeguard itself against the growing occurrence of external and internal data breaches, while the time-series solution has great applicability across the manufacturing, oil & gas and automotive sectors. Time-series data is growing as a result of the Internet of Things, and we believe it holds promise for a variety of applications and use cases. Below are some specifics for each solution:

  • The Real-time Security Log Analytics solution combines the power of the highly reliable MapR Distribution with the Apache Spark stack to support real-time analysis of large volumes of security data, which can help in early detection of advanced persistent threats and unknown threats. The solution augments existing Security Information and Event Management (SIEM) solutions by providing cost-effective storage and processing for deep analytics and by predicting anomalous behavior within the environment to identify unknown threats. In fact, Gartner predicts that by 2016, 25% of global companies will have adopted big data analytics for at least one security analytics use case.

The MapR Quick Start combines Spark, as an in-memory engine for faster data processing, with MLlib-based machine learning. Couple that with capabilities such as NFS for high-speed data ingestion into the MapR cluster, and other enterprise-grade capabilities of the MapR platform such as High Availability and Disaster Recovery, and you have a next-generation security analytics solution that provides deeper and more granular analytics.

  • The Time Series Analytics solution brings the reliable, top-ranked NoSQL database, MapR-DB, together with Apache Spark to support rapid ingestion and extraction of data along with real-time aggregation capabilities. With this solution, you can be off and running developing real-time monitoring applications and alert systems on various types of IoT data, including time-series data coming from machines, sensors and devices. According to IDC, worldwide spending on the Internet of Things is expected to reach $1.7 trillion in 2020. Various elements of the technology stack are part of this spend, with big data being one of the pieces, as shown below.

[Image: big data security]

  • The Genome Sequencing solution leverages Apache Hadoop and Spark for large-scale parallel processing of genome data, providing lower latency compared to HPC and homegrown solutions. The solution reduces the latency of converting a sequenced genome to clinically actionable information, and supports flexibility and extensibility in the computational algorithms that can be used. Particular aligners and libraries such as ADAM and avocado can be swapped in for certain types of analysis. According to Macquarie Capital’s 2014 report “Genomics 2.0: It's just the beginning”, the total sequencing technology market is ~US$1.9 billion, with next-generation sequencing technologies accounting for ~70% of the market. The Genome Sequencing solution is very much a next-generation solution, with an order-of-magnitude improvement compared to traditional technologies and alternatives.

Ready to kickstart your Spark project into high gear with the reliability of the leading Hadoop distribution in the market? Stop by our booth at Spark Summit. Show us your Kickstart My Heart rock star skills, and you could be rocking away to an Xbox 360 and Guitar Hero. Not at Spark Summit? Contact MapR to jumpstart your Spark-based Hadoop solution today.

Quick Start My Spark

To the tune of “Kickstart My Heart” by Mötley Crüe

When I develop
I want high speed
Data with streaming
A tool I need

My spark, my spark
Quick start my spark

Always got IT
Coming after me
Custom built apps now easy for me
My Spark, my Spark
Quick start my Spark

Ooh, are you ready guys?
Ooh, are you ready now?
Ooh, yeah
Quick start my Spark
Give it a start
Ooh, yeah, quickstart
Ooh, yeah
Quick start my spark
Great apps never stop
Ooh, yeah, quickstart

Spark with MapR
Internet of Things
High speed development
My career has wings
My Spark, my Spark
Quick start my Spark
I’ve got no trouble
slaying big data
Integrated platform, the future is mine
My spark, my spark
Quick start my spark

Ooh, are you ready guys?
Ooh, are you ready now?
Ooh, yeah
Quick start my Spark
Give it a start
Ooh, yeah, quickstart
Ooh, yeah
Quick start my Spark
Great apps never stop
Ooh, yeah, baby
