Apache Spark Blog Posts

Posted on December 8, 2016 by Carol McDonald

This post is the second in a series where we go over examples of how MapR data scientist Joe Blue helped MapR customers (in this case, a regional bank) identify new data sources and apply machine learning algorithms to better understand their customers. In this second part, we cover a bank customer profitability 360° example, presenting the before, during, and after.

Posted on October 11, 2016 by Kirk Borne

Much has been written about the power of big data collections to enable the 360° view of our customers, our business, our employees, and our processes. When our numerous, disparate, heterogeneous data collections are aggregated and joined in the data lake, with appropriate data tagging and data discovery tools in place (such as Apache Drill), then we can reach for that ideal: the 360° view of our domain!

Posted on October 4, 2016 by Carol McDonald

With the rapid expansion of smart phones and other connected mobile devices, communications service providers (CSPs) need to rapidly process, store, and derive insights from the diverse volume of data travelling across their networks. Big data analytics can help CSPs improve profitability by optimizing network services/usage, enhancing customer experience, and improving security.

Posted on October 3, 2016 by Kirk Borne

One of the most significant characteristics of the evolving digital age is the convergence of technologies. That includes information management (structured and unstructured databases: e.g., NoSQL), data collection (big data), data storage (cloud and distributed data: e.g., Hadoop), data applications (analytics), knowledge discovery (data science), algorithms (machine learning), transparency (open data), computation (distributed data processing: e.g., MapReduce and Spark), sensors (Internet of Things: IoT), and API services (microservices, containerization).

Posted on August 1, 2016 by Sameer Nori

Apache Spark is becoming very popular and widely used in the big data community. Spark has gained traction rapidly for several reasons, including its in-memory processing capabilities; its support for a wide range of engines for use cases such as streaming, machine learning, and SQL; and the ability to develop in multiple languages, such as Python and Scala.

Posted on July 29, 2016 by Ankur Desai

Oil and gas wells produce a huge amount of information. Sensors monitor things like temperature, pressure, fluid viscosity, the presence of foreign substances, and seismic activity. Sensors must be monitored in real time to optimize both performance and safety. A slight change in pressure underground may indicate a fracture that can jeopardize the whole well.

Posted on July 19, 2016 by Ellen Friedman

In January, I made predictions about six big data trends for 2016 (“What Will You Do in 2016?”). Now we’ve reached the mid-and-a-bit-more year, so it’s a good time to check them out and see how well these predictions match what has happened so far in 2016, what is surprising about that, and what’s likely to come in the second half of the year.

Posted on June 23, 2016 by Sameer Nori

Customers are flocking to Spark as their primary compute engine for big data use cases, and we received further proof of this last week when we ran an “Ask Us Anything about Spark” forum in the Converge Community. There were some great discussions that took place, where our Spark experts answered questions from customers and partners.

Posted on June 16, 2016 by Sameer Nori

There’s been a lot of buzz and high expectations in the big data community around Apache Spark 2.0 and how it will impact the development of data pipelines, streaming applications, machine learning algorithms and all of the other use cases that Apache Spark is enabling.

Posted on June 7, 2016 by Carol McDonald

Standards and incentives for digitizing and sharing healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in health care, with the goal of better care at lower cost.

Posted on June 6, 2016 by Balaji Mohanam

Apache Spark, a powerful general-purpose engine for processing large amounts of data, has seen a rapid increase in adoption since its release. Recognizing its impact very early on, MapR has supported and invested in Spark as part of our Hadoop distribution, enabling enterprises to build applications with Spark and deploy it reliably in production.

Posted on May 16, 2016 by Ellen Friedman

Streaming data is now a big focus for many big data projects, including real-time applications, so there's a lot of interest in excellent messaging technologies such as Apache Kafka or MapR Streams, which uses the Kafka 0.9 API.

Posted on April 22, 2016 by Carol McDonald

This post will show how to integrate Apache Spark Streaming, MapR-DB, and MapR Streams for fast, event-driven applications.

Posted on April 14, 2016 by Ankur Desai

This blog post provides an introduction to the components of a typical streaming architecture and the options available at each stage. There are three major components: producers, a streaming system, and consumers. The enthusiasm over real-time processing is being met with a host of technologies. Learn about some of them...

Posted on March 21, 2016 by Steve Wooledge

The number of organizations that are thinking about using Hadoop has grown astronomically over the past year. How do you know whether you’re ready to implement Hadoop, and what are the best practices?

Posted on February 15, 2016 by Ellen Friedman

Actionable insights from real-time analytics – that’s a goal for many new projects being designed to make use of streaming data, and it’s no wonder so many organizations are aiming at this prize. If you can develop programs to process streaming data with near or actual real-time analytics, you gain the ability to react to life as it happens.

Posted on January 14, 2016 by Michele Nemschoff

As we look back at 2015, the most popular blogs on our site are a good reflection of the 2015 trends and developments in the big data space.

Posted on January 13, 2016 by Balaji Mohanam

In this week's Whiteboard Walkthrough, Balaji Mohanam, Product Manager at MapR, explains the difference between Apache Spark and Apache Flink and how to decide which one to use.

Posted on January 11, 2016 by Kirk Borne

Are people in your data analytics organization contemplating the impending data avalanche from the internet of things and thus asking this question: “Spark or Hadoop?” That’s the wrong question!

Posted on January 5, 2016 by Ellen Friedman

It’s the start of a new year -- we’re on the threshold of something new -- so let’s look forward to what you’re likely to be doing in 2016.

Posted on December 15, 2015 by Jim Scott

In this week's Whiteboard Walkthrough, Jim Scott, Director of Enterprise Strategy and Architecture at MapR, discusses a business use case that leverages the power of MapR Streams.

Posted on December 8, 2015 by Will Ochandarena

Over the last 5 years of shipping product we’ve watched our customers get enormous value out of storing and processing big data. The use cases are far and wide, from performing predictive maintenance on oil rigs to building fraud and risk models on financial transactions.

Posted on December 7, 2015 by Jim Scott

If you’re thinking about working with big data, you might be wondering which tools you should use. If you are trying to enable SQL-on-Hadoop, then you might be considering Apache Spark or Apache Drill.

Posted on November 30, 2015 by Jim Scott

The faster you can ask questions, the faster you can get answers. Waiting for data to be shipped off servers to a central processing platform takes time, and most businesses these days want to get as close to real time as possible.

Posted on November 24, 2015 by Jim Scott

Streaming data enables businesses to respond to customers as close to real time as possible. There are many different ways to leverage a streaming platform and utilizing Spark in your streaming architecture is easier than you might think.

Posted on October 13, 2015 by Nitin Bandugula

The recent Attunity and MapR webinar “Give your Enterprise a Spark: How to Deploy Hadoop with Spark in Production” proved to be highly interactive and engaging. As promised, Nitin and Rodan have provided follow-up answers to your questions.

Posted on October 8, 2015 by Jim Scott

At Strata London in 2015, someone said to me, “Spark is like a fighter jet that you have to build yourself. Once you have it built, though, you have a fighter jet. Pretty awesome. Now you have to learn to fly it.”

Posted on October 6, 2015 by Jim Scott

Stream processing is a capability that has been added alongside Spark Core, whose original design goal was rapid in-memory data processing.

Posted on September 28, 2015 by Jim Scott

Apache Spark is a top-level project of the Apache Software Foundation, designed to be used with a range of programming languages on a variety of architectures.

Posted on September 23, 2015 by Jim Scott

In this blog post, I’ll talk about the relationship between Spark and Hadoop, what Hadoop gives Spark, and what Spark gives Hadoop.

Posted on September 22, 2015 by Michele Nemschoff

Australian shoppers are some of the most digitally influenced in the world; a majority of Australians go online to research a product before buying it, according to a 2015 report by Deloitte.

Posted on September 21, 2015 by Jim Scott

Recently, a new name has entered many of the conversations about big data. Some people see the popular newcomer Apache Spark™ as a more accessible and more powerful replacement for Hadoop, the original technology of choice for big data.

Posted on September 15, 2015 by Michele Nemschoff

It's an exciting time for those in pharmaceutical research these days, given that research organizations can now leverage big data to improve their business.

Posted on September 14, 2015 by Sean Suchter

As more organizations begin to deploy Spark in their production clusters, the need for fine-grained monitoring tools becomes paramount.

Posted on September 10, 2015 by Steve Wooledge

In this week's Whiteboard Walkthrough, Steve Wooledge, VP of Industry Solutions at MapR, talks about an Apache Spark + Hadoop use case for drug discovery that one of our customers is currently running in production.

Posted on September 8, 2015 by Michele Nemschoff

The explosion of data from new devices and technologies has forced the telecommunications industry to completely change the way it handles big data. Its traditional storage and analytics solutions cannot adequately manage the expanding, diverse volume of data generated today.

Posted on August 21, 2015 by Jim Scott

Apache Hadoop is revolutionizing big data in more than one way. While the Hadoop platform introduced reliable distributed storage and processing, various packages such as Spark on top of Hadoop make it possible to build applications and analyze data much faster. Here are some cool ways the Hadoop stack is being used right now.

Posted on August 11, 2015 by Nitin Bandugula

Apache Spark on Hadoop is great for processing large amounts of data quickly. The story gets even better when you get into the realm of real-time applications.

Posted on July 15, 2015 by Anil Gadre

You are probably all somewhere on the Spark journey to production scale—you're either at Spark Summit to learn, to start doing something with Spark, or perhaps you have mission-critical applications already running in your enterprise. On this journey, there's a lot to think about—mostly about your application—but you also need to figure out how to actually get Spark into production scale as more and more groups will want the power of the results and the value of using Spark in mission-critical, operational deployments.

Posted on June 23, 2015 by Nitin Bandugula

In this blog, I’d like to talk about the differences between Apache Spark and MapReduce, why it’s easier to develop on Spark, and the top five use cases.

Posted on June 15, 2015 by Sameer Nori

We thought the Kickstart song by Mötley Crüe was appropriate, since everyone is excited about kickstarting their Spark-based applications these days. That’s our theme for the Quick Start Solutions we’re announcing today at Spark Summit West—you can kickstart your Spark efforts into high gear with our Spark Quick Starts. You’ll be able to develop at high speeds, use streaming data, and build applications faster.

Posted on May 15, 2015 by Arsalan Tavakoli-Shiraji

Apache Spark recently celebrated its five-year anniversary as an open source project. While we are always humbled and excited by the open source success of Spark, it gives us far greater pleasure in knowing that there are more and more organizations this year that are deploying Spark into production business applications.

Posted on December 31, 2014 by Karen Whipple

As we close out the year, here is a look back at our 10 most popular blogs of 2014. Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.

Posted on December 16, 2014 by Nitin Bandugula

Apache Spark, one of the most popular tools in the Apache Hadoop ecosystem, has generated a lot of noise, and for good reason. It complements the existing Hadoop ecosystem by adding easy-to-use APIs and data-pipelining capabilities to Hadoop data, and support for the project continues to grow. Since its launch in 2010, Spark has seen over 400 contributors from more than 50 different companies.

Posted on October 8, 2014 by Nitin Bandugula

In this blog series, we’re showcasing the top 10 reasons customers choose the MapR Distribution for Hadoop to optimize their data-driven strategies. Reason #8: MapR provides Unbiased Open Source supporting 20+ OSS projects.

Posted on July 18, 2014 by Nitin Bandugula

M.C. Srivas, CTO and Co-Founder of MapR Technologies, spoke recently at Spark Summit 2014 on “Why Spark on Hadoop Matters.” Spark, with its in-memory processing framework, provides a complementary full stack on Hadoop, and this integration is showing tremendous promise for MapR customers.

Posted on April 14, 2014 by Karen Whipple

We just wrapped up a great quarter for MapR! We introduced our free Sandbox for Hadoop, achieved the highest ranking for Current Offering in a Big Data Hadoop Solutions report by Forrester, and announced the MapR Distribution for Hadoop with YARN and HP Vertica. Read about our latest announcements, top blog posts, webinars, white papers and more in this information-packed newsletter.

Posted on April 10, 2014 by Arsalan Tavakoli-Shiraji

Today, MapR announced that it will distribute and support the Apache Spark platform as part of the MapR Distribution for Hadoop in partnership with Databricks. We’re thrilled to start on this journey with MapR for a multitude of reasons.
