Top 10 Hadoop Blogs of 2014

As we close out the year, here is a look back at our 10 most popular blogs of 2014.  Our top posts include machine learning and time series data topics, new milestones for the Apache projects Drill and Spark, and hands-on technical explanations that save you time and headaches.  

By Aaron Eng
Keeping these five steps in mind can save you a lot of headaches and help you avoid Java heap space errors.
 
By Sungwook Yoon
Data Scientist Sungwook Yoon talks about deep learning because he finds that some data scientists who know a lot about machine learning think that deep learning is the same thing as a neural network. This video clarifies the differences.
 
By Ellen Friedman
Recording the time at which a measurement was made or an event occurred can make data much more useful for revealing valuable insights.  Time Series Databases: New Ways to Store and Access Data, published by O’Reilly, examines the fundamental concepts and practical methods for implementation of scalable, cost-effective time series databases.
 
By Michele Nemschoff
Apache Spark is currently one of the most active projects in the Hadoop ecosystem, and there’s been plenty of hype about it in the past several months. In the latest webinar from the Data Science Central webinar series, titled “Let Spark Fly: Advantages and Use Cases for Spark on Hadoop,” we cut through the noise to uncover practical advantages for having the full set of Spark technologies at your disposal.
 
By Jim Scott
There are many use cases for time series data, and they usually require handling a decent data ingest rate. Rates of more than 10,000 points per second are common and rates of 1 million points per second are not quite as common, but not outrageously high either.
 
By Bruce Penn
In my 2.5 years at MapR, a common question I have heard from customers is, “Isn’t HDFS going to eventually catch up to MapR-FS?” The simple answer is a resounding “NO”, and the reasons lie in the foundations of the two architectures. I will first describe these differences and then outline how the implementations vastly differ in their value to customers.
 
By Dr. Kirk Borne
The Internet of Things (IoT) will be huge in several ways. The forces that are driving it and the benefits that are motivating it are increasingly numerous, as more and more organizations, industries, and technologists catch the IoT bug.
 
By Ellen Friedman
In the second publication in the O’Reilly Practical Machine Learning series, A New Look at Anomaly Detection, Ted Dunning and I look at finding the outlier: the zebra in a herd of ponies, the fish swimming against the school, the rare event.
 
By Neeraja Rentachintala
Since Apache Drill 0.4 was released in August for experimentation on the MapR Distribution, there has been tremendous interest in the customer and partner community in the promise and potential of Drill to unlock the new types of data in their Hadoop/NoSQL systems for interactive analysis throughout the organization.
Note:  Read Apache Drill Carries Momentum into 2015 for the latest news on Drill.
 
By Dr. Kirk Borne
About 13 years ago, Doug Laney of the META Group (now Gartner) wrote an amazing report that showed both great insight and great foresight. The paper’s title was “3D Data Management: Controlling Data Volume, Velocity, and Variety.” The 3 V’s of big data were born on that day—February 6, 2001. My only not-so-serious quibble with the paper is that he should have started the title this way: “3V Data Management…” Nevertheless, from that point forward, the big data game was officially on!
