Featured Author

Jim Fiori
Senior Solutions Architect, MapR

Jim has a long history in IT, with roles in software development, performance engineering, and pre-sales engineering in both the commercial and government spaces, including the Intelligence Community. He has deep systems knowledge spanning servers, networks, storage, operating systems, and high availability. Before joining MapR, Jim spent 20 years at Sun Microsystems and Oracle solving complex customer problems. His interests include Big Data architectures, enterprise software, performance analysis and tuning, and distributed systems.

Author's Posts

Posted on May 13, 2013 by Jim Fiori

Hadoop provides a compelling distributed platform for processing massive amounts of data in parallel, combining the MapReduce framework with the Hadoop distributed file system. A Java API allows developers to express the processing in terms of a map phase and a reduce phase, where both phases consume and produce key/value pairs or key/value-list pairs.
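To make the two phases concrete, here is a minimal word-count sketch of the model in plain Java. It simulates the map, shuffle/group, and reduce steps in a single process rather than using the Hadoop runtime; the class and method names are illustrative, not part of the Hadoop API.

```java
import java.util.*;

public class MapReduceSketch {
    // Map phase: emit a (word, 1) key/value pair for each word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: given a key and the list of values grouped under it,
    // collapse the list to a single result (here, the sum of counts).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        String[] input = {"the quick brown fox", "the lazy dog"};

        // Shuffle step: group the mapped values by key, as the framework
        // would between the map and reduce phases.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Reduce each key/value-list pair and print the final counts.
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            System.out.println(entry.getKey() + "\t"
                    + reduce(entry.getKey(), entry.getValue()));
        }
    }
}
```

In real Hadoop code the same shape appears as `Mapper` and `Reducer` subclasses, with the framework handling the grouping and the distribution of work across the cluster.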

Posted on February 22, 2013 by Jim Fiori

Overview

Profiling Java applications can be accomplished with many tools, such as the built-in HPROF JVM native agent library for profiling heap and CPU usage. In the world of Hadoop and MapReduce, there are a number of properties you can set to enable profiling of your mapper and reducer code.

With MapR’s enterprise-grade distribution of Hadoop, there are three unique features that make the task of profiling MapReduce code easier. They are:
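As background for the stock-Hadoop properties the post refers to, profiling is typically switched on in the job configuration along these lines. This is a hedged sketch using the classic `mapred.*` property names; check the property names against your Hadoop version before relying on them.

```
# Enable HPROF profiling for a sample of tasks (classic MapReduce property names)
mapred.task.profile=true
mapred.task.profile.maps=0-2       # profile the first three map task attempts
mapred.task.profile.reduces=0-2    # profile the first three reduce task attempts
mapred.task.profile.params=-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s
```

The `%s` in the agent arguments is replaced with a per-task output file, so each profiled attempt writes its own HPROF dump.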

Posted on March 14, 2013 by Jim Fiori

Running MapReduce jobs on ingested data is traditionally batch-oriented: the data must first be transferred to a local file system accessible to the Hadoop cluster, then copied into HDFS with Flume or the “hadoop fs” command. Only once the transfers are complete can MapReduce be run on the ingested files.
