Jim has a long history in IT, with roles in software development, performance engineering, and pre-sales engineering in both the commercial and government spaces, including the Intelligence Community. Jim has deep systems knowledge spanning servers, networks, storage, operating systems, and high availability. Before joining MapR, Jim spent 20 years at Sun Microsystems and Oracle solving complex customer problems. Jim's interests are in Big Data architectures, enterprise software, performance analysis and tuning, and distributed systems.
Hadoop provides a compelling distributed platform for processing massive amounts of data in parallel using the MapReduce framework and the Hadoop Distributed File System (HDFS). A Java API lets developers express the processing as a map phase and a reduce phase, where both phases take key/value pairs (or key/value-list pairs) as input and output.
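To make the two phases concrete, here is a minimal, framework-free sketch of a word count in plain Java. The class and method names are hypothetical; real Hadoop code extends the framework's `Mapper` and `Reducer` classes, and the framework performs the shuffle between the two phases.

```java
import java.util.*;

// Framework-free sketch of MapReduce's map and reduce phases.
// All names here are illustrative, not part of the Hadoop API.
public class WordCountSketch {

    // Map phase: each input line emits (word, 1) key/value pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle (done by the framework in real Hadoop): group the emitted
    // values by key, yielding the key/value-list pairs the reducer sees.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse one key's value list into a single count.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        String[] input = {"hadoop map reduce", "hadoop distributed file system"};
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : input) pairs.addAll(map(line));

        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((k, v) -> counts.put(k, reduce(v)));
        System.out.println(counts);
        // {distributed=1, file=1, hadoop=2, map=1, reduce=1, system=1}
    }
}
```

In real Hadoop jobs the same three steps run distributed: map tasks run in parallel across input splits, the framework shuffles and sorts by key, and reduce tasks receive each key with its full value list.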
With MapR’s enterprise-grade distribution of Hadoop, three unique features make the task of profiling MapReduce code easier. They are:
Running MapReduce jobs on ingested data is traditionally batch-oriented: the data must first be transferred to a local file system accessible to the Hadoop cluster, then copied into HDFS with Flume or the “hadoop fs” command. Only once these transfers are complete can MapReduce run on the ingested files.
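The second step of that traditional flow looks roughly like the sketch below. The paths are hypothetical, and the `hadoop fs -put` command is printed rather than executed so the sketch runs anywhere; on a node with the Hadoop client installed, you would run the command itself.

```shell
# Sketch of the traditional batch ingest step; paths are hypothetical.
SRC=/data/incoming/events.log    # file already landed on the local FS
DEST=/user/etl/raw/events.log    # target path inside HDFS

# Copy the file into HDFS. MapReduce jobs can only read it after this
# command completes. (Printed here as a dry run; drop the echo to run it.)
echo "hadoop fs -put $SRC $DEST"
```

`hadoop fs -copyFromLocal` is an equivalent alternative to `-put` for this local-to-HDFS copy.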