Managing, Monitoring, and Testing MapReduce Jobs: How to Work with Counters

In this post, we detail how to work with counters to track MapReduce job progress. We will look at how to work with Hadoop’s built-in counters, as well as custom counters.

In part 2, we will discuss how to use the MapR Control System (MCS) to monitor jobs. We’ll also detail how to manage and display jobs, history, and logs using the command line interface.

Note: The material from this blog post is from one of our free on-demand training courses, Developing Hadoop Applications.

Working with Counters

Counters are used to determine if and how often a particular event occurred during a job execution.

There are 4 categories of counters in Hadoop: file system, job, framework, and custom.

You can use the built-in counters to validate that:

  • The correct number of bytes was read and written
  • The correct number of tasks was launched and successfully ran
  • The amount of CPU and memory consumed is appropriate for your job and cluster nodes
  • The correct number of records was read and written

You can also configure custom counters that are specific to your application.

Hadoop file system counters

The FILE_BYTES_WRITTEN counter is incremented for each byte written to the local file system. These writes occur during the map phase when the mappers write their intermediate results to the local file system. They also occur during the shuffle phase when the reducers spill intermediate results to their local disks while sorting.

The off-the-shelf Hadoop counters that correspond to MAPRFS_BYTES_READ and MAPRFS_BYTES_WRITTEN are HDFS_BYTES_READ and HDFS_BYTES_WRITTEN.

The amount of data read and written will depend on the compression algorithm you use, if any.

Hadoop job counters

The table above describes the counters that apply to Hadoop jobs.

The DATA_LOCAL_MAPS indicates how many map tasks executed on local file systems. Optimally, all the map tasks will execute on local data to exploit locality of reference, but this isn’t always possible.

The FALLOW_SLOTS_MILLIS_MAPS indicates how much time map tasks wait in the queue after the slots are reserved but before the map tasks execute. A high number indicates a possible mismatch between the number of slots configured for a task tracker and how many resources are actually available.

The SLOTS_MILLIS_* counters show how much time in milliseconds expired for the tasks. This value indicates wall clock time for the map and reduce tasks.

The TOTAL_LAUNCHED_MAPS counter defines how many map tasks were launched for the job, including failed tasks. Optimally, this number is the same as the number of splits for the job.

MapReduce Framework Counters

The COMBINE_* counters show how many records were read and written by the optional combiner. If you don’t specify a combiner, these counters will be 0.

The CPU statistics are gathered from /proc/cpuinfo and indicate how much total time was spent executing map and reduce tasks for a job.

The garbage collection counter is reported from GarbageCollectorMXBean.getCollectionTime().

The MAP*RECORDS are incremented for every successful record read and written by the mappers. Records that the map tasks failed to read or write are not included in these counters.

The PHYSICAL_MEMORY_BYTES statistics are gathered from /proc/meminfo and indicate how much RAM (not including swap space) was consumed by all the tasks.

Mapreduce framework counter

The REDUCE_INPUT_GROUPS counter is incremented for every unique key that the reducers process. This value should be equal to the total number of different keys in the intermediate results from the mappers.

The REDUCE*RECORDS counters indicate how many records were successfully read and written by the reducers. The input record counter should be equal to the MAP_OUTPUT_RECORDS counter.

The REDUCE_SHUFFLE_BYTES counter indicates how many bytes the intermediate results from the mappers were transferred to the reducers. Higher numbers here will make the job go slower as the shuffle process is the primary network consumer in the MapReduce process.

The SPILLED_RECORDS counter indicates how much data the map and reduce tasks wrote (spilled) to disk when processing the records. Higher numbers here will make the job go slower.

The SPLIT_RAW_BYTES counter only increments for metadata in splits, not the actual data itself.

The VIRTUAL_MEMORY_BYTES counter shows how much physical memory plus swap space is consumed by all tasks in a job. Contrast this counter with PHYSICAL_MEMORY_BYTES. The difference between these two counters indicates how much swap space is used for a job.

Custom Counters

MapReduce allows you to define your own custom counters. Custom counters are useful for counting specific records such as Bad Records, as the framework counts only total records. Custom counters can also be used to count outliers such as example maximum and minimum values, and for summations.

Counters may be incremented (or decremented) globally in all the mappers and reducers of a given job. Alternatively, your program logic may manipulate them in either just the mappers or just the reducers. In either case, they are referenced using a group name and a counter name, and may be dereferenced in the driver and reported in the job history.

All the counters, whether custom or framework, are stored in the JobTracker JVM memory, so there’s a practical limit to the number of counters you should use. The rule of thumb is to use less than 100, but this will vary based on physical memory capacity.

Thanks for reading this post about working with counters. Stay tuned for next week’s post about how to manage jobs and tasks. As always, if you have any questions please submit those in the comments section below.


Streaming Data Architecture:

New Designs Using Apache Kafka and MapR Streams




Download for free