Monitoring MapReduce

This article will help you get the MapReduce plugin for sd-agent configured and returning metrics.

Installing the mapreduce plugin package

Install the mapreduce plugin on Debian/Ubuntu:

sudo apt-get install sd-agent-mapreduce

Install the mapreduce plugin on RHEL/CentOS:

sudo yum install sd-agent-mapreduce

Read more about agent plugins.

Configuring the agent to monitor mapreduce

1. Configure the instances in /etc/sd-agent/conf.d/mapreduce.yaml:

instances:
  #
  # The MapReduce check retrieves metrics from YARN's ResourceManager. This
  # check must be run from the Master Node and the ResourceManager URI must
  # be specified below. The ResourceManager URI is composed of the
  # ResourceManager's hostname and port.
  #
  # The ResourceManager hostname can be found in the yarn-site.xml conf file
  # under the property yarn.resourcemanager.address
  #
  # The ResourceManager port can be found in the yarn-site.xml conf file under
  # the property yarn.resourcemanager.webapp.address
  #
  - resourcemanager_uri: http://localhost:8088

    # A required friendly name for the cluster.
    # cluster_name: MyMapReduceCluster

    # Set to true to collect histograms on the elapsed time of
    # map and reduce tasks (default: false)
    # collect_task_metrics: false

    # Optional tags to be applied to every emitted metric.
    # tags:
    #   - key:value
    #   - instance:production
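
The comments in the configuration above describe how resourcemanager_uri is put together from two yarn-site.xml properties. As a rough illustration only, the Python sketch below reads those two properties and composes the value. The function name resourcemanager_uri, the http scheme default, and the sample hostname rm.example.internal are assumptions made for this example; they are not part of sd-agent. If a property is not set explicitly in your yarn-site.xml (Hadoop falls back to defaults), the sketch raises a KeyError.

```python
import xml.etree.ElementTree as ET


def resourcemanager_uri(yarn_site_xml, scheme="http"):
    """Compose a resourcemanager_uri value from yarn-site.xml content (hypothetical helper)."""
    root = ET.fromstring(yarn_site_xml)
    # Collect every <property> as a name -> value pair.
    props = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }
    # Hostname: taken from yarn.resourcemanager.address (host:port).
    host = props["yarn.resourcemanager.address"].rsplit(":", 1)[0]
    # Port: taken from yarn.resourcemanager.webapp.address (host:port).
    port = props["yarn.resourcemanager.webapp.address"].rsplit(":", 1)[1]
    return "{0}://{1}:{2}".format(scheme, host, port)


# Made-up sample content; replace with the text of your own yarn-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rm.example.internal:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.internal:8088</value>
  </property>
</configuration>"""

print(resourcemanager_uri(SAMPLE))
```

The printed value is what would go after resourcemanager_uri: in mapreduce.yaml.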

2. Optional metrics can be specified for your counters. You can find out more about counters in the MapReduce documentation: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html#Job_Counters_API

The example configuration file that ships with the check package contains examples for optional metrics. The example config can also be viewed at the sd-agent-core-plugins GitHub repository.
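
To give a feel for what a counter looks like, the Python sketch below flattens a response in the shape documented for the Hadoop Job Counters API into the three mapreduce.job.counter.* metrics listed later in this article. It is only an illustration of the data, not the plugin's actual code: the function name flatten_counters, the row layout, and the sample values are assumptions made for this example.

```python
# Field in the Job Counters API response -> metric suffix used in this article.
COUNTER_FIELDS = (
    ("mapCounterValue", "map_counter_value"),
    ("reduceCounterValue", "reduce_counter_value"),
    ("totalCounterValue", "total_counter_value"),
)


def flatten_counters(response):
    """Return (metric, counter group, counter name, value) rows (hypothetical helper)."""
    rows = []
    job_counters = response.get("jobCounters", {})
    for group in job_counters.get("counterGroup", []):
        group_name = group.get("counterGroupName")
        for counter in group.get("counter", []):
            for field, suffix in COUNTER_FIELDS:
                if field in counter:
                    rows.append((
                        "mapreduce.job.counter." + suffix,
                        group_name,
                        counter.get("name"),
                        counter[field],
                    ))
    return rows


# Made-up sample in the documented response shape.
SAMPLE_RESPONSE = {
    "jobCounters": {
        "id": "job_0000000000000_0001",
        "counterGroup": [
            {
                "counterGroupName": "org.apache.hadoop.mapreduce.TaskCounter",
                "counter": [
                    {
                        "name": "MAP_INPUT_RECORDS",
                        "mapCounterValue": 2483,
                        "reduceCounterValue": 0,
                        "totalCounterValue": 2483,
                    }
                ],
            }
        ],
    }
}

for row in flatten_counters(SAMPLE_RESPONSE):
    print(row)
```

The counter group and counter names in the sample are the kind of identifiers you would reference when enabling optional counter metrics.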

3. Restart the agent:

sudo /etc/init.d/sd-agent restart

or

sudo systemctl restart sd-agent

Verifying the configuration

Run the agent's info command to verify the configuration:

sudo /etc/init.d/sd-agent info

or

/usr/share/python/sd-agent/agent.py info

If the agent has been configured correctly, you'll see output such as:

mapreduce
-----
  - instance #0 [OK]
  - Collected * metrics

You can also view the metrics returned with the following command:

sudo -u sd-agent /usr/share/python/sd-agent/agent.py check mapreduce

Configuring graphs

Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the mapreduce metrics to graph. The metrics are also available to select when building dashboard graphs.


Monitored metrics

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| mapreduce.job.counter.map_counter_value | Counter value of map tasks | task / second | float |
| mapreduce.job.counter.reduce_counter_value | Counter value of reduce tasks | task / second | float |
| mapreduce.job.counter.total_counter_value | Counter value of all tasks | task / second | float |
| mapreduce.job.elapsed_time.95percentile | 95th percentile elapsed time since the application started | millisecond | float |
| mapreduce.job.elapsed_time.avg | Average elapsed time since the application started | millisecond | float |
| mapreduce.job.elapsed_time.count | Number of times the elapsed time was sampled | none | float |
| mapreduce.job.elapsed_time.max | Max elapsed time since the application started | millisecond | float |
| mapreduce.job.elapsed_time.median | Median elapsed time since the application started | millisecond | float |
| mapreduce.job.failed_map_attempts | Number of failed map attempts | task / second | float |
| mapreduce.job.failed_reduce_attempts | Number of failed reduce attempts | task / second | float |
| mapreduce.job.killed_map_attempts | Number of killed map attempts | task / second | float |
| mapreduce.job.killed_reduce_attempts | Number of killed reduce attempts | task / second | float |
| mapreduce.job.map.task.elapsed_time.95percentile | 95th percentile of the elapsed time of all map tasks | millisecond | float |
| mapreduce.job.map.task.elapsed_time.avg | Average of the elapsed time of all map tasks | millisecond | float |
| mapreduce.job.map.task.elapsed_time.count | Number of times the map task elapsed time was sampled | none | float |
| mapreduce.job.map.task.elapsed_time.max | Max of the elapsed time of all map tasks | millisecond | float |
| mapreduce.job.map.task.elapsed_time.median | Median of the elapsed time of all map tasks | millisecond | float |
| mapreduce.job.maps_completed | Number of completed maps | task / second | float |
| mapreduce.job.maps_pending | Number of pending maps | task / second | float |
| mapreduce.job.maps_running | Number of running maps | task / second | float |
| mapreduce.job.maps_total | Total number of maps | task / second | float |
| mapreduce.job.new_map_attempts | Number of new map attempts | task / second | float |
| mapreduce.job.new_reduce_attempts | Number of new reduce attempts | task / second | float |
| mapreduce.job.reduce.task.elapsed_time.95percentile | 95th percentile of the elapsed time of all reduce tasks | millisecond | float |
| mapreduce.job.reduce.task.elapsed_time.avg | Average of the elapsed time of all reduce tasks | millisecond | float |
| mapreduce.job.reduce.task.elapsed_time.count | Number of times the reduce task elapsed time was sampled | none | float |
| mapreduce.job.reduce.task.elapsed_time.max | Max of the elapsed time of all reduce tasks | millisecond | float |
| mapreduce.job.reduce.task.elapsed_time.median | Median of the elapsed time of all reduce tasks | millisecond | float |
| mapreduce.job.reduces_completed | Number of completed reduces | task / second | float |
| mapreduce.job.reduces_pending | Number of pending reduces | task / second | float |
| mapreduce.job.reduces_running | Number of running reduces | task / second | float |
| mapreduce.job.reduces_total | Total number of reduces | task / second | float |
| mapreduce.job.running_map_attempts | Number of running map attempts | task / second | float |
| mapreduce.job.running_reduce_attempts | Number of running reduce attempts | task / second | float |
| mapreduce.job.successful_map_attempts | Number of successful map attempts | task / second | float |
| mapreduce.job.successful_reduce_attempts | Number of successful reduce attempts | task / second | float |
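
The elapsed_time metrics above come in groups of five (95percentile, avg, count, max, median). To show how those five values relate to a set of samples, here is a minimal Python sketch. It uses the nearest-rank method for the 95th percentile, which is an assumption made for this example; the agent's own histogram calculation may differ slightly, and the function name summarize_elapsed is hypothetical.

```python
import math


def summarize_elapsed(samples_ms):
    """Summarize elapsed-time samples (milliseconds) into the five histogram values."""
    ordered = sorted(samples_ms)
    n = len(ordered)
    if n == 0:
        return {}
    mid = n // 2
    if n % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2.0
    # Nearest-rank percentile: the smallest sample with at least 95% of samples at or below it.
    rank = max(1, int(math.ceil(0.95 * n)))
    return {
        "95percentile": ordered[rank - 1],
        "avg": sum(ordered) / float(n),
        "count": n,
        "max": ordered[-1],
        "median": median,
    }


# Made-up elapsed times for five map tasks, in milliseconds.
summary = summarize_elapsed([100, 200, 300, 400, 1000])
for name in sorted(summary):
    print("mapreduce.job.map.task.elapsed_time.%s = %s" % (name, summary[name]))
```

With these sample values the single slow task dominates both max and 95percentile, while the median stays at 300, which is why graphing the median and the 95th percentile side by side is often more informative than the average alone.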