This article will help you get the MapReduce plugin for sd-agent configured and returning metrics.
Installing the mapreduce plugin package
Install the mapreduce plugin on Debian/Ubuntu:
sudo apt-get install sd-agent-mapreduce
Install the mapreduce plugin on RHEL/CentOS:
sudo yum install sd-agent-mapreduce
Read more about agent plugins.
Configuring the agent to monitor mapreduce
1. Configure the instances in /etc/sd-agent/conf.d/mapreduce.yaml:
instances:
  #
  # The MapReduce check retrieves metrics from YARN's ResourceManager. This
  # check must be run from the Master Node and the ResourceManager URI must
  # be specified below. The ResourceManager URI is composed of the
  # ResourceManager's hostname and port.
  #
  # The ResourceManager hostname can be found in the yarn-site.xml conf file
  # under the property yarn.resourcemanager.address
  #
  # The ResourceManager port can be found in the yarn-site.xml conf file under
  # the property yarn.resourcemanager.webapp.address
  #
  - resourcemanager_uri: http://localhost:8088

    # A Required friendly name for the cluster.
    # cluster_name: MyMapReduceCluster

    # Set to true to collect histograms on the elapsed time of
    # map and reduce tasks (default: false)
    # collect_task_metrics: false

    # Optional tags to be applied to every emitted metric.
    # tags:
    #   - key:value
    #   - instance:production
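Before restarting the agent, it can help to sanity-check the instance values. The following is a minimal sketch, not part of the plugin: the helper names `validate_instance` and `running_apps_url` are hypothetical, and the query parameters are an assumption about what is useful to look at. The `/ws/v1/cluster/apps` path is the ResourceManager REST endpoint for listing applications; the plugin's own requests may differ.

```python
from urllib.parse import urlencode, urlsplit


def validate_instance(instance: dict) -> list:
    """Return a list of problems found in one mapreduce.yaml instance."""
    problems = []
    parts = urlsplit(instance.get("resourcemanager_uri", ""))
    if parts.scheme not in ("http", "https") or not parts.hostname:
        problems.append("resourcemanager_uri must look like http://host:port")
    # The example config describes cluster_name as a required friendly name.
    if not instance.get("cluster_name"):
        problems.append("cluster_name is required")
    return problems


def running_apps_url(resourcemanager_uri: str) -> str:
    """Build a ResourceManager REST URL listing running MapReduce apps."""
    # Assumption: filtering on RUNNING / MAPREDUCE is what we want to inspect.
    query = urlencode({"states": "RUNNING", "applicationTypes": "MAPREDUCE"})
    return resourcemanager_uri.rstrip("/") + "/ws/v1/cluster/apps?" + query


instance = {
    "resourcemanager_uri": "http://localhost:8088",
    "cluster_name": "MyMapReduceCluster",
}
print(validate_instance(instance))
print(running_apps_url(instance["resourcemanager_uri"]))
```

Opening the printed URL from the Master Node (for example with a browser or curl) is a quick way to confirm that the ResourceManager answers on the host and port you configured.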
2. Optional metrics can be specified for your counters. You can find out more about counters in the MapReduce documentation: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html#Job_Counters_API
The example configuration file that ships with the check package contains examples for optional metrics. The example config can also be viewed at the sd-agent-core-plugins GitHub repository.
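To make the counter metrics more concrete, here is a small sketch of how the map, reduce, and total values for chosen counters could be read from a Job Counters API response. The response shape follows the Hadoop documentation linked above; the job id and numbers in `SAMPLE` are made up for illustration, and `select_counters` is a hypothetical helper, not the plugin's code. The three values appear to correspond to the `map_counter_value`, `reduce_counter_value`, and `total_counter_value` metrics listed in this article.

```python
import json

# Illustrative response body; the job id and the values are invented.
SAMPLE = json.loads("""
{"jobCounters": {"id": "job_0000000000000_0001", "counterGroup": [
  {"counterGroupName": "org.apache.hadoop.mapreduce.TaskCounter",
   "counter": [
     {"name": "MAP_INPUT_RECORDS", "mapCounterValue": 120,
      "reduceCounterValue": 0, "totalCounterValue": 120},
     {"name": "REDUCE_INPUT_RECORDS", "mapCounterValue": 0,
      "reduceCounterValue": 45, "totalCounterValue": 45}
   ]}
]}}
""")


def select_counters(body, group_name, wanted):
    """Return {counter name: (map, reduce, total)} for the wanted counters."""
    selected = {}
    for group in body["jobCounters"].get("counterGroup", []):
        if group.get("counterGroupName") != group_name:
            continue
        for counter in group.get("counter", []):
            if counter["name"] in wanted:
                selected[counter["name"]] = (
                    counter["mapCounterValue"],
                    counter["reduceCounterValue"],
                    counter["totalCounterValue"],
                )
    return selected


print(select_counters(
    SAMPLE,
    "org.apache.hadoop.mapreduce.TaskCounter",
    {"MAP_INPUT_RECORDS"},
))
```

The counter group name and counter names are the pieces you would need to know when choosing which optional counters to collect.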
3. Restart the agent:
sudo /etc/init.d/sd-agent restart
or
sudo systemctl restart sd-agent
Verifying the configuration
Run the agent's info command to verify the configuration:
sudo /etc/init.d/sd-agent info
or
/usr/share/python/sd-agent/agent.py info
If the agent has been configured correctly, you'll see output such as:
mapreduce
-----
  - instance #0 [OK]
  - Collected * metrics
You can also view the metrics returned with the following command:
sudo -u sd-agent /usr/share/python/sd-agent/agent.py check mapreduce
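If you want to check the info output from a script rather than by eye, the status of each instance can be pulled out of the text. This is a rough sketch that assumes the output looks like the abbreviated sample shown in this article; real output may carry more detail, and `instance_statuses` is a hypothetical helper name.

```python
import re

# Sample text modelled on the abbreviated output shown in this article.
SAMPLE_INFO = """\
mapreduce
-----
  - instance #0 [OK]
  - Collected * metrics
"""


def instance_statuses(info_text, check_name):
    """Return {instance number: status} for one check in the info output."""
    statuses = {}
    in_section = False
    for line in info_text.splitlines():
        stripped = line.strip()
        match = re.match(r"- instance #(\d+) \[(\w+)\]", stripped)
        if stripped == check_name:
            # Header line of the check we are interested in.
            in_section = True
        elif in_section and match:
            statuses[int(match.group(1))] = match.group(2)
        elif in_section and stripped and not stripped.startswith("-"):
            # Any other non-list line is treated as the next check's header.
            in_section = False
    return statuses


print(instance_statuses(SAMPLE_INFO, "mapreduce"))
```

An empty result means the check name was not found in the text, which usually points at the plugin not being installed or the configuration file not being picked up.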
Configuring graphs
Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the mapreduce metrics to display as graphs. The metrics will also be available to select when building dashboard graphs.
Monitored metrics
Metric | Description | Unit | Type |
---|---|---|---|
mapreduce.job.counter.map_counter_value | Counter value of map tasks | task / second | float |
mapreduce.job.counter.reduce_counter_value | Counter value of reduce tasks | task / second | float |
mapreduce.job.counter.total_counter_value | Counter value of all tasks | task / second | float |
mapreduce.job.elapsed_time.95percentile | 95th percentile elapsed time since the application started | millisecond / None | float |
mapreduce.job.elapsed_time.avg | Average elapsed time since the application started | millisecond / None | float |
mapreduce.job.elapsed_time.count | Number of times the elapsed time was sampled | None / None | float |
mapreduce.job.elapsed_time.max | Max elapsed time since the application started | millisecond / None | float |
mapreduce.job.elapsed_time.median | Median elapsed time since the application started | millisecond / None | float |
mapreduce.job.failed_map_attempts | Number of failed map attempts | task / second | float |
mapreduce.job.failed_reduce_attempts | Number of failed reduce attempts | task / second | float |
mapreduce.job.killed_map_attempts | Number of killed map attempts | task / second | float |
mapreduce.job.killed_reduce_attempts | Number of killed reduce attempts | task / second | float |
mapreduce.job.map.task.elapsed_time.95percentile | 95th percentile of all map tasks elapsed time | millisecond / None | float |
mapreduce.job.map.task.elapsed_time.avg | Average of all map tasks elapsed time | millisecond / None | float |
mapreduce.job.map.task.elapsed_time.count | Number of times the map tasks elapsed time was sampled | None / None | float |
mapreduce.job.map.task.elapsed_time.max | Max of all map tasks elapsed time | millisecond / None | float |
mapreduce.job.map.task.elapsed_time.median | Median of all map tasks elapsed time | millisecond / None | float |
mapreduce.job.maps_completed | Number of completed maps | task / second | float |
mapreduce.job.maps_pending | Number of pending maps | task / second | float |
mapreduce.job.maps_running | Number of running maps | task / second | float |
mapreduce.job.maps_total | Total number of maps | task / second | float |
mapreduce.job.new_map_attempts | Number of new map attempts | task / second | float |
mapreduce.job.new_reduce_attempts | Number of new reduce attempts | task / second | float |
mapreduce.job.reduce.task.elapsed_time.95percentile | 95th percentile of all reduce tasks elapsed time | millisecond / None | float |
mapreduce.job.reduce.task.elapsed_time.avg | Average of all reduce tasks elapsed time | millisecond / None | float |
mapreduce.job.reduce.task.elapsed_time.count | Number of times the reduce tasks elapsed time was sampled | None / None | float |
mapreduce.job.reduce.task.elapsed_time.max | Max of all reduce tasks elapsed time | millisecond / None | float |
mapreduce.job.reduce.task.elapsed_time.median | Median of all reduce tasks elapsed time | millisecond / None | float |
mapreduce.job.reduces_completed | Number of completed reduces | task / second | float |
mapreduce.job.reduces_pending | Number of pending reduces | task / second | float |
mapreduce.job.reduces_running | Number of running reduces | task / second | float |
mapreduce.job.reduces_total | Number of reduces | task / second | float |
mapreduce.job.running_map_attempts | Number of running map attempts | task / second | float |
mapreduce.job.running_reduce_attempts | Number of running reduce attempts | task / second | float |
mapreduce.job.successful_map_attempts | Number of successful map attempts | task / second | float |
mapreduce.job.successful_reduce_attempts | Number of successful reduce attempts | task / second | float |
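The elapsed_time metrics come in groups of five: avg, count, max, median, and 95percentile. As a rough illustration of what those suffixes mean, the sketch below summarises a list of elapsed-time samples in milliseconds. The article does not say how the agent computes its percentile, so the nearest-rank method used here is an assumption, and `summarize` is a hypothetical helper.

```python
import statistics


def summarize(samples_ms):
    """Summarise elapsed-time samples using the five histogram suffixes."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile (assumed method): the smallest sample
    # with at least 95% of all samples at or below it. Integer ceiling
    # division avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "avg": statistics.mean(ordered),
        "count": len(ordered),
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "95percentile": ordered[rank - 1],
    }


print(summarize([100, 200, 300, 400, 1000]))
```

With only a handful of samples the 95th percentile is the same as the max, which is worth keeping in mind when reading graphs for short-lived jobs.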