Monitoring YARN

This article will help you configure the YARN plugin for sd-agent so that it returns metrics.

Installing the YARN plugin package

Install the YARN plugin on Debian/Ubuntu:

sudo apt-get install sd-agent-yarn

Install the YARN plugin on RHEL/CentOS:

sudo yum install sd-agent-yarn

Read more about agent plugins.

Configuring the agent to monitor YARN

1. Configure /etc/sd-agent/conf.d/yarn.yaml

init_config:

instances:
  - resourcemanager_uri: http://localhost:8088 # replace with the URI your ResourceManager listens on
    cluster_name: MyCluster # used to tag metrics, i.e. 'cluster_name:MyCluster'; default is 'default_cluster'
    collect_app_metrics: true

2. Restart the agent

sudo /etc/init.d/sd-agent restart

or

sudo systemctl restart sd-agent
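The comments in the step 1 example state that each instance's metrics are tagged with its cluster_name, which falls back to default_cluster when the key is omitted. A minimal sketch of that rule, assuming the instance entry is read into a plain Python dict; instance_tags is a hypothetical name and this is not the plugin's actual code:

```python
# Hypothetical sketch of the cluster_name tagging rule described in yarn.yaml.
# Not the plugin's actual code; only the default value and tag format come from the docs.
from typing import Any, Dict, List

# Default documented in the example configuration
DEFAULT_CLUSTER_NAME = "default_cluster"


def instance_tags(instance: Dict[str, Any]) -> List[str]:
    """Return the cluster tag attached to metrics from one `instances:` entry."""
    cluster_name = instance.get("cluster_name", DEFAULT_CLUSTER_NAME)
    return [f"cluster_name:{cluster_name}"]
```

With the configuration from step 1, instance_tags({"resourcemanager_uri": "http://localhost:8088", "cluster_name": "MyCluster"}) returns ["cluster_name:MyCluster"]; leaving out cluster_name gives ["cluster_name:default_cluster"].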

Verifying the configuration

Run the agent's info command to verify the configuration:

sudo /etc/init.d/sd-agent info 

or

/usr/share/python/sd-agent/agent.py info

If the agent has been configured correctly, you'll see output such as:

yarn
-----
  - instance #0 [OK]
  - Collected * metrics

You can also view the metrics returned with the following command:

sudo -u sd-agent /usr/share/python/sd-agent/agent.py check yarn
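If instance #0 reports an error instead of [OK], one thing worth checking is whether the agent host can reach the address set as resourcemanager_uri. The sketch below is a hypothetical helper, not part of sd-agent: it requests /ws/v1/cluster/info, an endpoint of the Hadoop YARN ResourceManager REST API, and returns the cluster state it reports. The function names are illustrative assumptions.

```python
# Hypothetical troubleshooting helper, not part of sd-agent.
# /ws/v1/cluster/info is an endpoint of the Hadoop YARN ResourceManager REST API;
# the function names here are illustrative only.
import json
import urllib.request


def cluster_info_url(resourcemanager_uri: str) -> str:
    """Build the cluster info URL from the value of resourcemanager_uri."""
    # Strip any trailing slash so the path separator is not doubled.
    return resourcemanager_uri.rstrip("/") + "/ws/v1/cluster/info"


def check_resourcemanager(resourcemanager_uri: str, timeout: float = 5.0) -> str:
    """Return the cluster state reported by the ResourceManager.

    Raises an OSError subclass (for example urllib.error.URLError)
    if the ResourceManager cannot be reached.
    """
    request = urllib.request.Request(
        cluster_info_url(resourcemanager_uri),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # The response wraps its fields in a top-level "clusterInfo" object.
    return payload["clusterInfo"]["state"]
```

On a running ResourceManager, check_resourcemanager("http://localhost:8088") would be expected to return STARTED; a connection error or timeout instead points at the configured URI, a firewall, or the ResourceManager itself.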

Configuring graphs

Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the YARN metrics you want to graph. The metrics are also available to select when building dashboard graphs.

Monitored metrics

Metric / Description, unit and type
yarn.apps.allocated_mb

The sum of memory in MB allocated to the application's running containers
mebibyte / None
Type: float
yarn.apps.allocated_vcores

The sum of virtual cores allocated to the application's running containers
core / None
Type: float
yarn.apps.elapsed_time

The elapsed time since the application started (in ms)
millisecond / None
Type: float
yarn.apps.finished_time

The time at which the application finished (in ms since epoch)
millisecond / None
Type: float
yarn.apps.memory_seconds

The amount of memory the application has allocated (megabyte-seconds)
second / None
Type: float
yarn.apps.progress

The progress of the application as a percent
percent / None
Type: float
yarn.apps.running_containers

The number of containers currently running for the application
None / None
Type: float
yarn.apps.started_time

The time at which the application started (in ms since epoch)
millisecond / None
Type: float
yarn.apps.vcore_seconds

The amount of CPU resources the application has allocated (virtual core-seconds)
second / None
Type: float
yarn.metrics.active_nodes

The number of active nodes
node / None
Type: float
yarn.metrics.allocated_mb

The amount of allocated memory
mebibyte / None
Type: float
yarn.metrics.allocated_virtual_cores

The number of allocated virtual cores
core / None
Type: float
yarn.metrics.apps_completed

The number of completed apps
task / None
Type: float
yarn.metrics.apps_failed

The number of failed apps
task / None
Type: float
yarn.metrics.apps_killed

The number of killed apps
task / None
Type: float
yarn.metrics.apps_pending

The number of pending apps
task / None
Type: float
yarn.metrics.apps_running

The number of running apps
task / None
Type: float
yarn.metrics.apps_submitted

The number of submitted apps
task / None
Type: float
yarn.metrics.available_mb

The amount of available memory
mebibyte / None
Type: float
yarn.metrics.available_virtual_cores

The number of available virtual cores
core / None
Type: float
yarn.metrics.containers_allocated

The number of containers allocated
None / None
Type: float
yarn.metrics.containers_pending

The number of containers pending
None / None
Type: float
yarn.metrics.containers_reserved

The number of containers reserved
None / None
Type: float
yarn.metrics.decommissioned_nodes

The number of decommissioned nodes
node / None
Type: float
yarn.metrics.lost_nodes

The number of lost nodes
node / None
Type: float
yarn.metrics.rebooted_nodes

The number of rebooted nodes
node / None
Type: float
yarn.metrics.reserved_mb

The amount of reserved memory
mebibyte / None
Type: float
yarn.metrics.reserved_virtual_cores

The number of reserved virtual cores
core / None
Type: float
yarn.metrics.total_mb

The total amount of memory
mebibyte / None
Type: float
yarn.metrics.total_nodes

The total number of nodes
node / None
Type: float
yarn.metrics.total_virtual_cores

The total number of virtual cores
core / None
Type: float
yarn.metrics.unhealthy_nodes

The number of unhealthy nodes
node / None
Type: float
yarn.node.avail_memory_mb

The total amount of memory currently available on the node (in MB)
mebibyte / None
Type: float
yarn.node.available_virtual_cores

The total number of vCores available on the node
core / None
Type: float
yarn.node.last_health_update

The last time the node reported its health (in ms since epoch)
millisecond / None
Type: float
yarn.node.num_containers

The total number of containers currently running on the node
None / None
Type: float
yarn.node.used_memory_mb

The total amount of memory currently used on the node (in MB)
mebibyte / None
Type: float
yarn.node.used_virtual_cores

The total number of vCores currently used on the node
core / None
Type: float
yarn.queue.AMResourceLimit.memory

The maximum memory resources this queue can use for Application Masters (in MB)
mebibyte / None
Type: float
yarn.queue.AMResourceLimit.vCores

The maximum vCores this queue can use for Application Masters
core / None
Type: float
yarn.queue.absoluteCapacity

The percentage of the entire cluster's capacity that this queue can use
percentage / None
Type: float
yarn.queue.absoluteMaxCapacity

The maximum percentage of the entire cluster's capacity that this queue can use
percentage / None
Type: float
yarn.queue.absoluteUsedCapacity

The percentage of the entire cluster's capacity that this queue is using
percentage / None
Type: float
yarn.queue.capacity

The configured queue capacity, as a percentage of its parent queue
percentage / None
Type: float
yarn.queue.maxApplications

The maximum number of applications this queue can have
task / None
Type: float
yarn.queue.maxApplicationsPerUser

The maximum number of active applications per user this queue can have
task / None
Type: float
yarn.queue.maxCapacity

The configured maximum queue capacity, as a percentage of its parent queue
percentage / None
Type: float
yarn.queue.numActiveApplications

The number of active applications in this queue
task / None
Type: float
yarn.queue.numApplications

The number of applications currently in the queue
task / None
Type: float
yarn.queue.numContainers

The number of containers being used
None / None
Type: float
yarn.queue.numPendingApplications

The number of pending applications in this queue
task / None
Type: float
yarn.queue.resourcesUsed.memory

The total memory resources this queue is using (in MB)
mebibyte / None
Type: float
yarn.queue.resourcesUsed.vCores

The total vCores this queue is using
core / None
Type: float
yarn.queue.root.capacity

The configured capacity of the root queue, as a percentage
percentage / None
Type: float
yarn.queue.root.maxCapacity

The configured maximum capacity of the root queue, as a percentage
percentage / None
Type: float
yarn.queue.root.usedCapacity

The used capacity of the root queue, as a percentage
percentage / None
Type: float
yarn.queue.usedAMResource.memory

The memory resources used for Application Masters (in MB)
mebibyte / None
Type: float
yarn.queue.usedAMResource.vCores

The vCores used for Application Masters
core / None
Type: float
yarn.queue.usedCapacity

The used queue capacity, as a percentage
percentage / None
Type: float
yarn.queue.userAMResourceLimit.memory

The maximum memory resources a user can use for Application Masters (in MB)
mebibyte / None
Type: float
yarn.queue.userAMResourceLimit.vCores

The maximum vCores a user can use for Application Masters
core / None
Type: float
yarn.queue.userLimit

The minimum user limit percent set in the configuration
None / None
Type: float
yarn.queue.userLimitFactor

The user limit factor set in the configuration
None / None
Type: float
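The yarn.metrics.* names listed above line up with the fields of the clusterMetrics object returned by the ResourceManager's /ws/v1/cluster/metrics REST endpoint. As a minimal sketch, assuming the plugin reads that endpoint and maps each field one-to-one (neither is confirmed by this article), the translation could look like the following; CLUSTER_METRICS and cluster_metrics_to_gauges are hypothetical names.

```python
# Sketch: translating a ResourceManager /ws/v1/cluster/metrics response into the
# yarn.metrics.* names listed above. The JSON field names come from the Hadoop
# YARN REST API; that the plugin uses exactly this mapping is an assumption.
from typing import Dict

# YARN REST field name -> metric name from the list above
CLUSTER_METRICS = {
    "appsSubmitted": "yarn.metrics.apps_submitted",
    "appsCompleted": "yarn.metrics.apps_completed",
    "appsPending": "yarn.metrics.apps_pending",
    "appsRunning": "yarn.metrics.apps_running",
    "appsFailed": "yarn.metrics.apps_failed",
    "appsKilled": "yarn.metrics.apps_killed",
    "totalMB": "yarn.metrics.total_mb",
    "availableMB": "yarn.metrics.available_mb",
    "allocatedMB": "yarn.metrics.allocated_mb",
    "reservedMB": "yarn.metrics.reserved_mb",
    "totalVirtualCores": "yarn.metrics.total_virtual_cores",
    "availableVirtualCores": "yarn.metrics.available_virtual_cores",
    "allocatedVirtualCores": "yarn.metrics.allocated_virtual_cores",
    "reservedVirtualCores": "yarn.metrics.reserved_virtual_cores",
    "containersAllocated": "yarn.metrics.containers_allocated",
    "containersPending": "yarn.metrics.containers_pending",
    "containersReserved": "yarn.metrics.containers_reserved",
    "totalNodes": "yarn.metrics.total_nodes",
    "activeNodes": "yarn.metrics.active_nodes",
    "lostNodes": "yarn.metrics.lost_nodes",
    "unhealthyNodes": "yarn.metrics.unhealthy_nodes",
    "decommissionedNodes": "yarn.metrics.decommissioned_nodes",
    "rebootedNodes": "yarn.metrics.rebooted_nodes",
}


def cluster_metrics_to_gauges(payload: dict) -> Dict[str, float]:
    """Map the 'clusterMetrics' object to {metric name: float value}."""
    cluster = payload.get("clusterMetrics", {})
    return {
        name: float(cluster[field])
        for field, name in CLUSTER_METRICS.items()
        if field in cluster  # skip any field missing from the response
    }
```

For example, cluster_metrics_to_gauges({"clusterMetrics": {"appsRunning": 4, "availableMB": 2048}}) returns {"yarn.metrics.apps_running": 4.0, "yarn.metrics.available_mb": 2048.0}. An explicit table is used rather than a mechanical camelCase-to-snake_case conversion, which would not reliably produce names such as available_mb from availableMB.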