This article will help you configure the YARN plugin for sd-agent and confirm that it is returning metrics.
Installing the yarn plugin package
Install the yarn plugin on Debian/Ubuntu:
sudo apt-get install sd-agent-yarn
Install the yarn plugin on RHEL/CentOS:
sudo yum install sd-agent-yarn
Read more about agent plugins.
Configuring the agent to monitor yarn
1. Configure /etc/sd-agent/conf.d/yarn.yaml
init_config:

instances:
  - resourcemanager_uri: http://localhost:8088 # or whichever address your resource manager listens on
    cluster_name: MyCluster # used to tag metrics, i.e. 'cluster_name:MyCluster'; default is 'default_cluster'
    collect_app_metrics: true
2. Restart the agent
sudo /etc/init.d/sd-agent restart
or
sudo systemctl restart sd-agent
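Before relying on the plugin, it can help to confirm that the address set in resourcemanager_uri actually answers. The sketch below is not part of sd-agent; it assumes the ResourceManager exposes the standard Hadoop REST endpoint ws/v1/cluster/metrics, and the function names cluster_metrics_url and fetch_cluster_metrics are hypothetical helpers introduced only for illustration.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def cluster_metrics_url(resourcemanager_uri):
    """Build the cluster-metrics endpoint from the configured resourcemanager_uri."""
    # Normalise to exactly one trailing slash so urljoin appends the path
    # instead of replacing the last path segment.
    base = resourcemanager_uri.rstrip("/") + "/"
    return urljoin(base, "ws/v1/cluster/metrics")


def fetch_cluster_metrics(resourcemanager_uri, timeout=5):
    """Fetch and return the 'clusterMetrics' object (assumed response layout)."""
    with urlopen(cluster_metrics_url(resourcemanager_uri), timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]
```

Running fetch_cluster_metrics("http://localhost:8088") on the agent host should return a dictionary of cluster-level values if the URI is correct; a connection error suggests the value in yarn.yaml needs adjusting.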
Verifying the configuration
Run the agent's info command to verify the configuration:
sudo /etc/init.d/sd-agent info
or
/usr/share/python/sd-agent/agent.py info
If the agent has been configured correctly you'll see an output such as:
yarn
-----
  - instance #0 [OK]
  - Collected * metrics
You can also view the metrics returned with the following command:
sudo -u sd-agent /usr/share/python/sd-agent/agent.py check yarn
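If you want to script this check, the info output can be scanned for the yarn section. The following is a minimal sketch, assuming the output keeps the layout shown above (check name, a dashed underline, then "- instance #N [STATUS]" lines); check_status is a hypothetical helper, not an sd-agent function.

```python
import re


def check_status(info_output, check_name="yarn"):
    """Return (instance_number, status) pairs found under one check's section."""
    lines = info_output.splitlines()
    statuses = []
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # A section starts with the check name followed by a dashed underline.
        if (stripped == check_name and i + 1 < len(lines)
                and set(lines[i + 1].strip()) == {"-"}):
            in_section = True
            continue
        if in_section:
            match = re.match(r"-\s*instance #(\d+) \[(\w+)\]", stripped)
            if match:
                statuses.append((int(match.group(1)), match.group(2)))
            elif stripped and not stripped.startswith("-"):
                break  # a non-list line means the next section has begun
    return statuses
```

An empty list means no yarn section was found, which usually points to a missing or misnamed yarn.yaml; any status other than OK warrants a look at the agent logs.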
Configuring graphs
Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the yarn metrics you want to graph. The metrics are also available to select when building dashboard graphs.
Monitored metrics
Metric | Description | Unit | Type |
---|---|---|---|
yarn.apps.allocated_mb | The sum of memory in MB allocated to the application's running containers | mebibyte / None | float |
yarn.apps.allocated_vcores | The sum of virtual cores allocated to the application's running containers | core / None | float |
yarn.apps.elapsed_time | The elapsed time since the application started (in ms) | second / None | float |
yarn.apps.finished_time | The time at which the application finished (in ms since epoch) | second / None | float |
yarn.apps.memory_seconds | The amount of memory the application has allocated (megabyte-seconds) | second / None | float |
yarn.apps.progress | The progress of the application as a percent | percent / None | float |
yarn.apps.running_containers | The number of containers currently running for the application | None / None | float |
yarn.apps.started_time | The time at which the application started (in ms since epoch) | second / None | float |
yarn.apps.vcore_seconds | The amount of CPU resources the application has allocated (virtual core-seconds) | second / None | float |
yarn.metrics.active_nodes | The number of active nodes | node / None | float |
yarn.metrics.allocated_mb | The amount of allocated memory | mebibyte / None | float |
yarn.metrics.allocated_virtual_cores | The number of allocated virtual cores | core / None | float |
yarn.metrics.apps_completed | The number of completed apps | task / None | float |
yarn.metrics.apps_failed | The number of failed apps | task / None | float |
yarn.metrics.apps_killed | The number of killed apps | task / None | float |
yarn.metrics.apps_pending | The number of pending apps | task / None | float |
yarn.metrics.apps_running | The number of running apps | task / None | float |
yarn.metrics.apps_submitted | The number of submitted apps | task / None | float |
yarn.metrics.available_mb | The amount of available memory | mebibyte / None | float |
yarn.metrics.available_virtual_cores | The number of available virtual cores | core / None | float |
yarn.metrics.containers_allocated | The number of containers allocated | None / None | float |
yarn.metrics.containers_pending | The number of containers pending | None / None | float |
yarn.metrics.containers_reserved | The number of containers reserved | None / None | float |
yarn.metrics.decommissioned_nodes | The number of decommissioned nodes | node / None | float |
yarn.metrics.lost_nodes | The number of lost nodes | node / None | float |
yarn.metrics.rebooted_nodes | The number of rebooted nodes | node / None | float |
yarn.metrics.reserved_mb | The size of reserved memory | mebibyte / None | float |
yarn.metrics.reserved_virtual_cores | The number of reserved virtual cores | core / None | float |
yarn.metrics.total_mb | The amount of total memory | mebibyte / None | float |
yarn.metrics.total_nodes | The total number of nodes | node / None | float |
yarn.metrics.total_virtual_cores | The total number of virtual cores | core / None | float |
yarn.metrics.unhealthy_nodes | The number of unhealthy nodes | node / None | float |
yarn.node.avail_memory_mb | The total amount of memory currently available on the node (in MB) | mebibyte / None | float |
yarn.node.available_virtual_cores | The total number of vCores available on the node | core / None | float |
yarn.node.last_health_update | The last time the node reported its health (in ms since epoch) | millisecond / None | float |
yarn.node.num_containers | The total number of containers currently running on the node | None / None | float |
yarn.node.used_memory_mb | The total amount of memory currently used on the node (in MB) | mebibyte / None | float |
yarn.node.used_virtual_cores | The total number of vCores currently used on the node | core / None | float |
yarn.queue.AMResourceLimit.memory | The maximum memory resources this queue can use for Application Masters (in MB) | mebibyte / None | float |
yarn.queue.AMResourceLimit.vCores | The maximum vCPUs this queue can use for Application Masters | core / None | float |
yarn.queue.absoluteCapacity | The absolute capacity percentage this queue can use of the entire cluster | percentage / None | float |
yarn.queue.absoluteMaxCapacity | The absolute maximum capacity percentage this queue can use of the entire cluster | percentage / None | float |
yarn.queue.absoluteUsedCapacity | The absolute used capacity percentage this queue is using of the entire cluster | percentage / None | float |
yarn.queue.capacity | The configured queue capacity in percentage relative to its parent queue | percentage / None | float |
yarn.queue.maxApplications | The maximum number of applications this queue can have | task / None | float |
yarn.queue.maxApplicationsPerUser | The maximum number of active applications per user this queue can have | task / None | float |
yarn.queue.maxCapacity | The configured maximum queue capacity in percentage relative to its parent queue | percentage / None | float |
yarn.queue.numActiveApplications | The number of active applications in this queue | task / None | float |
yarn.queue.numApplications | The number of applications currently in the queue | task / None | float |
yarn.queue.numContainers | The number of containers being used | None / None | float |
yarn.queue.numPendingApplications | The number of pending applications in this queue | task / None | float |
yarn.queue.resourcesUsed.memory | The total memory resources this queue is using (in MB) | mebibyte / None | float |
yarn.queue.resourcesUsed.vCores | The total vCPUs this queue is using | core / None | float |
yarn.queue.root.capacity | The configured queue capacity in percentage for root queue | percentage / None | float |
yarn.queue.root.maxCapacity | The configured maximum queue capacity in percentage for root queue | percentage / None | float |
yarn.queue.root.usedCapacity | The used queue capacity in percentage for root queue | percentage / None | float |
yarn.queue.usedAMResource.memory | The memory resources used for Application Masters (in MB) | mebibyte / None | float |
yarn.queue.usedAMResource.vCores | The vCPUs used for Application Masters | core / None | float |
yarn.queue.usedCapacity | The used queue capacity in percentage | percentage / None | float |
yarn.queue.userAMResourceLimit.memory | The maximum memory resources a user can use for Application Masters (in MB) | mebibyte / None | float |
yarn.queue.userAMResourceLimit.vCores | The maximum vCPUs a user can use for Application Masters | core / None | float |
yarn.queue.userLimit | The minimum user limit percent set in the configuration | None / None | float |
yarn.queue.userLimitFactor | The user limit factor set in the configuration | None / None | float |
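Several of the cluster-level metrics are most useful in combination. As a rough illustration, the sketch below derives memory and vCore utilization from the allocated and total values; it assumes you have already read the latest values into a dictionary keyed by metric name, and utilization_percent and summarize are hypothetical helpers rather than anything the plugin provides.

```python
def utilization_percent(allocated, total):
    """Return allocated/total as a percentage, or None when total is zero."""
    if not total:
        return None
    return 100.0 * allocated / total


def summarize(metrics):
    """Derive cluster utilization figures from raw yarn.metrics.* values."""
    return {
        "memory_used_pct": utilization_percent(
            metrics["yarn.metrics.allocated_mb"],
            metrics["yarn.metrics.total_mb"],
        ),
        "vcores_used_pct": utilization_percent(
            metrics["yarn.metrics.allocated_virtual_cores"],
            metrics["yarn.metrics.total_virtual_cores"],
        ),
    }
```

For example, a cluster reporting 2048 MB allocated out of 8192 MB total is at 25% memory utilization. Returning None for an empty cluster avoids a division by zero when no nodes are registered.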