This article explains how to configure the Spark plugin for sd-agent so that it returns metrics.
Installing the spark plugin package
Install the spark plugin on Debian/Ubuntu:
sudo apt-get install sd-agent-spark
Install the spark plugin on RHEL/CentOS:
sudo yum install sd-agent-spark
Read more about agent plugins.
Configuring the agent to monitor spark
The spark check can be used on your Mesos master, YARN ResourceManager, or your Spark master.
1. Configure the check in /etc/sd-agent/conf.d/spark.yaml
init_config:

instances:
  - spark_url: http://localhost:8088 # Spark master web UI
    # spark_url: http://:5050 # Mesos master web UI
    # spark_url: http://:8088 # YARN ResourceManager address
    spark_cluster_mode: spark_standalone_mode # default is spark_yarn_mode
    # spark_cluster_mode: spark_mesos_mode
    # spark_cluster_mode: spark_yarn_mode
    cluster_name: # required; adds a tag 'cluster_name:' to all metrics
    # spark_pre_20_mode: true # if you use Standalone Spark < v2.0
    # spark_proxy_enabled: true # if you have enabled the spark UI proxy
Ensure that you set your spark_url and spark_cluster_mode to match your deployment.
2. Restart the agent, using whichever of the following commands matches your init system (SysV init or systemd):
sudo /etc/init.d/sd-agent restart
sudo systemctl restart sd-agent
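Before restarting, it can help to sanity-check the values entered in step 1. The following is a minimal sketch, not part of sd-agent: the function name validate_spark_instance is hypothetical, the dictionary stands in for one entry under instances in spark.yaml, and the rules are only those stated in the sample configuration (the three cluster modes, the spark_yarn_mode default, and the required cluster_name).

```python
from urllib.parse import urlparse

# Cluster modes named in the sample spark.yaml above.
VALID_CLUSTER_MODES = {
    "spark_standalone_mode",
    "spark_mesos_mode",
    "spark_yarn_mode",
}


def validate_spark_instance(instance):
    """Return a list of problems found in one entry under `instances`.

    Hypothetical helper for illustration; it is not shipped with sd-agent.
    """
    problems = []

    url = urlparse(str(instance.get("spark_url") or ""))
    if url.scheme not in ("http", "https") or not url.hostname:
        problems.append("spark_url must be an http(s) URL with a host")

    # The sample config notes that spark_yarn_mode is the default.
    mode = instance.get("spark_cluster_mode", "spark_yarn_mode")
    if mode not in VALID_CLUSTER_MODES:
        problems.append("unknown spark_cluster_mode: %r" % (mode,))

    if not instance.get("cluster_name"):
        problems.append("cluster_name is required")

    return problems


if __name__ == "__main__":
    example = {
        "spark_url": "http://localhost:8088",
        "spark_cluster_mode": "spark_standalone_mode",
        "cluster_name": "example-cluster",  # hypothetical value
    }
    print(validate_spark_instance(example) or "no problems found")
```

An empty list means the entry passes these basic checks. It does not confirm that the URL is reachable or that the cluster mode actually matches your deployment.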
Verifying the configuration
Run the agent's info command to verify the configuration, using either of the following:
sudo /etc/init.d/sd-agent info
/usr/share/python/sd-agent/ info
If the agent has been configured correctly, you will see output such as:
spark
-----
  - instance #0 [OK]
  - Collected * metrics
You can also view the metrics returned with the following command:
sudo -u sd-agent /usr/share/python/sd-agent/ check spark
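If you want to check the info output from a script rather than by eye, the status of each spark instance can be extracted from the text. This is a rough sketch only: the function name spark_instance_statuses is hypothetical, and the line layout (a check name, a dashed underline, then "- instance #N [STATUS]" lines) is assumed from the sample output shown above and may differ between agent versions.

```python
import re

# Matches lines such as "- instance #0 [OK]"; the layout is assumed from
# the sample output above and may vary between agent versions.
INSTANCE_LINE = re.compile(r"-\s*instance #(\d+)\s*\[(\w+)\]")


def spark_instance_statuses(info_output):
    """Return {instance_number: status} for the spark section of the output."""
    statuses = {}
    in_spark_section = False
    previous = ""
    for raw in info_output.splitlines():
        line = raw.strip()
        # A section starts with a check name followed by a dashed underline.
        if line and set(line) == {"-"}:
            in_spark_section = previous == "spark"
        elif in_spark_section:
            match = INSTANCE_LINE.search(line)
            if match:
                statuses[int(match.group(1))] = match.group(2)
        if line:
            previous = line
    return statuses


if __name__ == "__main__":
    sample = """
spark
-----
  - instance #0 [OK]
  - Collected * metrics
"""
    print(spark_instance_statuses(sample))
```

To use it, capture the text printed by the info command and pass it in as a string. Any status other than OK for a spark instance suggests the configuration needs another look.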
Configuring graphs
Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the spark metrics you want to graph. The metrics are also available to select when building dashboard graphs.
Monitored metrics
Metric | Description | Unit | Type |
spark.driver.active_tasks | Number of active tasks in the driver | task / second | float |
spark.driver.completed_tasks | Number of completed tasks in the driver | task / second | float |
spark.driver.disk_used | Amount of disk used in the driver | byte / second | float |
spark.driver.failed_tasks | Number of failed tasks in the driver | task / second | float |
spark.driver.max_memory | Maximum memory used in the driver | byte / second | float |
spark.driver.memory_used | Amount of memory used in the driver | byte / second | float |
spark.driver.rdd_blocks | Number of RDD blocks in the driver | block / second | float |
spark.driver.total_duration | Fraction of time (ms/s) spent by the driver | fraction / None | float |
spark.driver.total_input_bytes | Number of input bytes in the driver | byte / second | float |
spark.driver.total_shuffle_read | Number of bytes read during a shuffle in the driver | byte / second | float |
spark.driver.total_shuffle_write | Number of shuffled bytes in the driver | byte / second | float |
spark.driver.total_tasks | Number of total tasks in the driver | task / second | float |
spark.executor.active_tasks | Number of active tasks in the application's executors | task / second | float |
spark.executor.completed_tasks | Number of completed tasks in the application's executors | task / second | float |
spark.executor.disk_used | Amount of disk space used by persisted RDDs in the application's executors | byte / second | float |
spark.executor.failed_tasks | Number of failed tasks in the application's executors | task / second | float |
spark.executor.memory_used | Amount of memory used for cached RDDs in the application's executors | byte / second | float |
spark.executor.rdd_blocks | Number of persisted RDD blocks in the application's executors | block / second | float |
spark.executor.total_duration | Fraction of time (ms/s) spent by the application's executors executing tasks | fraction / None | float |
spark.executor.total_input_bytes | Total number of input bytes in the application's executors | byte / second | float |
spark.executor.total_shuffle_read | Total number of bytes read during a shuffle in the application's executors | byte / second | float |
spark.executor.total_shuffle_write | Total number of shuffled bytes in the application's executors | byte / second | float |
spark.executor.total_tasks | Total number of tasks in the application's executors | task / second | float |
spark.executor_memory | Maximum memory available for caching RDD blocks in the application's executors | byte / second | float |
spark.job.num_active_stages | Number of active stages in the application | stage / second | float |
spark.job.num_active_tasks | Number of active tasks in the application | task / second | float |
spark.job.num_completed_stages | Number of completed stages in the application | stage / second | float |
spark.job.num_failed_stages | Number of failed stages in the application | stage / second | float |
spark.job.num_failed_tasks | Number of failed tasks in the application | task / second | float |
spark.job.num_skipped_stages | Number of skipped stages in the application | stage / second | float |
spark.job.num_skipped_tasks | Number of skipped tasks in the application | task / second | float |
spark.job.num_tasks | Number of tasks in the application | task / second | float |
spark.rdd.disk_used | Amount of disk space used by persisted RDDs in the application | byte / second | float |
spark.rdd.memory_used | Amount of memory used in the application's persisted RDDs | byte / second | float |
spark.rdd.num_cached_partitions | Number of in-memory cached RDD partitions in the application | None / second | float |
spark.rdd.num_partitions | Number of persisted RDD partitions in the application | None / second | float |
spark.stage.disk_bytes_spilled | Max size on disk of the spilled bytes in the application's stages | byte / second | float |
spark.stage.executor_run_time | Fraction of time (ms/s) spent by the executor in the application's stages | fraction / None | float |
spark.stage.input_bytes | Input bytes in the application's stages | byte / second | float |
spark.stage.input_records | Input records in the application's stages | record / second | float |
spark.stage.memory_bytes_spilled | Number of bytes spilled to disk in the application's stages | byte / second | float |
spark.stage.num_active_tasks | Number of active tasks in the application's stages | task / second | float |
spark.stage.num_complete_tasks | Number of complete tasks in the application's stages | task / second | float |
spark.stage.num_failed_tasks | Number of failed tasks in the application's stages | task / second | float |
spark.stage.output_bytes | Output bytes in the application's stages | byte / second | float |
spark.stage.output_records | Output records in the application's stages | record / second | float |
spark.stage.shuffle_read_bytes | Number of bytes read during a shuffle in the application's stages | byte / second | float |
spark.stage.shuffle_read_records | Number of records read during a shuffle in the application's stages | record / second | float |
spark.stage.shuffle_write_bytes | Number of shuffled bytes in the application's stages | byte / second | float |
spark.stage.shuffle_write_records | Number of shuffled records in the application's stages | record / second | float |
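The duration metrics above are described as a fraction of time in ms/s, meaning milliseconds of work per second of wall-clock time. As a small worked example of reading such a value, the sketch below divides by 1000 to get a plain fraction. The function name busy_fraction and the sample value of 250 are hypothetical. Whether the check adds values up across executors is not stated in the table, so a result above 1.0 (for example, several executors working in parallel) should not be ruled out.

```python
def busy_fraction(ms_per_second):
    """Convert milliseconds of work per second into a fraction of wall-clock time."""
    return ms_per_second / 1000.0


# Hypothetical reading: 250 ms of task time per second of wall-clock time.
print("{:.0%}".format(busy_fraction(250.0)))  # prints "25%"
```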