Monitoring Spark

This article will help you get the Spark plugin for sd-agent configured and returning metrics.

Installing the Spark plugin package

Install the Spark plugin on Debian/Ubuntu:

sudo apt-get install sd-agent-spark

Install the Spark plugin on RHEL/CentOS:

sudo yum install sd-agent-spark

Read more about agent plugins.

Configuring the agent to monitor Spark

The Spark check can run on your Mesos master, your YARN ResourceManager, or your Spark master.

1. Configure the check in /etc/sd-agent/conf.d/spark.yaml

init_config:

instances:
  - spark_url: http://localhost:8080 # Spark master web UI (8080 is the standalone master's default port)
#   spark_url: http://:5050 # Mesos master web UI
#   spark_url: http://:8088 # YARN ResourceManager address

    spark_cluster_mode: spark_standalone_mode # default is spark_yarn_mode
#   spark_cluster_mode: spark_mesos_mode
#   spark_cluster_mode: spark_yarn_mode

    cluster_name:  # required; adds a tag 'cluster_name:' to all metrics

#   spark_pre_20_mode: true   # if you use Standalone Spark < v2.0
#   spark_proxy_enabled: true # if you have enabled the spark UI proxy

Ensure that you set your spark_url and spark_cluster_mode to match your deployment.
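
As a quick sanity check before restarting, the rules above can be sketched in Python. This is a minimal sketch and not part of sd-agent: validate_instance is a hypothetical helper, it assumes the instance has already been loaded into a dict (for example with a YAML parser), and the requirement that spark_url starts with http:// or https:// is an assumption. The mode names and the spark_yarn_mode default are taken from the example config.

```python
# Hypothetical helper, not part of sd-agent: checks one entry of `instances`
# against the rules described in the example spark.yaml.
VALID_MODES = {"spark_standalone_mode", "spark_mesos_mode", "spark_yarn_mode"}
DEFAULT_MODE = "spark_yarn_mode"  # per the comment in the example config


def validate_instance(instance):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Assumption: the URL must be an absolute http(s) address.
    url = instance.get("spark_url") or ""
    if not url.startswith(("http://", "https://")):
        problems.append("spark_url must be an http(s) URL")

    # A missing mode falls back to the documented default.
    mode = instance.get("spark_cluster_mode", DEFAULT_MODE)
    if mode not in VALID_MODES:
        problems.append("unknown spark_cluster_mode: %s" % mode)

    # cluster_name is marked as required in the example config.
    if not instance.get("cluster_name"):
        problems.append("cluster_name is required")

    return problems


if __name__ == "__main__":
    example = {
        "spark_url": "http://localhost:8080",          # placeholder address
        "spark_cluster_mode": "spark_standalone_mode",
        "cluster_name": "my-cluster",                  # placeholder name
    }
    print(validate_instance(example))  # an empty list means no problems found
```

An empty cluster_name, which is what the template above contains until you fill it in, is reported as a problem.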

2. Restart the agent

sudo /etc/init.d/sd-agent restart

or

sudo systemctl restart sd-agent

Verifying the configuration

Run the agent's info command to verify the configuration:

sudo /etc/init.d/sd-agent info

or

/usr/share/python/sd-agent/agent.py info

If the agent has been configured correctly, you'll see output such as:

spark
-----
  - instance #0 [OK]
  - Collected * metrics

You can also view the metrics returned with the following command:

sudo -u sd-agent /usr/share/python/sd-agent/agent.py check spark
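
If you want to script the verification step, the info output can be checked programmatically. The following is a minimal sketch, assuming the output contains a section shaped like the sample above (check name, dashed underline, then "- instance #N [STATUS]" lines). instance_statuses is a hypothetical helper, and the real output may contain other sections or status values.

```python
import re

# Assumption: instance lines look like "  - instance #0 [OK]".
INSTANCE_RE = re.compile(r"^\s*-\s*instance\s+#(\d+)\s+\[(\w+)\]")


def instance_statuses(info_text, check_name="spark"):
    """Return {instance_number: status} for one check section of `info` output."""
    statuses = {}
    in_section = False
    lines = info_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        # A section header is a non-empty line followed by a dashed underline.
        if stripped and nxt and set(nxt) == {"-"}:
            in_section = (stripped == check_name)
            continue
        match = INSTANCE_RE.match(line)
        if in_section and match:
            statuses[int(match.group(1))] = match.group(2)
    return statuses


if __name__ == "__main__":
    sample = "spark\n-----\n  - instance #0 [OK]\n  - Collected * metrics\n"
    print(instance_statuses(sample))  # {0: 'OK'}
```

You could feed it the captured output of the info command and alert when any instance reports a status other than OK.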

Configuring graphs

Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the spark metrics you want to graph. The same metrics are also available when building dashboard graphs.


Monitored metrics

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| spark.driver.active_tasks | Number of active tasks in the driver | task / second | float |
| spark.driver.completed_tasks | Number of completed tasks in the driver | task / second | float |
| spark.driver.disk_used | Amount of disk used in the driver | byte / second | float |
| spark.driver.failed_tasks | Number of failed tasks in the driver | task / second | float |
| spark.driver.max_memory | Maximum memory used in the driver | byte / second | float |
| spark.driver.memory_used | Amount of memory used in the driver | byte / second | float |
| spark.driver.rdd_blocks | Number of RDD blocks in the driver | block / second | float |
| spark.driver.total_duration | Fraction of time (ms/s) spent by the driver | fraction / None | float |
| spark.driver.total_input_bytes | Number of input bytes in the driver | byte / second | float |
| spark.driver.total_shuffle_read | Number of bytes read during a shuffle in the driver | byte / second | float |
| spark.driver.total_shuffle_write | Number of shuffled bytes in the driver | byte / second | float |
| spark.driver.total_tasks | Number of total tasks in the driver | task / second | float |
| spark.executor.active_tasks | Number of active tasks in the application's executors | task / second | float |
| spark.executor.completed_tasks | Number of completed tasks in the application's executors | task / second | float |
| spark.executor.disk_used | Amount of disk space used by persisted RDDs in the application's executors | byte / second | float |
| spark.executor.failed_tasks | Number of failed tasks in the application's executors | task / second | float |
| spark.executor.memory_used | Amount of memory used for cached RDDs in the application's executors | byte / second | float |
| spark.executor.rdd_blocks | Number of persisted RDD blocks in the application's executors | block / second | float |
| spark.executor.total_duration | Fraction of time (ms/s) spent by the application's executors executing tasks | fraction / None | float |
| spark.executor.total_input_bytes | Total number of input bytes in the application's executors | byte / second | float |
| spark.executor.total_shuffle_read | Total number of bytes read during a shuffle in the application's executors | byte / second | float |
| spark.executor.total_shuffle_write | Total number of shuffled bytes in the application's executors | byte / second | float |
| spark.executor.total_tasks | Total number of tasks in the application's executors | task / second | float |
| spark.executor_memory | Maximum memory available for caching RDD blocks in the application's executors | byte / second | float |
| spark.job.num_active_stages | Number of active stages in the application | stage / second | float |
| spark.job.num_active_tasks | Number of active tasks in the application | task / second | float |
| spark.job.num_completed_stages | Number of completed stages in the application | stage / second | float |
| spark.job.num_failed_stages | Number of failed stages in the application | stage / second | float |
| spark.job.num_failed_tasks | Number of failed tasks in the application | task / second | float |
| spark.job.num_skipped_stages | Number of skipped stages in the application | stage / second | float |
| spark.job.num_skipped_tasks | Number of skipped tasks in the application | task / second | float |
| spark.job.num_tasks | Number of tasks in the application | task / second | float |
| spark.rdd.disk_used | Amount of disk space used by persisted RDDs in the application | byte / second | float |
| spark.rdd.memory_used | Amount of memory used in the application's persisted RDDs | byte / second | float |
| spark.rdd.num_cached_partitions | Number of in-memory cached RDD partitions in the application | None / second | float |
| spark.rdd.num_partitions | Number of persisted RDD partitions in the application | None / second | float |
| spark.stage.disk_bytes_spilled | Max size on disk of the spilled bytes in the application's stages | byte / second | float |
| spark.stage.executor_run_time | Fraction of time (ms/s) spent by the executor in the application's stages | fraction / None | float |
| spark.stage.input_bytes | Input bytes in the application's stages | byte / second | float |
| spark.stage.input_records | Input records in the application's stages | record / second | float |
| spark.stage.memory_bytes_spilled | Number of bytes spilled to disk in the application's stages | byte / second | float |
| spark.stage.num_active_tasks | Number of active tasks in the application's stages | task / second | float |
| spark.stage.num_complete_tasks | Number of complete tasks in the application's stages | task / second | float |
| spark.stage.num_failed_tasks | Number of failed tasks in the application's stages | task / second | float |
| spark.stage.output_bytes | Output bytes in the application's stages | byte / second | float |
| spark.stage.output_records | Output records in the application's stages | record / second | float |
| spark.stage.shuffle_read_bytes | Number of bytes read during a shuffle in the application's stages | byte / second | float |
| spark.stage.shuffle_read_records | Number of records read during a shuffle in the application's stages | record / second | float |
| spark.stage.shuffle_write_bytes | Number of shuffled bytes in the application's stages | byte / second | float |
| spark.stage.shuffle_write_records | Number of shuffled records in the application's stages | record / second | float |
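
The metrics listed above are often easier to read as ratios than as raw values. The following is a minimal sketch, assuming you already have one sample of metric values as a dict keyed by metric name. task_failure_ratio is a hypothetical helper, the numbers are illustrative, and how you export the values from Server Density is not covered here.

```python
def task_failure_ratio(sample, scope="executor"):
    """Fraction of tasks that failed, from one sample of metric values.

    `scope` is "driver" or "executor"; both report failed_tasks and total_tasks.
    Returns None when no tasks have been recorded, to avoid dividing by zero.
    """
    failed = sample.get("spark.%s.failed_tasks" % scope, 0.0)
    total = sample.get("spark.%s.total_tasks" % scope, 0.0)
    if total <= 0:
        return None
    return failed / total


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    sample = {
        "spark.executor.failed_tasks": 3.0,
        "spark.executor.total_tasks": 120.0,
    }
    print(task_failure_ratio(sample))  # 0.025
```

Returning None rather than 0 for an idle application keeps "no tasks yet" distinguishable from "no failures".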