Monitoring Cassandra

This article will help you configure the Cassandra plugin for sd-agent and confirm that it is returning metrics.

Installing the Cassandra plugin package

Install the Cassandra plugin on Debian/Ubuntu:

sudo apt-get install sd-agent-cassandra

Install the Cassandra plugin on RHEL/CentOS:

sudo yum install sd-agent-cassandra

Read more about agent plugins.

Configuring the agent to monitor Cassandra

1. Configure the instances in /etc/sd-agent/conf.d/cassandra.yaml:

instances:
  - host: localhost
    port: 7199
    cassandra_aliasing: true
  #   user: username
  #   password: password
  #   process_name_regex: .*process_name.* # Instead of specifying a host and port, the agent can connect using the attach API.
  #                                        # This requires the JDK to be installed and the path to tools.jar to be set below.
  #   tools_jar_path: /usr/lib/jvm/java-7-openjdk-amd64/lib/tools.jar # To be set when process_name_regex is set
  #   name: cassandra_instance
  #   # java_bin_path: /path/to/java # Optional, should be set if the agent cannot find your java executable
  #   # java_options: "-Xmx200m -Xms50m" # Optional, Java JVM options
  #   # trust_store_path: /path/to/trustStore.jks # Optional, should be set if ssl is enabled
  #   # trust_store_password: password
  #   tags:
  #     env: stage
  #     newTag: test 

If a username and password are required, uncomment those options and set them as needed, making sure the YAML stays valid.

It's also possible to use the attach API instead by specifying the process name regex and the tools.jar path. Note that if SSL is enabled, you'll also need to set the trust store path.
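The rules above can be checked before restarting the agent. The following is a minimal sketch, not part of sd-agent: `check_instance` is a hypothetical helper that takes one entry of `instances` as a dictionary (for example, after loading the YAML file with a parser of your choice). It assumes that `user` and `password` are only meaningful as a pair.

```python
from typing import Any, Dict, List


def check_instance(instance: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one entry of `instances`.

    Hypothetical helper for illustration; the agent does its own validation.
    """
    problems = []
    has_host_port = "host" in instance and "port" in instance
    uses_attach = "process_name_regex" in instance

    # The agent connects either over host and port or through the attach API.
    if not has_host_port and not uses_attach:
        problems.append("set either host and port, or process_name_regex")
    # The attach API needs the path to tools.jar from the JDK.
    if uses_attach and "tools_jar_path" not in instance:
        problems.append("process_name_regex requires tools_jar_path")
    # Assumption: user and password only make sense together.
    if ("user" in instance) != ("password" in instance):
        problems.append("user and password must be set together")
    return problems
```

An empty list means none of these checks found a problem. It does not guarantee that the agent can actually connect.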

2. Restart the agent:

sudo /etc/init.d/sd-agent restart

or

sudo systemctl restart sd-agent

Verifying the configuration

Run the agent's info command to verify the configuration:

sudo /etc/init.d/sd-agent info 

or

/usr/share/python/sd-agent/agent.py info

If the agent has been configured correctly, you'll see output such as:

cassandra
-----
  - instance #0 [OK]
  - Collected * metrics
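If you want to check this output from a script, a small parser is enough. The sketch below assumes the info output follows the layout shown above (a check name on its own line, followed by `- instance #N [STATUS]` lines). The function name `instance_statuses` is hypothetical, and the real output may contain other sections or status values.

```python
import re
from typing import Dict


def instance_statuses(info_output: str, check_name: str = "cassandra") -> Dict[int, str]:
    """Map instance number to status for one check in the info output."""
    statuses = {}
    in_section = False
    for line in info_output.splitlines():
        stripped = line.strip()
        if stripped == check_name:
            in_section = True
            continue
        if not in_section:
            continue
        match = re.match(r"-\s*instance #(\d+) \[(\w+)\]", stripped)
        if match:
            statuses[int(match.group(1))] = match.group(2)
        elif stripped and not stripped.startswith("-"):
            # A non-list line is assumed to start the next check's section.
            in_section = False
    return statuses
```

For the sample output above, the function returns a dictionary that maps instance 0 to the status `OK`.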

You can also view the metrics returned with the following command:

service sd-agent jmx collect
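If no metrics are returned, one thing worth ruling out is that the JMX port from the configuration (7199 in the example) is not reachable from the agent host. The sketch below is a generic TCP check, not an sd-agent feature, and `jmx_port_reachable` is a hypothetical name. A successful connection only shows that something is listening on the port. It does not confirm that JMX authentication or SSL is set up correctly.

```python
import socket


def jmx_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `jmx_port_reachable("localhost", 7199)` on the Cassandra host returns True when the port accepts connections.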

Configuring graphs

Click the name of your server in the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right, then choose the cassandra metrics to graph. The metrics are also available when building dashboard graphs.

Monitored metrics

Metric, followed by its values (description, unit, type):
cassandra.active_tasks

The number of tasks that the thread pool is actively executing.
task / None
Type: float
cassandra.bloom_filter_disk_space_used

Disk space used by the Bloom filters.
byte / None
Type: float
cassandra.bloom_filter_false_positives

The number of Bloom filter false positives.
event / None
Type: float
cassandra.bloom_filter_false_ratio

The ratio of Bloom filter false positives to total checks.
fraction / None
Type: float
cassandra.capacity

The capacity of the caches, such as the key cache and row cache.
byte / None
Type: float
cassandra.completed_tasks

The number of tasks that the thread pool has completed.
task / None
Type: float
cassandra.compression_ratio

The compression ratio for all SSTables in a column family.
fraction / None
Type: float
cassandra.currently_blocked_tasks.count

The number of currently blocked tasks for the thread pool.
task / None
Type: float
cassandra.db.bloom_filter_disk_space_used

Disk space used by the Bloom filters. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.bloom_filter_disk_space_used instead)
byte / None
Type: float
cassandra.db.bloom_filter_false_positives

The number of Bloom filter false positives. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.bloom_filter_false_positives instead)
event / None
Type: float
cassandra.db.bloom_filter_false_ratio

The ratio of Bloom filter false positives to total checks. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.bloom_filter_false_ratio instead)
fraction / None
Type: float
cassandra.db.completed_tasks

Completed compaction or commitlog tasks. (Metric may not be available for Cassandra versions > 2.2.)
task / None
Type: float
cassandra.db.compression_ratio

The compression ratio for all SSTables in a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.compression_ratio instead)
fraction / None
Type: float
cassandra.db.exception_count

The number of exceptions thrown. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.exceptions.count instead)
error / None
Type: float
cassandra.db.key_cache_recent_hit_rate

Ratio of key cache hits to key cache requests since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
fraction / None
Type: float
cassandra.db.live_disk_space_used

Disk space used by "live" SSTables (only counts non-obsolete files). (Metric may not be available for Cassandra versions > 2.2. Use cassandra.live_disk_space_used.count instead)
byte / None
Type: float
cassandra.db.live_ss_table_count

Number of "live" (non-obsolete) SSTables. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.live_ss_table_count instead)
file / None
Type: float
cassandra.db.load

Disk space used on a node. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.load.count instead)
byte / None
Type: float
cassandra.db.max_row_size

Size of the largest compacted row. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.max_row_size instead)
byte / None
Type: float
cassandra.db.mean_row_size

Average size of compacted rows. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.mean_row_size instead)
byte / None
Type: float
cassandra.db.memtable_columns_count

Number of columns in memtable. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.memtable_columns_count instead)
column / None
Type: float
cassandra.db.memtable_data_size

Size of data stored in memtable. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.memtable_live_data_size instead)
byte / None
Type: float
cassandra.db.memtable_switch_count

Number of times a full memtable has been switched out for an empty one due to flushing. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.memtable_switch_count.count instead)
event / None
Type: float
cassandra.db.min_row_size

Size of the smallest compacted row. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.min_row_size instead)
byte / None
Type: float
cassandra.db.pending_tasks

Pending compaction, commitlog, or column family tasks. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.pending_tasks instead)
task / None
Type: float
cassandra.db.range_operations

Count of range scan operations. (Metric may not be available for Cassandra versions > 2.2.)
operation / None
Type: float
cassandra.db.read_count

The number of local read requests for a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.latency.count instead)
read / None
Type: float
cassandra.db.read_operations

Count of read operations. (Metric may not be available for Cassandra versions > 2.2.)
operation / None
Type: float
cassandra.db.recent_range_latency_micros

The latency of range scans since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
microsecond / None
Type: float
cassandra.db.recent_read_latency_micros

The latency of reads since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
microsecond / None
Type: float
cassandra.db.recent_write_latency_micros

The latency of writes since the last time this attribute was read. (Metric may not be available for Cassandra versions > 2.2.)
microsecond / None
Type: float
cassandra.db.total_disk_space_used

Disk space used by a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.total_disk_space_used.count instead)
byte / None
Type: float
cassandra.db.total_range_latency_micros

Total latency for all range scans. (Metric may not be available for Cassandra versions > 2.2.)
microsecond / None
Type: float
cassandra.db.total_read_latency_micros

Total latency for all read requests. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.total_latency.count instead)
microsecond / None
Type: float
cassandra.db.total_write_latency_micros

Total latency for all write requests. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.total_latency.count instead)
microsecond / None
Type: float
cassandra.db.update_interval

The configurable update interval for the dynamic snitch, which monitors read latency to route requests away from slow nodes.
millisecond / None
Type: float
cassandra.db.write_count

The number of local write requests for a column family. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.latency.count instead)
write / None
Type: float
cassandra.db.write_operations

Count of write operations. (Metric may not be available for Cassandra versions > 2.2.)
operation / None
Type: float
cassandra.exceptions.count

The number of exceptions thrown.
error / None
Type: float
cassandra.hits.count

The number of hits to a cache.
hit / None
Type: float
cassandra.internal.active_count

The number of tasks that the thread pool is actively executing. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.active_tasks instead)
task / None
Type: float
cassandra.internal.completed_tasks

The number of tasks that the thread pool has completed. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.completed_tasks instead)
task / None
Type: float
cassandra.internal.currently_blocked_tasks

The number of currently blocked tasks for the thread pool. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.currently_blocked_tasks.count instead)
task / None
Type: float
cassandra.internal.total_blocked_tasks

The cumulative number of tasks that have been blocked for the thread pool. (Metric may not be available for Cassandra versions > 2.2.)
task / None
Type: float
cassandra.latency.count

The number of client requests.
request / None
Type: float
cassandra.latency.one_minute_rate

Recent rate of client requests, as an exponentially weighted moving average over a one-minute interval.
request / second
Type: float
cassandra.live_disk_space_used.count

Disk space used by "live" SSTables (only counts non-obsolete files).
byte / None
Type: float
cassandra.live_ss_table_count

Number of "live" (non-obsolete) SSTables.
file / None
Type: float
cassandra.load.count

Disk space used on a node.
byte / None
Type: float
cassandra.max_row_size

Size of the largest compacted row.
byte / None
Type: float
cassandra.mean_row_size

Average size of compacted rows.
byte / None
Type: float
cassandra.memtable_columns_count

Number of columns in memtable.
column / None
Type: float
cassandra.memtable_live_data_size

Size of data stored in memtable.
byte / None
Type: float
cassandra.memtable_switch_count.count

Number of times a full memtable has been switched out for an empty one due to flushing.
event / None
Type: float
cassandra.min_row_size

Size of the smallest compacted row.
byte / None
Type: float
cassandra.net.total_timeouts

Count of requests not acknowledged within configurable timeout window. (Metric may not be available for Cassandra versions > 2.2. Use cassandra.timeouts.count instead)
timeout / None
Type: float
cassandra.pending_tasks

The number of pending tasks for the thread pool.
task / None
Type: float
cassandra.requests.count

The number of requests to a cache.
request / None
Type: float
cassandra.size

Size of cache.
byte / None
Type: float
cassandra.timeouts.count

Count of requests not acknowledged within configurable timeout window.
timeout / None
Type: float
cassandra.timeouts.one_minute_rate

Recent timeout rate, as an exponentially weighted moving average over a one-minute interval.
timeout / second
Type: float
cassandra.total_disk_space_used.count

Disk space used by a column family.
byte / None
Type: float
cassandra.total_latency.count

Total latency for all client requests.
microsecond / None
Type: float
cassandra.unavailables.count

Count of requests for which the required number of nodes was unavailable.
error / None
Type: float
cassandra.unavailables.one_minute_rate

Recent rate of unavailable exceptions, as an exponentially weighted moving average over a one-minute interval.
error / second
Type: float
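Some of the metrics above are more useful in combination. The sketch below is an illustration rather than anything the agent computes. It assumes the `.count` metrics are cumulative counters, that both samples come from the same scope (for example, the same request type or the same cache), and that each sample is a plain dictionary keyed by metric name. The function names are hypothetical.

```python
from typing import Mapping, Optional


def mean_latency_micros(previous: Mapping[str, float],
                        current: Mapping[str, float]) -> Optional[float]:
    """Average latency per request between two samples, in microseconds."""
    requests = current["cassandra.latency.count"] - previous["cassandra.latency.count"]
    total = (current["cassandra.total_latency.count"]
             - previous["cassandra.total_latency.count"])
    if requests <= 0:
        return None  # no requests in the interval, or the counters were reset
    return total / requests


def cache_hit_ratio(previous: Mapping[str, float],
                    current: Mapping[str, float]) -> Optional[float]:
    """Fraction of cache requests that were hits between two samples."""
    requests = current["cassandra.requests.count"] - previous["cassandra.requests.count"]
    hits = current["cassandra.hits.count"] - previous["cassandra.hits.count"]
    if requests <= 0:
        return None  # no cache requests in the interval, or a counter reset
    return hits / requests
```

Working from the difference between two samples gives the value for that interval only, rather than an average over the whole lifetime of the node.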