Monitoring Ceph

This article will help you get the Ceph plugin for sd-agent installed, configured, and returning metrics.

Installing the ceph plugin package

Install the ceph plugin on Debian/Ubuntu:

sudo apt-get install sd-agent-ceph

Install the ceph plugin on RHEL/CentOS:

sudo yum install sd-agent-ceph

Read more about agent plugins.

Configuring the agent to monitor ceph

1. Configure /etc/sd-agent/conf.d/ceph.yaml

init_config:

instances:
  - ceph_cmd: /path/to/your/ceph # default is /usr/bin/ceph
    use_sudo: true               # only if the ceph binary needs sudo on your nodes
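
Before restarting the agent it can be worth confirming that the path you set in ceph_cmd actually exists and is executable on the node. A minimal sketch (the /usr/bin/ceph default is an assumption; substitute your own path):

```shell
# Check the ceph_cmd path from ceph.yaml before restarting the agent.
# /usr/bin/ceph is the plugin default; override CEPH_CMD to match your config.
CEPH_CMD="${CEPH_CMD:-/usr/bin/ceph}"

if [ -x "$CEPH_CMD" ]; then
    echo "ceph_cmd ok: $CEPH_CMD"
else
    echo "ceph_cmd missing or not executable: $CEPH_CMD" >&2
fi
```

If this reports the binary as missing, fix ceph_cmd (or install the ceph CLI) before moving on.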

2. sudo for sd-agent, if required

If you set the use_sudo option to true, then add a line like the following to your sudoers file:

sd-agent ALL=(ALL) NOPASSWD:/path/to/your/ceph
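
Rather than editing /etc/sudoers directly, you can stage the rule in a file and syntax-check it with visudo -cf first; a malformed sudoers file can break sudo entirely. A sketch, assuming the ceph binary lives at /usr/bin/ceph:

```shell
# Stage the sudoers rule in a temporary file (adjust the ceph path to yours).
RULE='sd-agent ALL=(ALL) NOPASSWD:/usr/bin/ceph'
TMP=$(mktemp)
printf '%s\n' "$RULE" > "$TMP"

# visudo -cf validates the file's syntax without installing it.
if command -v visudo >/dev/null 2>&1; then
    visudo -cf "$TMP" && echo "sudoers rule parses cleanly"
fi
rm -f "$TMP"
```

If the check passes, install the file under /etc/sudoers.d/ with mode 0440 (on distributions that include that directory) rather than appending to /etc/sudoers.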

3. Restart the agent

sudo /etc/init.d/sd-agent restart

or

sudo systemctl restart sd-agent
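
Which of the two restart commands applies depends on the node's init system. A small sketch that picks one, assuming systemd whenever systemctl is present:

```shell
# Choose the restart command for this node's init system.
if command -v systemctl >/dev/null 2>&1; then
    RESTART="systemctl restart sd-agent"
else
    RESTART="/etc/init.d/sd-agent restart"
fi
echo "run: sudo $RESTART"
```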

Verifying the configuration
Verify the configuration by running the agent's info command:

sudo /etc/init.d/sd-agent info 

or

/usr/share/python/sd-agent/agent.py info

If the agent has been configured correctly, you'll see output such as:

ceph
-----
  - instance #0 [OK]
  - Collected * metrics
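
If you script this check (from configuration management, say), you can test for the [OK] marker instead of reading the output by eye. A sketch that matches against a captured line rather than a live agent:

```shell
# INFO_OUT stands in for the output of `sudo /etc/init.d/sd-agent info`;
# on a live node, capture that command's output here instead.
INFO_OUT='  - instance #0 [OK]'

case "$INFO_OUT" in
    *'instance #0 [OK]'*) echo "ceph check healthy" ;;
    *)                    echo "ceph check failing" >&2 ;;
esac
```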

You can also view the metrics returned with the following command:

sudo -u sd-agent /usr/share/python/sd-agent/agent.py check ceph

Configuring graphs

Click the name of your server from the Devices list in your Server Density account, then go to the Metrics tab. Click the + Graph button on the right and choose the ceph metrics you want to display. The metrics will also be available when building dashboard graphs.


Monitored metrics

Metric                     Description                                       Unit                Type
ceph.aggregate_pct_used    Overall capacity usage                            percent             float
ceph.apply_latency_ms      Time taken to flush an update to disks            millisecond         float
ceph.commit_latency_ms     Time taken to commit an operation to the journal  millisecond         float
ceph.num_full_osds         Number of full OSDs                               item                float
ceph.num_in_osds           Number of participating storage daemons           item                float
ceph.num_mons              Number of monitor daemons                         item                float
ceph.num_near_full_osds    Number of nearly full OSDs                        item                float
ceph.num_objects           Object count for a given pool                     item                float
ceph.num_osds              Number of known storage daemons                   item                float
ceph.num_pgs               Number of placement groups available              item                float
ceph.num_pools             Number of pools                                   item                float
ceph.num_up_osds           Number of online storage daemons                  item                float
ceph.op_per_sec            IO operations per second for a given pool         operation / second  float
ceph.osd.pct_used          Percentage used of full/near-full OSDs            percent             float
ceph.pgstate.active_clean  Number of active+clean placement groups           item                float
ceph.read_bytes            Per-pool read bytes                               byte                float
ceph.read_bytes_sec        Bytes/second being read                           byte                float
ceph.read_op_per_sec       Per-pool read operations/second                   operation / second  float
ceph.total_objects         Object count from the underlying object store     item                float
ceph.write_bytes           Per-pool write bytes                              byte                float
ceph.write_bytes_sec       Bytes/second being written                        byte                float
ceph.write_op_per_sec      Per-pool write operations/second                  operation / second  float
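
The percent metrics are plain used-over-total ratios. As an illustration only (the plugin computes and reports these values itself), with made-up byte counts:

```shell
# Illustrative arithmetic only; the plugin reports this metric directly.
# The byte counts below are made-up example values.
USED_BYTES=750000000
TOTAL_BYTES=1000000000
PCT=$(( USED_BYTES * 100 / TOTAL_BYTES ))
echo "aggregate_pct_used: ${PCT}%"
```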