This article will help you get the TokuMX plugin for sd-agent configured and returning metrics.
Installing the tokumx plugin package
Install the tokumx plugin on Debian/Ubuntu:
sudo apt-get install sd-agent-tokumx
Install the tokumx plugin on RHEL/CentOS:
sudo yum install sd-agent-tokumx
Read more about agent plugins.
Configuring the agent to monitor TokuMX
1. Connect to the mongo shell and create a read-only monitoring user. Make sure to authenticate before doing so:
mongo
use admin
db.addUser("serverdensity", "supersecurepassword", true)
2. Configure /etc/sd-agent/conf.d/tokumx.yaml
init_config:
instances:
  # Specify the MongoDB URI, with database to use for reporting (defaults to "admin")
  # E.g. mongodb://localhost:27017/my-db
  - server: mongodb://localhost:27017
    username: serverdensity
    password: supersecurepassword
- If the database server runs on a non-standard port, or you need to declare the path to the socket or change how the check is executed, amend the rest of the config file as necessary.
- NOTE: It is possible to use a connection string in the server option, such as
mongodb://serverdensity:supersecurepassword@localhost:27016/my-db
However, this is NOT recommended: the server string is automatically used as a tag, which would expose your username and password in the Server Density UI.
3. Restart the agent
sudo /etc/init.d/sd-agent restart
or
sudo systemctl restart sd-agent
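Because the server string ends up as a tag, it can be worth checking that it carries no credentials before restarting the agent. The following is a minimal sketch using only the Python standard library; the function names `has_embedded_credentials` and `strip_credentials` are hypothetical helpers, not part of sd-agent, and the sketch assumes a single-host URI (no replica-set host lists or IPv6 literals).

```python
from urllib.parse import urlsplit


def has_embedded_credentials(server_uri):
    """Return True if the URI carries a username or password before the host."""
    parts = urlsplit(server_uri)
    return parts.username is not None or parts.password is not None


def strip_credentials(server_uri):
    """Return the URI without any user:password@ prefix (single host only)."""
    parts = urlsplit(server_uri)
    host = parts.hostname or ""
    # Rebuild the authority part from host and optional port only.
    netloc = host if parts.port is None else "{}:{}".format(host, parts.port)
    return parts._replace(netloc=netloc).geturl()


if __name__ == "__main__":
    uri = "mongodb://serverdensity:supersecurepassword@localhost:27016/my-db"
    print(has_embedded_credentials(uri))  # True
    print(strip_credentials(uri))         # mongodb://localhost:27016/my-db
```

The stripped URI can then go in the server option, with the user name and password moved to the separate username and password options shown in step 2.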
Connecting over SSL
We can connect to your TokuMX instances using SSL but it may require some changes to the agent if you are using certificate files.
- If you are not using certificate files
Append ?ssl=true to the connection string. For example, in step 2 of the configuration instructions above, set the server option to mongodb://hostname:port/?ssl=true.
- If you are using certificate files
Tell the agent where the certificate files are by setting the following lines in your {agentdir}/conf.d/tokumx.yaml file:
ssl: True # Optional (default to False)
ssl_keyfile: /path/to/key.file
ssl_certfile: /path/to/cert.file
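For the case without certificate files, the connection string form can be produced programmatically. This is a small sketch using the Python standard library; `with_ssl` is a hypothetical helper name, and it assumes a single-host URI like the ones used in this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_ssl(server_uri):
    """Return the URI with ssl=true set in its query string."""
    parts = urlsplit(server_uri)
    query = dict(parse_qsl(parts.query))
    query["ssl"] = "true"  # overrides any existing ssl option
    # A query string needs a path in front of it, so fall back to "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))


if __name__ == "__main__":
    print(with_ssl("mongodb://hostname:27017"))
    # mongodb://hostname:27017/?ssl=true
```

Existing query options are kept, and an existing ssl option is replaced rather than duplicated.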
Verifying the configuration
Run the agent's info command to verify the configuration:
sudo /etc/init.d/sd-agent info
or
/usr/share/python/sd-agent/agent.py info
If the agent has been configured correctly you'll see an output such as:
tokumx
-----
  - instance #0 [OK]
  - Collected * metrics
You can also view the metrics returned with the following command:
sudo -u sd-agent /usr/share/python/sd-agent/agent.py check tokumx
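If you want to script this verification, the status lines can be picked out of the info output. The sketch below is illustrative only: it assumes the tokumx section looks like the sample output in this article, that it is passed just that section of the output, and that a failing instance would show a different upper-case word such as ERROR in the brackets (an assumption, not confirmed here). The names `instance_statuses` and `all_ok` are hypothetical.

```python
import re

# Sample text modelled on the output shown in this article.
SAMPLE_INFO = """\
tokumx
-----
  - instance #0 [OK]
  - Collected * metrics
"""

# Matches lines such as "- instance #0 [OK]".
INSTANCE_LINE = re.compile(r"instance #(\d+) \[([A-Z]+)\]")


def instance_statuses(info_text):
    """Map each instance number found in the text to its reported status."""
    return {int(num): status for num, status in INSTANCE_LINE.findall(info_text)}


def all_ok(info_text):
    """True only if at least one instance is listed and every one reports OK."""
    statuses = instance_statuses(info_text)
    return bool(statuses) and all(s == "OK" for s in statuses.values())


if __name__ == "__main__":
    print(instance_statuses(SAMPLE_INFO))  # {0: 'OK'}
    print(all_ok(SAMPLE_INFO))             # True
```

Treating an empty result as a failure means a missing tokumx section is reported rather than silently passing.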
Configuring graphs
Click the name of your server from the Devices list in your Server Density account then go to the Metrics tab. Click the + Graph button on the right then choose the tokumx metrics to display the graphs. The metrics will also be available to select when building dashboard graphs.
Monitored metrics
Metric | Description | Unit | Type |
---|---|---|---|
tokumx.asserts.msgps | The number of message assertions raised per second. | assertion / second | float |
tokumx.asserts.regularps | The number of regular assertions raised per second. | assertion / second | float |
tokumx.asserts.rolloversps | The number of times that the rollover counters roll over per second. The counters roll over to zero every 2^30 assertions. | assertion / second | float |
tokumx.asserts.userps | The number of user assertions raised per second. | assertion / second | float |
tokumx.asserts.warningps | The number of warnings raised per second. | assertion / second | float |
tokumx.connections.available | The number of unused available incoming connections the database can provide. | connection | float |
tokumx.connections.current | The number of connections to the database server from clients. | connection | float |
tokumx.cursors.timedOut | The total number of cursors that have timed out since the server process started. | cursor | float |
tokumx.cursors.totalOpen | The number of cursors that TokuMX is maintaining for clients. | cursor | float |
tokumx.ft.alerts.checkpointFailures | The number of checkpoints that have failed for any reason. | event | float |
tokumx.ft.alerts.locktreeRequestsPending | The number of requests for document-level locks in the locktree that are waiting for other requests to release their locks. | request | float |
tokumx.ft.alerts.longWaitEvents.cachePressure.countps | Rate at which a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. | event / second | float |
tokumx.ft.alerts.longWaitEvents.cachePressure.timeps | Fraction of time (microseconds/second) that a thread had to wait more than 1 second for evictions to create space in the cachetable for it to page in data it needed. | fraction | float |
tokumx.ft.alerts.longWaitEvents.checkpointBegin.countps | Rate at which the begin checkpoint phase of checkpoint has run (these should be fairly quick). | event / second | float |
tokumx.ft.alerts.longWaitEvents.checkpointBegin.timeps | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. | fraction | float |
tokumx.ft.alerts.longWaitEvents.fsync.countps | Rate at which fsync operations took more than 1 second. | event / second | float |
tokumx.ft.alerts.longWaitEvents.fsync.timeps | Fraction of time (microseconds/second) spent performing fsync operations that took longer than 1 second. | fraction | float |
tokumx.ft.alerts.longWaitEvents.locktreeWait.countps | Rate at which a thread had to wait more than 1 second to acquire a document-level lock in the locktree. | event / second | float |
tokumx.ft.alerts.longWaitEvents.locktreeWait.timeps | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock in the locktree. | fraction | float |
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.countps | Rate at which a thread had to wait more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. | event / second | float |
tokumx.ft.alerts.longWaitEvents.locktreeWaitEscalation.timeps | Fraction of time (microseconds/second) spent by threads waiting more than 1 second to acquire a document-level lock because the locktree was at the memory limit and needed to run escalation. | fraction | float |
tokumx.ft.alerts.longWaitEvents.logBufferWaitps | Rate at which a writing client had to wait more than 100ms for access to the log buffer. | event / second | float |
tokumx.ft.cachetable.evictions.full.leaf.clean.bytesps | Rate of full evictions of leaf nodes. | byte / second | float |
tokumx.ft.cachetable.evictions.full.leaf.clean.countps | Rate of full evictions of leaf nodes. | event / second | float |
tokumx.ft.cachetable.evictions.full.leaf.dirty.bytesps | Rate of full evictions of leaf nodes that need to be written back to disk. | byte / second | float |
tokumx.ft.cachetable.evictions.full.leaf.dirty.countps | Rate of full evictions of leaf nodes that need to be written back to disk. | event / second | float |
tokumx.ft.cachetable.evictions.full.leaf.dirty.timeps | Fraction of time (microseconds/second) spent performing full evictions of leaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. | fraction | float |
tokumx.ft.cachetable.evictions.full.nonleaf.clean.bytesps | Rate of full evictions of nonleaf nodes. | byte / second | float |
tokumx.ft.cachetable.evictions.full.nonleaf.clean.countps | Rate of full evictions of nonleaf nodes. | event / second | float |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.bytesps | Rate of full evictions of nonleaf nodes that need to be written back to disk. | byte / second | float |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.countps | Rate of full evictions of nonleaf nodes that need to be written back to disk. | event / second | float |
tokumx.ft.cachetable.evictions.full.nonleaf.dirty.timeps | Fraction of time (microseconds/second) spent performing full evictions of nonleaf nodes, including the time spent serializing, compressing, and writing those nodes to disk. | fraction | float |
tokumx.ft.cachetable.evictions.partial.leaf.clean.bytesps | Rate of partial evictions of leaf nodes. | byte / second | float |
tokumx.ft.cachetable.evictions.partial.leaf.clean.countps | Rate of partial evictions of leaf nodes. | event / second | float |
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.bytesps | Rate of partial evictions of nonleaf nodes. | byte / second | float |
tokumx.ft.cachetable.evictions.partial.nonleaf.clean.countps | Rate of partial evictions of nonleaf nodes. | event / second | float |
tokumx.ft.cachetable.miss.countps | Rate of internal cache misses. This metric is similar to MongoDB's btree misses and page faults. | miss / second | float |
tokumx.ft.cachetable.miss.full.countps | Rate of full internal cache misses. | miss / second | float |
tokumx.ft.cachetable.miss.full.timeps | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a full cache miss. | fraction | float |
tokumx.ft.cachetable.miss.partial.countps | Rate of partial internal cache misses. | miss / second | float |
tokumx.ft.cachetable.miss.partial.timeps | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for a partial cache miss. | fraction | float |
tokumx.ft.cachetable.miss.timeps | Fraction of time (microseconds/second) the database has had to wait for a disk read to complete for cache misses. | fraction | float |
tokumx.ft.cachetable.size.current | Total amount of uncompressed data currently in the database's internal cache. | byte | float |
tokumx.ft.cachetable.size.limit | Total amount of uncompressed data that will fit in TokuMX's internal cache. | byte | float |
tokumx.ft.cachetable.size.writing | Total size of nodes that are currently queued up to be written to disk for eviction. | byte | float |
tokumx.ft.checkpoint.begin.timeps | Fraction of time (microseconds/second) that a begin checkpoint phase has spent blocking other threads. | fraction | float |
tokumx.ft.checkpoint.countps | Rate at which checkpoints are completed. | event / second | float |
tokumx.ft.checkpoint.lastComplete.time | The time spent, in seconds, by the most recently completed checkpoint. | second | float |
tokumx.ft.checkpoint.timeps | Fraction of time (seconds/second) spent doing checkpoints. | fraction | float |
tokumx.ft.checkpoint.write.leaf.bytes.compressedps | The rate at which leaf nodes are written to disk during checkpoints, after compression. | byte / second | float |
tokumx.ft.checkpoint.write.leaf.bytes.uncompressedps | The rate at which leaf nodes are written to disk during checkpoints, before compression. | byte / second | float |
tokumx.ft.checkpoint.write.leaf.countps | The rate at which leaf nodes are written to disk during checkpoints. | write / second | float |
tokumx.ft.checkpoint.write.leaf.timeps | The fraction of time spent writing leaf nodes to disk during checkpoints. | fraction | float |
tokumx.ft.checkpoint.write.nonleaf.bytes.compressedps | The rate at which nonleaf nodes are written to disk during checkpoints, after compression. | byte / second | float |
tokumx.ft.checkpoint.write.nonleaf.bytes.uncompressedps | The rate at which nonleaf nodes are written to disk during checkpoints, before compression. | byte / second | float |
tokumx.ft.checkpoint.write.nonleaf.countps | The rate at which nonleaf nodes are written to disk during checkpoints. | write / second | float |
tokumx.ft.checkpoint.write.nonleaf.timeps | The fraction of time spent writing nonleaf nodes to disk during checkpoints. | fraction | float |
tokumx.ft.compressionRatio.leaf | The size ratio of leaf nodes before and after compression. | fraction | float |
tokumx.ft.compressionRatio.nonleaf | The size ratio of nonleaf nodes before and after compression. | fraction | float |
tokumx.ft.compressionRatio.overall | The size ratio of nodes before and after compression. | fraction | float |
tokumx.ft.fsync.countps | The rate at which the database flushed the operating system's file buffers to disk. | operation / second | float |
tokumx.ft.fsync.timeps | The fraction of time (microseconds/second) used to fsync to disk. | fraction | float |
tokumx.ft.locktree.size.current | Total memory the locktree is currently using. | byte | float |
tokumx.ft.locktree.size.limit | Maximum number of bytes that the locktree is allowed to use. | byte | float |
tokumx.ft.log.bytesps | The rate at which the logger writes to disk. | byte / second | float |
tokumx.ft.log.countps | The rate of individual log writes. | write / second | float |
tokumx.ft.log.timeps | The fraction of time spent performing log writes. | fraction | float |
tokumx.ft.serializeTime.leaf.compressps | Fraction of time spent compressing leaf nodes before writing them to disk (for checkpoint or when evicted while dirty). | fraction | float |
tokumx.ft.serializeTime.leaf.decompressps | Fraction of time spent decompressing leaf nodes after reading them off disk. | fraction | float |
tokumx.ft.serializeTime.leaf.deserializeps | Fraction of time spent deserializing leaf nodes and their partitions after reading them off disk. | fraction | float |
tokumx.ft.serializeTime.leaf.serializeps | Fraction of time spent serializing leaf nodes and their partitions before writing them to disk. | fraction | float |
tokumx.ft.serializeTime.nonleaf.compressps | Fraction of time spent compressing nonleaf nodes before writing them to disk (for checkpoint or when evicted while dirty). | fraction | float |
tokumx.ft.serializeTime.nonleaf.decompressps | Fraction of time spent decompressing nonleaf nodes after reading them off disk. | fraction | float |
tokumx.ft.serializeTime.nonleaf.deserializeps | Fraction of time spent deserializing nonleaf nodes and their partitions after reading them off disk. | fraction | float |
tokumx.ft.serializeTime.nonleaf.serializeps | Fraction of time spent serializing nonleaf nodes and their partitions before writing them to disk. | fraction | float |
tokumx.mem.resident | The amount of memory currently used by the database process. | mebibyte | float |
tokumx.mem.virtual | The amount of virtual memory used by the database process. | mebibyte | float |
tokumx.metrics.document.deletedps | The number of documents deleted per second. | document / second | float |
tokumx.metrics.document.insertedps | The number of documents inserted per second. | document / second | float |
tokumx.metrics.document.returnedps | The number of documents returned by queries per second. | document / second | float |
tokumx.metrics.document.updatedps | The number of documents updated per second. | document / second | float |
tokumx.metrics.getLastError.wtime.numps | The number of getLastError operations per second with a specified write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. | operation / second | float |
tokumx.metrics.getLastError.wtime.totalMillisps | The fraction of time (ms/s) spent performing getLastError operations with write concern (i.e. w) that wait for one or more members of a replica set to acknowledge the write operation. | fraction | float |
tokumx.metrics.getLastError.wtimeoutsps | The number of times per second that write concern operations have timed out as a result of the wtimeout threshold to getLastError. | event / second | float |
tokumx.metrics.operation.idhackps | The rate of queries that contain the _id field. | query / second | float |
tokumx.metrics.operation.scanAndOrderps | The rate of queries that return sorted results but cannot perform the sort operation using an index. | query / second | float |
tokumx.metrics.queryExecutor.scannedps | The rate of index items scanned during queries and query-plan evaluation. | operation / second | float |
tokumx.metrics.repl.apply.batches.numps | The number of batches applied across all databases per second. | operation / second | float |
tokumx.metrics.repl.apply.batches.totalMillisps | The fraction of time (ms/s) spent applying operations from the oplog. | fraction | float |
tokumx.metrics.repl.apply.opsps | The rate of oplog operations. | operation / second | float |
tokumx.metrics.repl.buffer.count | The number of operations in the oplog buffer. | operation | float |
tokumx.metrics.repl.buffer.sizeBytes | The current size of the contents of the oplog buffer. | byte | float |
tokumx.metrics.repl.network.bytesps | The rate at which data is read from the replication sync source. | byte / second | float |
tokumx.metrics.repl.network.getmores.numps | The rate of getmore operations. | operation / second | float |
tokumx.metrics.repl.network.getmores.totalMillisps | The fraction of time (ms/s) spent collecting data from getmore operations. | fraction | float |
tokumx.metrics.repl.network.opsps | The rate of operations read from the replication source. | operation / second | float |
tokumx.metrics.repl.network.readersCreatedps | The rate at which oplog query processes are created. | process / second | float |
tokumx.metrics.repl.oplog.insert.numps | The rate at which operations are inserted into the oplog. | operation / second | float |
tokumx.metrics.repl.oplog.insert.totalMillisps | The fraction of time (ms/s) spent inserting operations into the oplog. | fraction | float |
tokumx.metrics.repl.oplog.insertBytesps | The rate (in bytes) at which data is inserted into the oplog. | byte / second | float |
tokumx.metrics.ttl.deletedDocumentsps | The rate at which documents are deleted from collections with a TTL index. | document / second | float |
tokumx.metrics.ttl.passesps | The number of times per second the background process removes documents from collections with a TTL index. | event / second | float |
tokumx.opcounters.commandps | The total number of commands per second issued to the database. | command / second | float |
tokumx.opcounters.deleteps | The number of delete operations per second. | operation / second | float |
tokumx.opcounters.getmoreps | The number of getmore operations per second. | operation / second | float |
tokumx.opcounters.insertps | The number of insert operations per second. | operation / second | float |
tokumx.opcounters.queryps | The total number of queries per second. | query / second | float |
tokumx.opcounters.updateps | The number of update operations per second. | operation / second | float |
tokumx.opcountersRepl.commandps | The total number of replicated commands issued to the database per second. | command / second | float |
tokumx.opcountersRepl.deleteps | The number of replicated delete operations per second. | operation / second | float |
tokumx.opcountersRepl.getmoreps | The number of replicated getmore operations per second. | operation / second | float |
tokumx.opcountersRepl.insertps | The number of replicated insert operations per second. | operation / second | float |
tokumx.opcountersRepl.queryps | The total number of replicated queries per second. | query / second | float |
tokumx.opcountersRepl.updateps | The number of replicated update operations per second. | operation / second | float |
tokumx.stats.coll.count | The number of objects or documents in this collection. | document | float |
tokumx.stats.coll.nindexes | The number of indexes on this collection. | index | float |
tokumx.stats.coll.nindexesbeingbuilt | The number of indexes currently being built. | index | float |
tokumx.stats.coll.size | The total size in memory of all records in a collection. Does not include the record header, but does include the record's padding. Does not include the size of any indexes associated with the collection. | byte | float |
tokumx.stats.coll.storageSize | The total amount of storage allocated to this collection for document storage. | byte | float |
tokumx.stats.coll.totalIndexSize | The total size of all indexes on this collection. | byte | float |
tokumx.stats.coll.totalIndexStorageSize | The total size on disk of all indexes on this collection (after compression). | byte | float |
tokumx.stats.dataSize | The total size of the data held in this database including the padding factor. | byte | float |
tokumx.stats.db.avgObjSize | The average size of each document. | byte | float |
tokumx.stats.db.collections | The number of collections in the database. | none | float |
tokumx.stats.db.dataSize | The total size of the data held in this database including the padding factor. | byte | float |
tokumx.stats.db.indexSize | The total size of all indexes created on this database. | byte | float |
tokumx.stats.db.indexStorageSize | The total size on disk of all indexes created on this database (after compression). | byte | float |
tokumx.stats.db.indexes | The total number of indexes across all collections in the database. | index | float |
tokumx.stats.db.objects | The number of documents in the database across all collections. | document | float |
tokumx.stats.db.storageSize | The total amount of space allocated to collections in this database for document storage. | byte | float |
tokumx.stats.idx.avgObjSize | The average size of each index entry. | byte | float |
tokumx.stats.idx.count | The number of documents in this index. | index | float |
tokumx.stats.idx.deletes | The number of delete operations performed on this index. | operation | float |
tokumx.stats.idx.inserts | The number of insert operations performed on this index. | operation | float |
tokumx.stats.idx.nscanned | The number of index entries scanned for queries using this index. | index | float |
tokumx.stats.idx.nscannedObjects | The number of collection objects examined after scanning an index entry for a query using this index. | object | float |
tokumx.stats.idx.queries | The number of query operations performed using this index. | query | float |
tokumx.stats.idx.size | The total size of this index. | byte | float |
tokumx.stats.idx.storageSize | The total size on disk of this index (after compression). | byte | float |
tokumx.stats.indexSize | The total size of all indexes created on this database. | byte | float |
tokumx.stats.indexes | The total number of indexes across all collections in the database. | index | float |
tokumx.stats.objects | The number of documents in the database across all collections. | document | float |
tokumx.stats.storageSize | The total amount of space allocated to collections in this database for document storage. | byte | float |
tokumx.uptime | The time that the TokuMX process has been active. | second | float |
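Many of the metrics above are per-second rates or "fraction of time" values expressed as microseconds, milliseconds or seconds per second. The arithmetic behind those units can be sketched as follows. This is only an illustration of what the units mean, not a description of how the agent computes them internally; the function names and the sample numbers are hypothetical.

```python
def rate_per_second(previous, current, interval_seconds):
    """Per-second rate between two readings of an ever-increasing counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current - previous) / interval_seconds


def busy_percent(value, units_per_second):
    """Convert a 'time per second' value into a percentage of elapsed time.

    units_per_second is 1000000 for microseconds/second, 1000 for ms/s
    and 1 for seconds/second.
    """
    return 100.0 * value / units_per_second


if __name__ == "__main__":
    # A counter that went from 1200 to 1500 over 60 seconds.
    print(rate_per_second(1200, 1500, 60))  # 5.0
    # 250000 microseconds of waiting in each second is a quarter of the time.
    print(busy_percent(250000, 1000000))    # 25.0
```

Read this way, a timeps value of 250000 on a microseconds/second metric means the measured activity occupied about 25% of each second.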