If there is no native plugin for what you need to monitor, you can create a custom plugin to gather the metrics and post them back to your Server Density account, allowing you to graph and alert on your own custom metrics.
v2 custom plugin metrics are not viewable in the Custom Plugins tab. They will appear in your account like official plugin metrics with the name you provide.
Using Windows or agent v1? Read the docs.
Python 2.7+ required. Python 3 not supported.
Interface
All custom plugins must inherit the AgentCheck class found in checks/__init__.py, and each plugin requires a check() method that takes a single argument, instance. instance is a dict which holds the configuration of a particular instance. The check method will be run once per instance defined in the check config.
Metrics
Collecting metrics in your plugin is easy. Within the AgentCheck class you have the following methods available to you, representing each metric type:
self.gauge(metric, value, tags) # Collect a gauge metric
self.histogram(metric, value, tags) # Collect a histogram metric
self.rate(metric, value, tags) # Collect a point, with the rate calculated at the end of the check
self.count(metric, value, tags) # Collect a raw count metric
self.monotonic_count(metric, value, tags) # Collect an increasing counter metric
Each of these methods can take the following arguments:
- metric - The name of the metric
- value - The value of the metric. This must be an integer or a float for Server Density to store the value; otherwise it can only be used for alerting. We will attempt to convert strings to integer and float values, but if this fails the metric will be silently discarded.
- tags - A list of tags to be associated with the metric (optional). You can find out more about tags in the Metric Tags document.
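Because a value that cannot be converted is dropped silently, it can help to normalise values in your plugin before submitting them. The following is a minimal sketch of the conversion rule described above, not Server Density's actual processing code; coerce_metric_value is a hypothetical helper name.

```python
def coerce_metric_value(value):
    """Return value as an int or float, or None if it cannot be converted."""
    if isinstance(value, (int, float)):
        return value
    # Try int first so '42' stays an integer, then fall back to float
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            continue
    return None


as_int = coerce_metric_value('42')
as_float = coerce_metric_value('3.5')
rejected = coerce_metric_value('n/a')
```

Inside check() you could log a warning when the helper returns None instead of passing the raw value to self.gauge(), so that a dropped metric leaves a trace in the collector log.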
You can call these methods from anywhere in your plugin logic and once the check is completed any metrics that were collected will be sent to Server Density in the next payload.
Metric names must consist of one top-level name (often the plugin name) and at least one metric name, separated by a period. For example, plugin.metric or plugin.metric_category.metric. Metrics without at least one sub-metric will be rejected and not stored on processing.
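The naming rule can be checked locally before a metric is submitted. This sketch only tests the dotted structure described above; the source does not state which characters are allowed in a name, so that is deliberately left unchecked. has_sub_metric is a hypothetical helper name.

```python
def has_sub_metric(name):
    """True if name has a top-level part and at least one non-empty sub-part."""
    parts = name.split('.')
    # all(parts) rejects empty segments such as 'plugin.' or '.metric'
    return len(parts) >= 2 and all(parts)


valid_names = [n for n in ('plugin.metric', 'plugin.metric_category.metric')
               if has_sub_metric(n)]
invalid_names = [n for n in ('plugin', 'plugin.', '.metric')
                 if not has_sub_metric(n)]
```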
Exceptions
Meaningful exceptions should be raised if the check is unable to complete for any reason, such as incorrect configuration, a programming error, or an inability to collect metrics. The exceptions are logged and will be shown in the output of the sd-agent info command to allow for easy debugging.
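As an illustration of a meaningful exception, the sketch below fails with a message that names the missing setting rather than surfacing a bare KeyError. require_setting is a hypothetical helper, and the source does not say whether the agent prefers a particular exception class, so a plain Exception is used here as an assumption.

```python
def require_setting(instance, key):
    """Return instance[key], or raise an exception naming the missing key."""
    if key not in instance:
        raise Exception(
            "Missing required setting '{}' in instance config".format(key)
        )
    return instance[key]


# Capture the message that would be raised for an empty instance
try:
    require_setting({}, 'server')
    error_message = None
except Exception as exc:
    error_message = str(exc)
```

Within a plugin you would call the helper at the top of check(), for example server = require_setting(instance, 'server'), and let the exception propagate so the agent can log it.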
Logging
The AgentCheck class gives you access to a logger at self.log, allowing you to output to the sd-agent collector log. You can use this to output helpful debug and info messages from your check to allow for easy debugging. The log handler will inherit the name of your plugin in the form of checks.{plugin_name}, where {plugin_name} is the name of your plugin. To output a debug message you can do something similar to:
self.log.debug('Helpful debug message')
or to output an info message you can do something like:
self.log.info('Check completed successfully!')
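When developing a plugin outside the agent, a logger with the same name can be created by hand to see what the messages will look like. This sketch assumes self.log behaves like a standard library logging.Logger, which the source implies but does not state; myplugin is a hypothetical plugin name.

```python
import logging

# Show logger name, level and message so the checks.{plugin_name} form is visible
logging.basicConfig(level=logging.DEBUG,
                    format='%(name)s %(levelname)s %(message)s')

plugin_name = 'myplugin'  # hypothetical plugin name
log = logging.getLogger('checks.{}'.format(plugin_name))

log.debug('Helpful debug message')
log.info('Check completed successfully!')
```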
Plugin Configuration
Each plugin needs a configuration file, which should be placed in the sd-agent conf.d directory. Configuration files should be formatted in YAML, and the configuration file name should match the plugin name (e.g. customplugin.py and customplugin.yaml). Configuration files have the following structure:
init_config:
    min_collection_interval: 120
    key_1: val_1
    key_2: val_2

instances:
    - username: jon_doe
      password: abcd

    - username: jane_doe
      password: wxyz
min_collection_interval can be defined in the init_config section to specify how often the check should run, in seconds. If it is not specified it defaults to 0, which runs the check on every collector run (every 60s). A value less than 60 also causes the check to run on every collector run. If the value is greater than 60, the collector checks whether the specified min_collection_interval has elapsed; if it has, the check runs, otherwise the collector logs a message stating that the check has been skipped.
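The scheduling rule can be pictured with a small function. This is an illustration of the behaviour described above, not the agent's own code; should_run and COLLECTOR_INTERVAL are hypothetical names, and the handling of a first run and of a value of exactly 60 are assumptions the source does not cover.

```python
COLLECTOR_INTERVAL = 60  # seconds between collector runs, per the text above


def should_run(now, last_run, min_collection_interval=0):
    """Decide whether a check is due on this collector run."""
    if last_run is None:
        # Assumption: a check that has never run is always due
        return True
    if min_collection_interval <= COLLECTOR_INTERVAL:
        # Assumption: exactly 60 is treated like the "less than 60" case
        return True
    # Otherwise run only once the configured interval has elapsed
    return (now - last_run) >= min_collection_interval


# With an interval of 120s and collector runs 60s apart, every other run is skipped
decisions = [should_run(now, 1000, 120) for now in (1060, 1120)]
```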
init_config
The init_config section allows you to set global configuration options for the check. These global configuration options will be available to the check on every run.
instances
The instances section is a list of instances that the check will be run against. Your check() method will run once per instance, meaning that your custom plugin can support multiple instances simply by adding extra configuration.
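The relationship between the two sections can be sketched without the agent. Here the configuration is written as the Python dict that the YAML structure would parse to, and FakeCheck is a hypothetical stand-in for an AgentCheck subclass, used only to show that init_config is shared while check() receives one instance per call.

```python
# Dict equivalent of a conf.d YAML file (values are illustrative)
config = {
    'init_config': {'key_1': 'val_1'},
    'instances': [
        {'username': 'jon_doe', 'password': 'abcd'},
        {'username': 'jane_doe', 'password': 'wxyz'},
    ],
}


class FakeCheck(object):
    """Hypothetical stand-in for an AgentCheck subclass; not the agent's class."""

    def __init__(self, init_config):
        self.init_config = init_config
        self.seen = []

    def check(self, instance):
        # init_config is the same on every call; instance changes each time
        self.seen.append((self.init_config.get('key_1'), instance['username']))


check = FakeCheck(config['init_config'])
for instance in config['instances']:
    check.check(instance)
```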
sd-agent Plugin Directories
checks.d
Your custom plugin code (myplugin.py) should be placed in your additional checks.d folder. You can configure the additional checks.d directory in your agent config.cfg by adding the following:
additional_checksd: /path/to/checks.d/
Alternatively you can put your plugin in /etc/sd-agent/checks.d without any configuration, though you will need to create the directory.
conf.d
Your custom plugin configuration file (myplugin.yaml) should be placed in your conf.d folder. On Linux installs this is at /etc/sd-agent/conf.d/.
Virtual Environment
The sd-agent makes use of a virtual environment for its Python dependencies. If you need to install extra dependencies in the virtual environment you can use pip, which is available at /usr/share/python/sd-agent/bin/pip.
A Simple Example
For a simple example we will define a plugin that sends a static value for a metric back to Server Density on each plugin execution. Remember that the plugin file name and configuration file name need to match for the agent to execute your plugin. Let's start with a simple configuration that doesn't include any configuration information:
conf.d/example.yaml
init_config:

instances:
    [{}]
For the actual check we need to be sure to inherit from AgentCheck. In the check we will define my.metric and set its value to 1. As we are calling this plugin example, we will prepend example to the metric name so that we know this is a metric from the example plugin. This gives a full metric name of example.my.metric.
checks.d/example.py
from checks import AgentCheck

class ExampleCheck(AgentCheck):
    def check(self, instance):
        self.gauge('example.my.metric', 1)
A Complex Example
For a more complex example we will define a plugin that attempts to open a socket to a server and port. In the configuration we'll set a global timeout value that will apply to all instances using init_config. We'll also add the ability to define optional configurations with default fallbacks if nothing is defined on a per instance basis and a mandatory configuration value.
In this example the only required configuration value is server. So the minimum viable configuration for two instances would be:
conf.d/portmon.yaml
init_config:

instances:
    - server: example.com
    - server: test.com
This will cause the timeout to be defined as 5s, the port as 80 and the tags set to server: example.com:80 and server: test.com:80, respectively for each instance.
The plugin will return two metrics for each instance: the response time (as portmon.response.time) and a status integer (as portmon.response.status). This will allow us to graph and alert on these metrics in our Server Density account with ease.
However, we can also define more complex configurations. For example, the configuration below will set the global timeout to 10. The 8.8.8.8 instance overrides this with the timeout set in its own instance configuration. The 8.8.8.8 instance of the check will also connect on port 53 and append the tags 'dns' and 'google' to the metrics, along with the 'server: 8.8.8.8:53' tag. The example.com instance will keep the global timeout configuration, default to port 80 when connecting and only append the 'server: example.com:80' tag.
conf.d/portmon.yaml
init_config:
    default_timeout: 10

instances:
    - server: example.com

    - server: 8.8.8.8
      port: 53
      timeout: 100
      tags:
        - dns
        - google
checks.d/portmon.py
import time
import socket

from checks import AgentCheck


class PortMon(AgentCheck):
    def check(self, instance):
        # Load default_timeout value from the init_config, if not present default to 5
        default_timeout = self.init_config.get('default_timeout', 5)

        # Load port value from the instance config, if not present default to 80
        port = instance.get('port', 80)

        # Attempt to load the timeout from the instance config. If not present fallback to default_timeout
        timeout = float(instance.get('timeout', default_timeout))

        # If we don't find a server for this instance stop the check now
        if 'server' not in instance:
            # Output to the info log that we're skipping this instance due to no server being configured
            self.log.info("Skipping instance, no server found.")
            return
        server = instance['server']

        # Attempt to load the tags from the instance config. If not present fallback to an empty list.
        # Copy the list so the instance config is not modified and the tag is not appended again on every run.
        tags = list(instance.get('tags', []))

        # Append the tag 'server: server:port' to the tags list, based on the values loaded from the instance config.
        tags.append("server: {}:{}".format(server, port))

        # A handy debug line in case we need to output information for troubleshooting
        self.log.debug("Timeout set to {} for {}:{} with tags: {}".format(timeout, server, port, tags))

        # Begin the check by creating a socket
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

        # Set the timeout on the socket to the configured timeout
        s.settimeout(timeout)

        # Get the current time so we can calculate the response time
        t_init = time.time()

        # Attempt the following unless an error is seen
        try:
            # Set status to 1, so we can report back a simple status metric
            status = 1
            # Attempt to connect to a remote socket at server, port
            s.connect((server, port))
            # Measure the response time from the timestamp we took earlier in the check
            response_time = time.time() - t_init
        # If we see a socket error or a socket timeout
        except (socket.error, socket.timeout):
            # As this is an error condition we'll set the response time to '-1'
            # so that it's obvious the connection failed when viewing graphs
            response_time = -1
            # We'll also set the status to 0 as this is an error
            status = 0
        finally:
            # Close the socket whether or not the connection succeeded
            s.close()

        # Set the portmon.response.time metric, along with the tags we set earlier
        self.gauge('portmon.response.time', response_time, tags=tags)

        # Set the portmon.response.status metric, along with the tags we set earlier
        self.gauge('portmon.response.status', status, tags=tags)

        # The check is complete.
        # Once all instances have completed checks the results will be sent to Server Density!


if __name__ == '__main__':
    # Load the check and instance configurations
    check, instances = PortMon.from_yaml('/etc/sd-agent/conf.d/portmon.yaml')
    for instance in instances:
        print("\nRunning the check against host: {}:{}".format(instance['server'], instance.get('port', 80)))
        check.check(instance)
        print('Metrics: {}'.format(check.get_metrics()))
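To exercise the connection logic without a running agent, the socket handling can be pulled into a standalone function and pointed at a local listener. This is a sketch for local experimentation only: probe is a hypothetical helper that is not part of the agent, and it reuses the same -1 and 0 failure convention as the check above.

```python
import socket
import time


def probe(server, port, timeout=5.0):
    """Return (response_time, status); (-1, 0) means the connection failed."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    started = time.time()
    try:
        s.connect((server, port))
        return time.time() - started, 1
    except (socket.error, socket.timeout):
        return -1, 0
    finally:
        # Always release the socket, on success or failure
        s.close()


# Listen on an ephemeral local port so the success path works offline
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))
listener.listen(1)
open_port = listener.getsockname()[1]

up_time, up_status = probe('127.0.0.1', open_port, timeout=2.0)
listener.close()

# With the listener closed, the same port should now refuse connections
down_time, down_status = probe('127.0.0.1', open_port, timeout=2.0)
```

Keeping the network code in a plain function like this also keeps check() short: it only has to read the configuration, call the function and submit the two gauges.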
Troubleshooting
You can execute your check within the context of the agent by executing the check subcommand:
/usr/share/python/sd-agent/agent.py check {checkname}
where {checkname} is the name of the plugin you have created, assuming the plugin is located in the checks.d folder as the agent expects. With the above example, with a plugin named portmon, you would need to execute the following:
/usr/share/python/sd-agent/agent.py check portmon
If you continue to have issues please send an email to hello@serverdensity.com with a copy of the check code, example configuration and any relevant logs.