Information about Custom Plugins - v2

If we don't have a native plugin available for what you need to monitor, you can create a custom plugin to gather the metrics and post them back to your Server Density account, allowing you to graph and alert on your own custom metrics.

v2 custom plugin metrics are not viewable in the Custom Plugins tab. They will appear in your account like official plugin metrics with the name you provide. 

Using Windows or agent v1? Read the docs.

Python 2.7+ required. Python 3 not supported.

Interface

All custom plugins must inherit from the AgentCheck class found in checks/__init__.py, and each plugin requires a check() method that takes a single argument, instance. instance is a dict that holds the configuration of a particular instance. The check() method runs once for each instance defined in the check config.

Metrics

Collecting metrics in your plugin is easy. Within the AgentCheck class you have the following methods available to you, representing each metric type:

self.gauge(metric, value, tags) # Collect a gauge metric
self.histogram(metric, value, tags) # Collect a histogram metric
self.rate(metric, value, tags) # Collect a point, with the rate calculated at the end of the check
self.count(metric, value, tags) # Collect a raw count metric
self.monotonic_count(metric, value, tags) # Collect an increasing counter metric 

Each of these methods can take the following arguments: 

  • metric - The name of the metric
  • value - The value of the metric. This must be an integer or a float for Server Density to store the value; otherwise it can only be used for alerting. We will attempt to convert strings to integer or float values, but if the conversion fails the metric will be silently discarded.
  • tags - A list of tags to be associated with the metric (optional). You can find out more about tags in the Metric Tags document.

You can call these methods from anywhere in your plugin logic and once the check is completed any metrics that were collected will be sent to Server Density in the next payload.
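Because a value that cannot be converted is discarded silently, it can help to convert values yourself before submitting them. The following is a minimal sketch; to_metric_value is a hypothetical helper name, not part of the agent, and it assumes nothing about how Server Density performs its own conversion.

```python
def to_metric_value(raw):
    """Return raw as an int or float, or None if it cannot be converted.

    Hypothetical helper: it is not provided by the agent.
    """
    if isinstance(raw, (int, float)):
        return raw
    # Try int first so that '42' stays an integer
    try:
        return int(raw)
    except (TypeError, ValueError):
        pass
    # Fall back to float for values such as '3.5'
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


# Inside check() this could be used as follows:
#     value = to_metric_value(raw)
#     if value is None:
#         self.log.info('Skipping non-numeric value: {0!r}'.format(raw))
#     else:
#         self.gauge('example.my.metric', value)
```

Logging the skipped value means the problem shows up in the collector log instead of as a metric that is quietly missing.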

Metric names must consist of one top level name (often the plugin name) and at least one metric name, separated by a period. For example, plugin.metric or plugin.metric_category.metric. Metrics without at least one sub metric will be rejected and not stored on processing.
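The naming rule above can be checked in the plugin before a metric is submitted. This sketch only tests the rule as stated here (at least two non-empty parts separated by periods); has_valid_metric_name is a hypothetical helper, and Server Density may apply further checks that are not covered.

```python
def has_valid_metric_name(name):
    """Return True if name has a top level name and at least one sub metric.

    Hypothetical helper: it only checks the period-separated structure.
    """
    parts = name.split('.')
    # Need at least two parts, and none of them may be empty
    return len(parts) >= 2 and all(parts)
```

For example, has_valid_metric_name('plugin.metric') returns True, while has_valid_metric_name('plugin') returns False.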

Exceptions 

Meaningful exceptions should be raised if the check is unable to complete for any reason, such as incorrect configuration, a programming error, or an inability to collect metrics. The exceptions are logged and will be shown in the output of the sd-agent info command to allow for easy debugging.
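As an illustration, a missing configuration value can be turned into an exception with a message that names the problem. ConfigurationError and get_required are hypothetical names used for this sketch; the agent does not provide them.

```python
class ConfigurationError(Exception):
    """Hypothetical exception type for invalid plugin configuration."""


def get_required(instance, key):
    """Return instance[key], raising a descriptive error if it is missing."""
    if key not in instance:
        raise ConfigurationError(
            "Missing required setting '{0}' in instance configuration".format(key)
        )
    return instance[key]


# Inside check() this could be used as follows:
#     server = get_required(instance, 'server')
```

Raising here, instead of returning quietly, means the misconfiguration is reported alongside the other logged exceptions.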

Logging

The AgentCheck class gives you access to a logger at self.log allowing you to output to the sd-agent collector log. You can use this to output helpful debug and info messages from your check to allow for easy debugging. The log handler will inherit the name of your plugin in the form of checks.{plugin_name}, where {plugin_name} is the name of your plugin. To output a debug message you can do something similar to

self.log.debug('Helpful debug message')

or to output an info message you can do something like

self.log.info('Check completed successfully!')
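The sketch below shows both levels used together. It assumes self.log behaves like a logger from Python's standard logging module; the logger created here is only a stand-in with the same naming scheme, and report_connection is a hypothetical helper.

```python
import logging

# Stand-in for self.log. Inside a plugin named "example" the agent
# provides the logger; this one is created only for illustration.
log = logging.getLogger('checks.example')


def report_connection(log, server, port, elapsed):
    """Log connection details at debug level and slow responses at info level."""
    # Values are passed as arguments, so the message is only formatted
    # when the debug level is actually enabled
    log.debug('Connected to %s:%s in %.3fs', server, port, elapsed)
    if elapsed > 1.0:
        log.info('Slow response from %s:%s', server, port)
```

Keeping routine detail at debug level and reserving info for notable events keeps the collector log readable.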

Plugin Configuration

Each plugin needs a configuration file, which should be placed in the sd-agent conf.d directory. Configuration files should be formatted in YAML and the configuration file name should match the plugin name (for example, customplugin.py and customplugin.yaml). Configuration files have the following structure:

init_config:
    min_collection_interval: 120
    key_1: val_1
    key_2: val_2

instances:
    - username: jon_doe
      password: abcd

    - username: jane_doe
      password: wxyz 

min_collection_interval can be defined in the init_config section to specify how often the check should run. If it is not specified it defaults to 0, which runs the check on every collector run (every 60s). A value of less than 60 also causes the check to run on every collector run. If the value is greater than 60, the collector checks whether the specified min_collection_interval has elapsed: if it has, the check runs; otherwise the collector logs a message stating that the check has been skipped.
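The scheduling rule can be modelled in a few lines. This is a sketch of the behaviour described above, not the agent's actual code; should_run and simulate are hypothetical names, and the collector is assumed to run exactly every 60 seconds.

```python
COLLECTOR_INTERVAL = 60  # seconds between collector runs, as described above


def should_run(last_run, now, min_collection_interval=0):
    """Return True if the check should run on this collector run."""
    if last_run is None:
        # The check has never run, so run it now
        return True
    return (now - last_run) >= min_collection_interval


def simulate(min_collection_interval, collector_runs):
    """Return the times (in seconds) at which the check would execute."""
    executed = []
    last_run = None
    for i in range(collector_runs):
        now = i * COLLECTOR_INTERVAL
        if should_run(last_run, now, min_collection_interval):
            executed.append(now)
            last_run = now
    return executed
```

Under these assumptions, simulate(120, 5) gives [0, 120, 240], so every other collector run is skipped, while simulate(30, 3) gives [0, 60, 120], because 60 seconds have always elapsed by the next collector run.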

init_config

The init_config section allows you to set global configuration options for the check. These global configuration options will be available to the check on every run.

instances

The instances section is a list of instances that the check will be run against. Your check() method will run once per instance, meaning that your custom plugin can support multiple instances simply by adding extra configuration.
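To make the relationship between the two sections concrete, the sketch below writes the configuration structure as Python data and calls a stand-in check function once per instance. The check function here is only an illustration: in a real plugin, check() is a method that receives the instance and reads the global options from self.init_config.

```python
# Python equivalent of the YAML structure shown earlier (values are illustrative)
config = {
    'init_config': {'key_1': 'val_1'},
    'instances': [
        {'username': 'jon_doe', 'password': 'abcd'},
        {'username': 'jane_doe', 'password': 'wxyz'},
    ],
}


def check(init_config, instance):
    """Stand-in for a plugin's check() method."""
    # Global options are shared; instance options differ per call
    return '{0} (key_1={1})'.format(instance['username'], init_config.get('key_1'))


# One call per entry in the instances list, all sharing the same init_config
results = [check(config['init_config'], instance) for instance in config['instances']]
```

Here results holds one entry per instance, so adding a third item to the instances list adds a third call without any code change.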

sd-agent Plugin Directories 

checks.d

Your custom plugin code (myplugin.py) should be placed in your additional checks.d folder. You can configure the additional checks.d directory in your agent config.cfg by adding the following:

additional_checksd: /path/to/checks.d/

Alternatively you can put your plugin in /etc/sd-agent/checks.d without any configuration, though you will need to create the directory.

conf.d

Your custom plugin configuration file (myplugin.yaml) should be placed in your conf.d folder. On Linux installs this is at /etc/sd-agent/conf.d/

Virtual Environment

The sd-agent makes use of a virtual environment for its Python dependencies. If you need to install extra dependencies in the virtual environment you can use pip, which is available at /usr/share/python/sd-agent/bin/pip.

A Simple Example 

For a simple example we will define a plugin that sends a static value for a metric back to Server Density on each plugin execution. Remember that the plugin file name and configuration file name need to match for the agent to execute your plugin. Let's start with a simple configuration that doesn't include any configuration information:

conf.d/example.yaml

init_config:

instances:
    [{}] 

For the actual check we need to inherit from AgentCheck and, in the check() method, define my.metric and set its value to 1. As we are calling this plugin example, we will prepend example to the metric name so that we know this is a metric from the example plugin. This gives a full metric name of example.my.metric

checks.d/example.py

from checks import AgentCheck


class ExampleCheck(AgentCheck):
    def check(self, instance):
        self.gauge('example.my.metric', 1)

A Complex Example

For a more complex example we will define a plugin that attempts to open a socket to a server and port. In the configuration we'll set a global timeout value that will apply to all instances using init_config. Each instance can also set optional values, which fall back to defaults when they are not defined, and must set one mandatory value.

In this example the only required configuration value is server. So the minimum viable configuration for two instances would be:

conf.d/portmon.yaml

init_config:

instances:
  - server: example.com
  - server: test.com

This will cause the timeout to be defined as 5s, the port as 80 and the tags set to server: example.com:80 and server: test.com:80, respectively for each instance.

The plugin will return two metrics for each instance; the response time (as portmon.response.time), and a status integer (as portmon.response.status). This will allow us to graph and alert on these metrics in our Server Density account with ease.

However, we can also define more complex configurations. For example, the configuration below will set the global timeout to 10. However, the 8.8.8.8 instance timeout will be overridden by the timeout configuration in the instance. The 8.8.8.8 instance of the check will also connect on port 53 and append the tags 'dns' and 'google' to the metrics, along with the 'server: 8.8.8.8:53' tag. The example.com instance will keep the global timeout configuration, default to port 80 when connecting and only append the 'server: example.com:80' tag.

conf.d/portmon.yaml

init_config:
  default_timeout: 10
instances:
  - server: example.com
  - server: 8.8.8.8
    port: 53 
    timeout: 100
    tags: 
      - dns 
      - google

checks.d/portmon.py 

import time
import socket

from checks import AgentCheck


class PortMon(AgentCheck):
    def check(self, instance):
        # Load default_timeout value from the init_config, if not present default to 5
        default_timeout = self.init_config.get('default_timeout', 5)
        # Load port value from the instance config
        port = instance.get('port', 80)
        # Attempt to load the timeout from the instance config. If not present fallback to default_timeout
        timeout = float(instance.get('timeout', default_timeout))
        # If we don't find a server for this instance stop the check now
        if 'server' not in instance:
            # Output to the info log that we're skipping this instance due to no server being configured
            self.log.info("Skipping instance, no server found.")
            return
        server = instance['server']
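        # Attempt to load the tags from the instance config. If not present fallback to an empty list.
        # Copy the list so the server tag appended below does not accumulate in the instance config across runs
        tags = list(instance.get('tags', []))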
        # Append the tag 'server: server:port' to the tags list, based on the values loaded from the instance config. 
        tags.append("server: {}:{}".format(server,port))
        # A handy debug line in case we need to output information for troubleshooting
        self.log.debug("Timeout set to {} for {}:{} with tags: {}".format(timeout, server, port, tags))

        # Begin the check by creating a socket
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Set the timeout on the socket to the configured timeout
        s.settimeout(timeout)
        # Get the current time so we can calculate the response time
        t_init = time.time()
        # Attempt the following unless an error is seen
        try:
            # Set status to 1, so we can report back a simple status metric 
            status = 1
            # Attempt to connect to a remote socket at server, port
            s.connect((server, port))
            # Measure the response time from the timestamp we took earlier in the check
            response_time = time.time() - t_init
        # If we see a socket error or a socket timeout
        except (socket.error, socket.timeout):
            # As this is an error condition we'll set the response time to '-1' 
            # so that it's obvious the connection failed when viewing graphs
            response_time = -1
            # We'll also set the status to 0 as this is an error 
            status = 0
        # Close the socket whether or not the connection succeeded
        finally:
            s.close()
        # Set the portmon.response.time metric, along with the tags we set earlier
        self.gauge('portmon.response.time', response_time, tags=tags)
        # Set the portmon.response.status metric, along with the tags we set earlier
        self.gauge('portmon.response.status', status, tags=tags)
        # The check is complete. 
        # Once all instances have completed checks the results will be sent to Server Density!

if __name__ == '__main__':
    # Load the check and instance configurations
    check, instances = PortMon.from_yaml('/etc/sd-agent/conf.d/portmon.yaml')
    for instance in instances:
        print "\nRunning the check against host: {}:{}".format(instance['server'],instance.get('port', 80))
        check.check(instance)
        print 'Metrics: {}'.format(check.get_metrics())
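The fallback logic at the top of the check can also be exercised on its own, without the agent or a network connection. The sketch below mirrors that logic using the same configuration keys the plugin code reads; resolve_settings is a hypothetical helper name, and it copies the configured tag list before appending so the instance configuration is left untouched.

```python
def resolve_settings(init_config, instance):
    """Return (timeout, port, tags) for one instance, applying the defaults."""
    # Global default from init_config, falling back to 5 seconds
    default_timeout = init_config.get('default_timeout', 5)
    # Per instance values override the defaults when present
    port = instance.get('port', 80)
    timeout = float(instance.get('timeout', default_timeout))
    server = instance['server']
    # Work on a copy so the configured tag list is not modified
    tags = list(instance.get('tags', []))
    tags.append('server: {0}:{1}'.format(server, port))
    return timeout, port, tags
```

With an empty init_config, resolve_settings({}, {'server': 'example.com'}) returns (5.0, 80, ['server: example.com:80']), matching the defaults described for the minimal configuration.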

Troubleshooting

You can execute your check within the context of the agent by executing the check subcommand:

/usr/share/python/sd-agent/agent.py check {checkname}

where {checkname} is the name of the plugin you have created, assuming the plugin is located in the checks.d folder as the agent expects. With the above example, with a plugin named portmon, you would need to execute the following:

/usr/share/python/sd-agent/agent.py check portmon

If you continue to have issues please send an email to hello@serverdensity.com with a copy of the check code, example configuration and any relevant logs.
