Custom Plugins - v2

If no native plugin exists for what you need to monitor, you can write a custom plugin that gathers the metrics and posts them to your Server Density account, where you can graph them and alert on them.

Using Windows or agent v1? Read the docs.

Interface

All custom plugins must inherit from the AgentCheck class found in checks/__init__.py, and each plugin requires a check() method that takes a single argument, instance. instance is a dict that holds the configuration of one particular instance. The check() method is run once for each instance defined in the check configuration.

Metrics

Collecting metrics in your plugin is easy. Within the AgentCheck class you have the following methods available to you:

self.gauge(metric, value, tags) # Collect a gauge metric
self.increment(metric, value, tags) # Increment a counter metric
self.decrement(metric, value, tags) # Decrement a counter metric
self.histogram(metric, value, tags) # Collect a histogram metric
self.rate(metric, value, tags) # Collect a point, with the rate calculated at the end of the check
self.count(metric, value, tags) # Collect a raw count metric
self.monotonic_count(metric, value, tags) # Collect an increasing counter metric 

Each of these methods can take the following arguments: 

  • metric - The name of the metric
  • value - The value of the metric
  • tags - A list of tags to be associated with the metric (optional) 

You can call these methods from anywhere in your plugin logic and once the check is completed any metrics that were collected will be sent to Server Density in the next payload. 
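As a rough illustration of how these calls fit together, the sketch below submits a gauge and a monotonic count from a single check. The class name, metric names, tag and hard-coded values are all made up for the example. The small AgentCheck stand-in is an assumption that exists only so the snippet runs outside the agent; inside sd-agent the real class from checks is imported instead.

```python
try:
    # Inside sd-agent the real base class is importable
    from checks import AgentCheck
except ImportError:
    # Stand-in for running outside the agent (assumption: it only records calls)
    class AgentCheck(object):
        def __init__(self):
            self.submitted = []

        def gauge(self, metric, value, tags=None):
            self.submitted.append(('gauge', metric, value, tags))

        def monotonic_count(self, metric, value, tags=None):
            self.submitted.append(('monotonic_count', metric, value, tags))


class QueueCheck(AgentCheck):  # hypothetical plugin
    def check(self, instance):
        tags = ['queue:{}'.format(instance.get('name', 'default'))]
        # Current depth is a point-in-time value, so it is sent as a gauge
        self.gauge('queue.depth', 12, tags=tags)
        # Total processed only ever increases, so it is sent as a monotonic count
        self.monotonic_count('queue.processed', 3400, tags=tags)
```

Both calls pass the same tags list, so the two metrics can be filtered by the same tag when graphing.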

Exceptions 

Raise a meaningful exception if the check cannot complete for any reason, such as incorrect configuration, a programming error, or an inability to collect metrics. Exceptions are logged and shown in the output of the sd-agent info command to allow for easy debugging.
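One way to keep exception messages meaningful is to validate the configuration up front. The helper below is only a sketch; the name require is hypothetical and not part of the agent.

```python
def require(instance, key):
    """Return instance[key], or raise with a message that names the missing key."""
    if key not in instance:
        raise Exception("Missing required configuration value: '{}'".format(key))
    return instance[key]
```

Inside check() this could be used as server = require(instance, 'server'), so that a missing value produces a message naming the key rather than a bare KeyError.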

Logging

The AgentCheck class gives you access to a logger at self.log allowing you to output to the sd-agent collector log. You can use this to output helpful debug and info messages from your check to allow for easy debugging. The log handler will inherit the name of your plugin in the form of checks.{plugin_name}, where {plugin_name} is the name of your plugin. To output a debug message you can do something similar to 

self.log.debug('Helpful debug message')

or to output an info message you can do something like

self.log.info('Check completed successfully!')
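The sketch below assumes that self.log behaves like a standard Python logging.Logger, and uses a plugin file named example.py so that the logger name is checks.example. The ListHandler class is a hypothetical helper that captures messages in memory so the result can be inspected without the agent.

```python
import logging

# Assumption: the plugin file is example.py, so the logger name is checks.example
log = logging.getLogger('checks.example')


class ListHandler(logging.Handler):
    """Collects messages in a list instead of writing them to a log file."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(
            '{} {} {}'.format(record.name, record.levelname, record.getMessage()))


handler = ListHandler()
log.addHandler(handler)
log.setLevel(logging.DEBUG)

# With %s placeholders the arguments are only formatted if the message is emitted
log.debug('Timeout set to %s for %s:%s', 5, 'example.com', 80)
log.info('Check completed successfully!')
```

After running this, handler.messages holds one entry per call, each prefixed with the logger name and level.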

Plugin Configuration

Each plugin needs a configuration file, which should be placed in the sd-agent conf.d directory. Configuration files are written in YAML, and the configuration file name must match the plugin name (for example, customplugin.py and customplugin.yaml). Configuration files have the following structure:

init_config:
    min_collection_interval: 120
    key_1: val_1
    key_2: val_2

instances:
    - username: jon_doe
      password: abcd

    - username: jane_doe
      password: wxyz 

min_collection_interval can be defined in the init_config section to specify how often the check should run. If it is not specified it defaults to 0, which runs the check on every collector run (every 60s). If the value is less than 60, the check also runs on every collector run. If the value is greater than 60, the collector checks whether the specified min_collection_interval has elapsed since the check last ran: if it has, the check runs; otherwise the collector logs a message stating that the check has been skipped.
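The decision described above can be sketched as a small function. This is an illustration of the behaviour as documented here, not the collector's actual code; the name should_run and its parameters are hypothetical.

```python
def should_run(last_run, now, min_collection_interval=0):
    """Return True if the check is due on this collector run.

    last_run is the timestamp of the previous execution, or None if the
    check has never run. All values are in seconds.
    """
    if last_run is None:
        return True
    return (now - last_run) >= min_collection_interval
```

With a 60s collector cycle and an interval of 120, should_run(1000, 1060, 120) is False (skipped) and should_run(1000, 1120, 120) is True, so the check runs on every second cycle.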

init_config

The init_config section allows you to set global configuration options for the check. These global configuration options will be available to the check on every run.

instances

The instances section is a list of instances that the check will be run against. Your check() method runs once per instance, so your custom plugin can support multiple instances simply by adding extra configuration.
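To make the mapping concrete, the sketch below shows the earlier YAML structure as the Python objects a check would see, plus a loop that calls a check function once per instance. The run_all function is a hypothetical illustration, not the collector's real scheduling code.

```python
# The YAML structure shown earlier, expressed as Python objects
init_config = {'min_collection_interval': 120, 'key_1': 'val_1', 'key_2': 'val_2'}
instances = [
    {'username': 'jon_doe', 'password': 'abcd'},
    {'username': 'jane_doe', 'password': 'wxyz'},
]


def run_all(check, instances):
    """Call check() once per instance, in the order they are listed."""
    for instance in instances:
        check(instance)


# Record which instance each call received
seen = []
run_all(lambda instance: seen.append(instance['username']), instances)
```

Each call receives only its own dict, so per-instance values such as username never leak between instances, while init_config is shared by all of them.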

sd-agent Plugin Directories 

checks.d

Your custom plugin code (myplugin.py) should be placed in your checks.d folder. On Linux installs this is at /usr/share/python/sd-agent/checks.d

conf.d

Your custom plugin configuration file (myplugin.yaml) should be placed in your conf.d folder. On Linux installs this is at /etc/sd-agent/conf.d/

Virtual Environment

The sd-agent uses a virtual environment for its Python dependencies. If you need to install extra dependencies in the virtual environment, you can use pip, which is available at /usr/share/python/sd-agent/bin/pip.

A Simple Example 

For a simple example we will define a plugin that sends a static value for a metric back to Server Density on each plugin execution. Remember that the plugin file name and configuration file name need to match for the agent to execute your plugin. Let's start with a simple configuration that doesn't include any configuration information:

conf.d/example.yaml

init_config:

instances:
    [{}] 

For the actual check we inherit from AgentCheck and, in check(), submit my.metric with a value of 1. As we are calling this plugin example, we prepend example to the metric name so that we know the metric comes from the example plugin. This gives a full metric name of example.my.metric

checks.d/example.py

from checks import AgentCheck


class ExampleCheck(AgentCheck):
    def check(self, instance):
        self.gauge('example.my.metric', 1)

 

A Complex Example

For a more complex example we will define a plugin that attempts to open a socket to a server and port. In the configuration we'll set a global timeout value that will apply to all instances using init_config. We'll also add the ability to define optional configurations with default fallbacks if nothing is defined on a per instance basis and a mandatory configuration value.  

In this example the only required configuration value is server. So the minimum viable configuration for two instances would be: 

conf.d/portmon.yaml

init_config:

instances:
  - server: example.com
  - server: test.com

This will cause the timeout to be defined as 5s, the port as 80, and the tags to be set to server: example.com:80 and server: test.com:80 for the respective instances.

The plugin will return two metrics for each instance; the response time (as portmon.response.time), and a status integer (as portmon.response.status). This will allow us to graph and alert on these metrics in our Server Density account with ease. 

However, we can also define more complex configurations. For example, the configuration below sets the global timeout to 10. The 8.8.8.8 instance overrides this with its own timeout setting, connects on port 53, and appends the tags 'dns' and 'google' to the metrics, along with the 'server: 8.8.8.8:53' tag. The example.com instance keeps the global timeout, defaults to port 80 when connecting, and only appends the 'server: example.com:80' tag.

conf.d/portmon.yaml

init_config:
  timeout: 10
instances:
  - server: example.com
  - server: 8.8.8.8
    port: 53 
    timeout: 100
    tags: 
      - dns 
      - google
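The fallback rules described for this configuration can be sketched as a standalone function. This is only an illustration: the name resolve is hypothetical, and it assumes the global value is read from the timeout key of init_config, as in the configuration above.

```python
def resolve(init_config, instance):
    """Return (server, port, timeout, tags) for one instance."""
    # Global timeout from init_config, or 5 if none is configured
    default_timeout = init_config.get('timeout', 5)
    port = instance.get('port', 80)
    # A per-instance timeout overrides the global one
    timeout = float(instance.get('timeout', default_timeout))
    server = instance['server']
    # Copy the list so the instance configuration itself is not modified
    tags = list(instance.get('tags', []))
    tags.append('server: {}:{}'.format(server, port))
    return server, port, timeout, tags


init_config = {'timeout': 10}
instances = [
    {'server': 'example.com'},
    {'server': '8.8.8.8', 'port': 53, 'timeout': 100, 'tags': ['dns', 'google']},
]
resolved = [resolve(init_config, i) for i in instances]
```

Here the example.com instance resolves to port 80 with a timeout of 10.0, while the 8.8.8.8 instance resolves to port 53 with a timeout of 100.0 and three tags.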

checks.d/portmon.py 

import time
import socket

from checks import AgentCheck


class PortMon(AgentCheck):
    def check(self, instance):
        # Load the global timeout value from the init_config, if not present default to 5
        default_timeout = self.init_config.get('timeout', 5)
        # Load port value from the instance config
        port = instance.get('port', 80)
        # Attempt to load the timeout from the instance config. If not present fallback to default_timeout
        timeout = float(instance.get('timeout', default_timeout))
        # If we don't find a server for this instance stop the check now
        if 'server' not in instance:
            # Output to the info log that we're skipping this instance due to no server being configured
            self.log.info("Skipping instance, no server found.")
            return
        server = instance['server']
        # Attempt to load the tags from the instance config. If not present fallback to an empty list.
        # Copy the list so the appended tag does not accumulate in the instance config on every run
        tags = list(instance.get('tags', []))
        # Append the tag 'server: server:port' to the tags list, based on the values loaded from the instance config.
        tags.append("server: {}:{}".format(server, port))
        # A handy debug line in case we need to output information for troubleshooting
        self.log.debug("Timeout set to {} for {}:{} with tags: {}".format(timeout, server, port, tags))

        # Begin the check by creating a socket
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Set the timeout on the socket to the configured timeout
        s.settimeout(timeout)
        # Get the current time so we can calculate the response time
        t_init = time.time()
        # Attempt the following unless an error is seen
        try:
            # Set status to 1, so we can report back a simple status metric 
            status = 1
            # Attempt to connect to a remote socket at server, port
            s.connect((server, port))
            # Measure the response time from the timestamp we took earlier in the check
            response_time = time.time() - t_init
        # If we see a socket error or a socket timeout
        except (socket.error, socket.timeout):
            # As this is an error condition we'll set the response time to '-1'
            # so that it's obvious the connection failed when viewing graphs
            response_time = -1
            # We'll also set the status to 0 as this is an error
            status = 0
        # Close the socket whether or not the connection succeeded
        finally:
            s.close()
        # Set the portmon.response.time metric, along with the tags we set earlier
        self.gauge('portmon.response.time', response_time, tags=tags)
        # Set the portmon.response.status metric, along with the tags we set earlier
        self.gauge('portmon.response.status', status, tags=tags)
        # The check is complete. 
        # Once all instances have completed checks the results will be sent to Server Density!

if __name__ == '__main__':
    # Load the check and instance configurations
    check, instances = PortMon.from_yaml('/etc/sd-agent/conf.d/portmon.yaml')
    for instance in instances:
        print("\nRunning the check against host: {}:{}".format(instance['server'], instance.get('port', 80)))
        check.check(instance)
        print('Metrics: {}'.format(check.get_metrics()))
 