Collectd is an excellent tool for collecting performance data as it is lightweight and comes with plugins for many different software components.
However, some of the built-in plugins require external libraries and re-compilation of the source. In the past, we have written Exec plugins, mostly in Ruby, to handle special cases and to collect more detailed statistics. Unfortunately, this meant either writing scripts with no dependencies to manage (e.g. no Redis gem for the Ruby script) or managing even more software on every box.
The solution became obvious with the emergence of the Go language. Beyond the language's many other interesting features, Go lets us compile the collectd script and all of its libraries into a single, extremely portable binary.
Collectd exec plugins need to print a formatted string to standard out, which is then picked up by the collectd daemon:
# PUTVAL hostname/application/type-stat_name epoch_time_stamp:value
PUTVAL tamarin.shokunin.co/redis-port6390/gauge-used_memory 1423939568:1214072
This will turn into the following Graphite statistic, which can be viewed later:
tamarin_shokunin_co.redis-port6390.gauge-used_memory 1214072
While Graphite can handle much longer names, exec plugins should only go this deep into the naming scheme, or their output will not be recognized by the collectd master process.
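To make that naming depth concrete, here is a minimal sketch (separate from the plugin shown below) of a hypothetical putval helper that reproduces the Redis line above; the helper name and signature are illustrative only.
package main

import "fmt"

// putval assembles one PUTVAL line in the host/plugin-instance/type-instance
// layout that collectd expects from exec plugins.
func putval(host, pluginInstance, typeInstance string, ts, value int64) string {
    return fmt.Sprintf("PUTVAL %s/%s/%s %d:%d", host, pluginInstance, typeInstance, ts, value)
}

func main() {
    // Prints the Redis example line shown above.
    fmt.Println(putval("tamarin.shokunin.co", "redis-port6390", "gauge-used_memory", 1423939568, 1214072))
}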
The full example code can be found on GitHub.
Let's break down the example code:
package main

import (
    "bufio"
    "flag"
    "fmt"
    "os"
    "strconv"
    "time"
)
bufio and strconv are the interesting imports here. bufio gives us a buffered writer to stdout; since collectd relies on reading our output, we need to make sure it is actually flushed rather than left sitting in a buffer, as most languages would leave it by default. strconv is used to convert integers to strings, since Go is strictly typed.
func init() {
    flag.BoolVar(&debug, "debug", false, "turn on debugging")
    flag.IntVar(&sleepTime, "sleep-time", 10, "Number of seconds between runs")
    flag.StringVar(&host, "host", "localhost", "host to connect to (defaults to localhost)")
    flag.Parse()
}
Go's flag package is excellent for handling command line arguments. Here we simply initialize a few variables with default values, and running the binary with -h will print formatted usage information for the end user.
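The excerpt assumes the flag targets are declared at package level; they are omitted above, so a minimal declaration matching the names used in init would be:
// Package-level variables that the flags in init() write into.
var (
    debug     bool
    sleepTime int
    host      string
)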
func collectd(unixTs int, hostname string) {
    var hostlabel string
    // Report the machine's real hostname unless the -host flag points somewhere else.
    if hostname == "localhost" {
        hostlabel, _ = os.Hostname()
    } else {
        hostlabel = hostname
    }
    // Buffered writer to stdout; the deferred Flush pushes everything out when we return.
    f := bufio.NewWriter(os.Stdout)
    defer f.Flush()
    // "bar", "name", "value" and "value2" are placeholders for the real plugin
    // instance, metric names and collected values.
    b := "PUTVAL " + hostlabel + "/" + "bar" + "/" + "gauge-name " + strconv.Itoa(unixTs) + ":" + "value\n"
    b += "PUTVAL " + hostlabel + "/" + "bar" + "/" + "counter-name " + strconv.Itoa(unixTs) + ":" + "value2\n"
    f.Write([]byte(b))
}
Since most of the rest of the code stays the same across all of our Go collectd plugins, we put the part that actually collects the data and prints it into a single collectd function.
The general usage pattern is to connect to localhost, but since that is not always the case, the -host flag lets us override it; the collectd function stores the resulting name in the hostlabel variable.
Setting up a new buffered writer to standard out and deferring the flush ensures that the output is actually sent to stdout when the function returns.
The two main types of data that we send are counters and gauges. If you configure collectd with StoreRates, then in the case of a counter only the delta is sent downstream. A counter is the type you would use for an ever-increasing number, such as the number of client requests sent to Redis. Gauges are useful for point-in-time metrics, such as the current number of Nginx connections in the read state.
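To make that distinction concrete, here is a rough sketch meant to live in the same file as the plugin above (it reuses the bufio, os and strconv imports), emitting one gauge and one counter; the Redis metric names and numbers are invented for illustration.
// emitRedisStats sends one gauge and one counter in the same PUTVAL format.
func emitRedisStats(hostlabel string, unixTs int) {
    f := bufio.NewWriter(os.Stdout)
    defer f.Flush()
    usedMemory := 1214072   // gauge: memory usage at this instant
    totalCommands := 987654 // counter: ever-increasing command count
    b := "PUTVAL " + hostlabel + "/redis-port6390/gauge-used_memory " +
        strconv.Itoa(unixTs) + ":" + strconv.Itoa(usedMemory) + "\n"
    b += "PUTVAL " + hostlabel + "/redis-port6390/counter-total_commands " +
        strconv.Itoa(unixTs) + ":" + strconv.Itoa(totalCommands) + "\n"
    f.Write([]byte(b))
}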
func main() {
    ticker := time.NewTicker(time.Second * time.Duration(sleepTime))
    // run once at the beginning
    collectd(int(time.Now().Unix()), host)
    go func() {
        for t := range ticker.C {
            if debug {
                fmt.Println("DEBUG", time.Now(), " - ", t)
            }
            collectd(int(time.Now().Unix()), host)
        }
    }()
    // run for a year - as collectd will restart it
    time.Sleep(time.Second * 86400 * 365)
    ticker.Stop()
    fmt.Println("Ticker stopped")
}
Most of the time we set the sleep time to match the Interval setting in collectd. However, we might sleep for an hour when collecting Postgres database table sizes or other statistics that do not require high granularity.
Exec plugins that exit are restarted by the main collectd process, but only a few times before it gives up, so main simply runs for a year.
Collectd allows for an include directory, so we generally drop a file for each service into the /etc/collectd.d directory:
#####################################################################
# Puppet Controlled
#####################################################################
LoadPlugin exec
<Plugin "exec">
    Exec "nobody" "/opt/collectd/custom_plugins/pg_table_size" "-port" "5432" "-interval" "3600" "-database" "app_prod" "-password" "secret"
</Plugin>
Best practice is to run the plugin as nobody whenever possible, and note that each argument must be quoted separately.
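For the example plugin developed above, an equivalent entry (the binary name and install path are hypothetical) would look like this, again with every argument quoted separately:
<Plugin "exec">
    Exec "nobody" "/opt/collectd/custom_plugins/redis_stats" "-sleep-time" "10" "-host" "localhost"
</Plugin>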
Writing collectd plugins in Go is fairly easy, and the scripts are portable enough to make it worth taking the time to learn a new programming language. More and more DevOps tools are being written in Go for a variety of reasons, but the portability of the binaries alone makes it a good candidate for operations teams going forward.