Any-Metric Graphing with Graphite and Syslog

I’ve developed a solution to the idea posed the other day. It wasn’t hard, but here’s an official write-up of the effort for what it’s worth.

Who did what now?

Apparently first showing its face as part of Eric Allman’s sendmail, syslog is available from the ubiquitous logger(1) UNIX/Linux command to the Log4j Java library and everywhere in-between. Syslog provides simplistic fire-and-forget UDP[1] messages with a timestamp.

We’ve had a centralized syslog setup for a good 13 years now. Why not use syslog over UDP to deliver metrics due to its nearly zero barrier to entry for developers, systems, and sysadmins?

We’ve been very passively using Ganglia for what amounts to our largely “get something going” metric gathering effort. Having heard about Graphite (technically the home is here but… ew) many times now and having played with it briefly a few months ago, I decided to see what all the hubbub was about.

After about 2 hours of fussing around and reading[2], I had Graphite up and functioning and… there it was. Oh, right. I have to somehow feed it data.

There’s this StatsD thing I keep hearing in the same breath as Graphite. Turns out it is for (drum roll) allowing special UDP fire-and-forget metric packets to be sent to it where the daemon aggregates data before sending calculated statistics off to Graphite’s carbon backend. We don’t have a use for that right now and it’s yet another thing to run and maintain, so I’m passing it over for the time-being.

Right. I like using, when possible, what is widely available instead of distributing new package or library dependencies to all nodes in our infrastructure. And so, the goal is: Figure out how to get arbitrary metrics (zero configuration per new metric desired) sent to a central location via syslog and get graphs.

Enter Logster

It became clear from the generous replies to my request for comments that, at least for this proof of concept effort, Logster was going to play a role.

Logster is typically instantiated via cron with command-line options indicating a “parser” to use and a log file to watch. It determines (via its logtail dependency) what was last seen in the log file from the previous cron run, and feeds all new lines through the specified parser. The parser (essentially a custom plugin you write in Python) is responsible for returning instances of MetricObject. If you’re curious about the parsers right now, have a look at one of the samples from the source: parsers/SampleLogster.py

syslogmaster% sudo yum install -y logcheck
syslogmaster% cd /tmp
syslogmaster% sudo wget -q --no-check-certificate -O logster.tar.gz https://github.com/etsy/logster/tarball/master
syslogmaster% sudo tar xzf logster.tar.gz
syslogmaster% cd logster*
syslogmaster% vi logster # look around for crap to change and change it
syslogmaster% make install
syslogmaster% cd parsers && echo "Now write a parser..."

On our syslog master where this was installed, I cut my teeth on a simple parser for our Cyrus IMAP server, copied it to /usr/share/logster where it could be picked up properly, and made a per-minute cron job:

0-59 * * * * /usr/sbin/logster -o graphite --graphite-host=rcf-metrics:2003 CyrusIMAPSyslog /speaker/log/auth.log

Rad. A graph. I think this server is a little underutilized:

The MetricLogster Parser

Since I had Logster successfully feeding working data into Graphite’s mouth, I set out to the real task at hand and crafted my “MetricLogster” parser, which can be found in my fork of the Logster repo on github (etsy/logster pull request 12 has been filed).

Interlude: Syslog snag!

If you syslog frequently enough, you get the dreaded last message repeated N times lines in your log file and not actual data. I worked around this by switching our syslog master server from RHEL 5’s syslogd to a hand-build of rsyslog (which comes as default in RHEL 6). The rsyslog package has a $RepeatedMsgReduction global directive which one can set to off to always log data and never log those annoying messages.

Back to The MetricLogster Parser

Let’s test this thing out.

Generate syslog data with random numbers from bash’s builtin $RANDOM:

client% while :; do \
logger -t whatever -p local1.info metric=test.bash.random value=$RANDOM; \
sleep 2; \
done

On the syslog master server, instead of waiting for cron every minute (since we’re testing), run Logster every 3 seconds and feed our metric data into Graphite/carbon:

syslogmaster% while :; do \
sudo /usr/sbin/logster -o graphite --graphite-host=rcf-metrics:2003 \
                       MetricLogster /speaker/log/local1.log ; \
sleep 3; \
done 

Generates this kooky but successful graph:

Notes and Ideas

  • rsyslog supports input and output modules. If there was an output module for rsyslog that wrote directly to Graphite (technically carbon-cache or carbon-relay), we could remove Logster, its logcheck dependency (for logtail) and the custom Logster parser module (MetricLogster) from the the whole picture.
  • If StatsD accepted well-formed basic syslog data, that would be slick too.
  • I have no idea how well Logster scales.
  • The keys in the metric syslog lines could really be shortened to m and v (from metric and value) to save a 10 bytes per log line.
  • I am an infant in this arena.
  • My statistics knowledge is 1/100th what it needs to be.
  • There needs to be a Graphite and Friends resource hub/wiki
  • Sure, this isn’t a gigantic achievement of mine, but words here are free and I’d be remiss to not say: Thanks to the Etsy and Orbitz engineers who have made Logster and Graphite freely available, the rsyslog developers, all of the generous people who offered comments to my previous post on this topic, and all of the noisy people I follow on Twitter who share info and/or provide me with ideas or answers weekly.

Footnotes

  1. Unless TCP is specifically configured as the transport where supported. Use whatever suits your needs – the information above still applies.
  2. If the graphite-dev team at Launchpad would be so kind as to let me join the club, I would happily submit patches to a lot of things I found wrong, even if I do have to learn stupid Bazaar. It’s been a week now or so since my request to join. Do you guys want help or not? ;)