Any Metric Graphing with cron, some code, syslog, and Splunk

Sometimes (rarely), I get what I consider to be a clever idea.

Today, while tying up an evaluation of Zenoss Core, it occurred to me that one could get nice system performance graphs by simply syslogging the performance data to Splunk, which provides time-series graphing.

Monitoring agent, schmonitoring schmagent. We’ve got syslog, cron, and bash/perl/python/ruby on the system already, and we’re syslogging to Splunk already.

The selling feature here is that you can turn any metric’s data into a chart, and that data can be anything you can gather from a UNIX shell (in our case, spawned by cron).

As a proof of concept, I spent five minutes whipping up the following test script, which runs out of cron repeatedly (choose your own interval).

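#!/bin/sh

# Take the second vmstat sample; the first one reports averages since boot.
# Column numbers follow our vmstat layout (sr, the scan rate, is column 12 here); adjust for your platform.
VMSTAT=`vmstat 1 2 | tail -1 | awk '{print "runqueue=" $1 " scanrate=" $12 " blockedprocs=" $2}'`
# Pull the 1-minute load average out of uptime.
LOAD1=`uptime | sed 's/.*load average: \(.*\), .*, .*/load1=\1/g'`
# Send everything to syslog as a single line of key=value pairs.
logger -t stats -p user.info $VMSTAT $LOAD1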

This syslogs a line like the following one at the chosen cron interval:

Dec  1 00:24:19 ourhost stats: [ID 702911 user.info] runqueue=1 scanrate=0 blockedprocs=0 load1=0.78
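
The same pattern extends to anything else a shell can report. As a minimal sketch (not part of our proof of concept), here is how root filesystem usage could be added. The function name df_to_kv and the key rootpct are made up for illustration, and the awk column assumes that df -k prints one header line followed by a single data line with percent-used in the fifth column. Check your own df output first, since long device names can wrap onto a second line.

```shell
#!/bin/sh
# Turn `df -k <mountpoint>` output (read from stdin) into a key=value pair.
# Assumption: line 2 is the data line and its fifth column looks like "42%".
df_to_kv() {
    # Print the fifth column of the data line, then drop the percent sign.
    awk 'NR == 2 { print "rootpct=" $5 }' | tr -d '%'
}

# Example use inside the cron-driven script (left commented out so that
# sourcing this file never writes to syslog):
# logger -t stats -p user.info `df -k / | df_to_kv`
```

Appending that output to the existing logger line instead would keep all of the metrics in one event.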

Now, since you’re syslogging all of your host data to Splunk (you are, right?), it’s just a matter of graphing the data against the event’s timestamp in Splunk.
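
Before heading to Splunk, it can be worth checking locally that the key=value pairs made it into syslog intact. A rough sketch, assuming the stats lines also land in a local log file (the function name kv_get is ours, and the log path in the usage comment is only an example):

```shell
#!/bin/sh
# Print the value of one key from key=value syslog lines read on stdin.
# $1 is the key name, for example load1.
kv_get() {
    # Put each space-separated token on its own line, then keep only
    # the tokens that start with "<key>=" and strip that prefix.
    tr ' ' '\012' | sed -n "s/^$1=//p"
}

# Example (the log path is an assumption; adjust for your syslog.conf):
# grep ' stats: ' /var/adm/messages | kv_get load1
```

If the values print one per line in the order they were logged, the events are in good shape for graphing.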

For our proof-of-concept data, our Splunk 4.1.6 search query was as follows:

"ourhost stats:" | multikv | table _time load1 runqueue blockedprocs scanrate

Clicking “Show Report”, then setting Chart Type to “Area”, Multi-Series Mode to “Split”, and Null Values to “Treat as zero” charts each metric over time.

We’d love to hear your comments.

Jeff Blaine, with Splunk search brainstorming assistance from Jeremy Maziarz