Timechart Advanced Statistics

Posted by David Veuve - 2011-07-27 11:28:58
In the last post (Overview of Timechart Syntax in Splunk), we covered the basics of Splunk's timechart syntax. This post covers some of timechart's more advanced statistical functions.

The statistical functions are really the core of timechart (and of stats, too -- see Timechart Versus Stats). Check out the full list of statistical functions you can use with timechart in the Splunk documentation, and then either bookmark it or get used to googling "Splunk timechart" repeatedly. Generally, the best way to get comfortable is to have problems to solve, but browsing the documentation can help you get a feel for what's possible.

Here are a few notable options:

Max, Min, Average, etc.

These aren't really advanced, but you've got to start somewhere:

index=_internal source=*metrics.log splunk_server="*" group="per_index_thruput" earliest=-7d@d latest=@d

     | eval MB=kb/1024

     | timechart span=1d sum(MB) as MBPerDay

     | timechart span=7d avg(MBPerDay) as AvgMBPerDay, min(MBPerDay) as MinMBPerDay, max(MBPerDay) as MaxMBPerDay

(Always remember to rename your fields so the results are easier to read!)
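To make the two-stage pattern above concrete, here is a minimal sketch of the same arithmetic outside Splunk, in Python. The event data and function names are hypothetical, chosen only for illustration. The first stage sums MB per day (like the first timechart), and the second stage takes the average, minimum, and maximum of those daily totals (like the second).

```python
from collections import defaultdict

# Hypothetical (day, kb) samples standing in for per_index_thruput events.
events = [
    ("2011-07-20", 2048), ("2011-07-20", 1024),
    ("2011-07-21", 4096),
    ("2011-07-22", 512), ("2011-07-22", 512),
]

def daily_totals_mb(events):
    """Stage 1: convert kb to MB and sum per day."""
    totals = defaultdict(float)
    for day, kb in events:
        totals[day] += kb / 1024
    return dict(totals)

def summarize(totals):
    """Stage 2: average, minimum, and maximum across the daily totals."""
    values = list(totals.values())
    return {
        "AvgMBPerDay": sum(values) / len(values),
        "MinMBPerDay": min(values),
        "MaxMBPerDay": max(values),
    }

per_day = daily_totals_mb(events)
summary = summarize(per_day)
print(per_day)
print(summary)
```

The point of the second stage is that it aggregates the daily totals, not the raw events, which is why the search pipes one timechart into another.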

Count versus distinct count

index=_internal source=*metrics.log splunk_server="*" group="per_index_thruput" earliest=-7d@d latest=@d

     | eval MB=kb/1024

     | timechart span=1d count(series), dc(series)

Here, count is the number of events that contain a series field, and the distinct count (dc) is the number of unique series values seen in each day (akin to COUNT(DISTINCT ...) in MySQL).
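The difference is easy to see on a handful of events. This is a small Python sketch, not Splunk itself: the events and helper names are hypothetical, and it assumes that events lacking the field are skipped by both functions.

```python
# Hypothetical events; one of them has no "series" field at all.
events = [
    {"series": "main"},
    {"series": "main"},
    {"series": "_internal"},
    {"series": "main"},
    {"other_field": "x"},  # skipped by both functions below
]

def count_field(events, field):
    """Number of events that contain the field (like count(series))."""
    return sum(1 for e in events if field in e)

def distinct_count_field(events, field):
    """Number of unique values of the field (like dc(series))."""
    return len({e[field] for e in events if field in e})

series_count = count_field(events, "series")
series_dc = distinct_count_field(events, "series")
print(series_count, series_dc)  # prints: 4 2
```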

Standard Deviation

index=_internal source=*metrics.log splunk_server="*" group="per_index_thruput" earliest=-7d@d latest=@d

     | eval MB=kb/1024

     | timechart span=1d sum(MB) as TotalIndexed

     | stats avg(TotalIndexed) as Average, stdev(TotalIndexed) as StDev, latest(TotalIndexed) as MostRecent

     | eval Alert=if(MostRecent>(Average+StDev),"Indexing Up!",if(MostRecent<(Average-StDev),"Indexing Down!","All Is Well"))

Technically, this isn't really a good usage of timechart -- it's more a usage of stats. But they're related, and it's a good usage of stdev. This search looks at your indexing load over the last 7 days, sums it up per day, and then compares the most recent day against the rest. If the most recent day is more than one standard deviation above or below the average, the Alert field will say so.

(Statistical aside: I should point out that the one way in which this isn't a good use of stdev is that indexing amount probably doesn't follow a normal distribution -- you're likely going to have significant dips on the weekend, or what have you. Personally, I don't use stdev with that level of rigor [leave deep statistical analysis to the PhDs], and just use it as a ballpark indicator of the typical amount of variance. A typical usage for me might be looking at how much a particular company spends per day, and then seeing if they just dropped below their typical levels.)
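For readers who want to check the alert logic by hand, here is a rough Python equivalent of the stats and eval steps. The daily totals are made-up numbers, and the function name is hypothetical. It uses the sample standard deviation, which is what Splunk's stdev returns (stdevp is the population version).

```python
import statistics

def indexing_alert(daily_totals):
    """Compare the newest daily total against average +/- one standard deviation.

    daily_totals is ordered oldest to newest.
    """
    average = statistics.mean(daily_totals)
    st_dev = statistics.stdev(daily_totals)  # sample standard deviation
    most_recent = daily_totals[-1]
    if most_recent > average + st_dev:
        return "Indexing Up!"
    if most_recent < average - st_dev:
        return "Indexing Down!"
    return "All Is Well"

# Hypothetical daily totals in MB for three different weeks.
steady = [500, 510, 495, 505, 490, 500, 502]
spike = [500, 510, 495, 505, 490, 500, 900]
drop = [500, 510, 495, 505, 490, 500, 100]

for week in (steady, spike, drop):
    print(week[-1], indexing_alert(week))
```

Note that the most recent day is included in the average and standard deviation, just as it is in the search, so a large outlier also widens the band it is compared against.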

Percentile

Perc99 can also be a great indicator of the amount of variability there is in a report. You can use any number you want in place of 99 (e.g., Perc27). People frequently ask to see the Min/Avg/Max, which is not necessarily as telling as the Min/Perc1/Avg/Perc99/Max. For example, when I run the query below, looking at the daily averages for the hourly indexing load, my Average for one day is 524 MB per hour, but the max is 1005 MB per hour. That feels very significant, except that the 99th percentile is 607 MB, which tells us that generally there is very little variability, but there was one huge blip.

index=_internal source=*metrics.log splunk_server="*" group="per_index_thruput" earliest=-7d@d latest=@d

     | eval MB=kb/1024

     | timechart span=1h sum(MB) as MB

     | timechart span=1d perc99(MB) as MB99thPercent, max(MB) as MBMax, avg(MB) as MBAvg, stdev(MB) as MBStDev
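To see why a single blip moves the max but not the 99th percentile, here is a small Python sketch. It uses the simple nearest-rank definition of a percentile, which is an assumption for illustration: Splunk's own perc calculation may differ in detail, especially when there are few samples. The sample values are invented.

```python
def nearest_rank_percentile(values, pct):
    """Smallest value with at least pct percent of the samples at or below it."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]

# Hypothetical load samples in MB: 199 ordinary values plus one large blip.
samples = [500 + (i % 50) for i in range(199)] + [1005]

stats = {
    "min": min(samples),
    "perc1": nearest_rank_percentile(samples, 1),
    "avg": sum(samples) / len(samples),
    "perc99": nearest_rank_percentile(samples, 99),
    "max": max(samples),
}
print(stats)
```

With these numbers the max is 1005 while perc99 stays at 549, close to the ordinary values, which is the same pattern as the 1005 MB versus 607 MB result described above.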

These are just a few examples -- there's a lot of depth in these functions, and that depth is the real power of timechart. To really get comfortable with timechart, spend some time looking over the available functions.

Ready for more? Check out: