Splunk Search Deep Dive

A Collection of Materials

Intro

Welcome

I've written a lot of searches in my time, and I regularly lean on a fairly deep understanding of how Splunk stores and searches data to build faster and better searches. I've spent a lot of time teaching these things to other folks in person, and I was recently asked to build an overview of how these capabilities work. That gave me the idea that we should have one consolidated place where someone can go from a normal Splunk user to an expert on Splunk's search design and process, and this is that place. Why is this on my website? Well, we'll probably make a blog post eventually, but as you read this it's incomplete. Maybe some day you'll come back and it will just be a pointer to a Splunk blog!

What's This Content From?

While a few pieces of content are being built just now, Splunk has put out a lot of content over the years on how search actually works and how to leverage that effectively for your needs. Most of the content below comes from prior .conf talks by Splunk's Principal Sales Engineers and Software Developers.

How Much Do I Need to Know?

I've broken the content out into Level 1, Level 2, and Level 3, so that you can judge how deep you want, or need, to go.

  • Level 1: Approximately equivalent to Advanced Searching and Reporting in Splunk. If you enjoyed that EDU class (or are saving your dollars for it), then you should go through this content.
  • Level 2: Provides a deep understanding that will allow you to be one of the most advanced searchers and to build more efficient searches. If you spend a lot of time building Splunk searches that are re-used by other people, or run frequently (in correlation searches and the like), you'll get a lot of value out of this content.
  • Level 3: The super technical content that can help you get every last drop of value out of Splunk, but probably isn't required even for most avid Splunk users. For the folks that really want to understand it all.


Level One


Overview of Splunk Search for Technical Folks

Covers
  • The essentials of how Splunk writes data to disk, and what that means for search.
  • What Schema on the Fly really means in practice, and various implications.
  • How data is stored in Accelerated Data Models.
  • Alternative Data Storage Mechanisms in Splunk (lookups, kvstore, metric store).
  • A quick look at what generally differentiates Splunk.
Links

Public version not yet recorded. Check back for details, or get announcements by following @davidveuve.


Searching FAST: How to Start Using Splunk Acceleration Techniques

Covers
  • A progression through the search acceleration capabilities over time, from a user perspective.
  • Light coverage of Summary Indexing and Report Acceleration.
  • Heavy coverage of how to start using tstats on accelerated data models.

Links

David recommends watching the video first, then progressing to read the PDF copy and the flowchart.


Lesser Known Search Commands

Covers
  • Several search commands and SPL tricks that will help save you time.
    • rest
    • makeresults
    • gentimes
    • metasearch
    • metadata
    • union
    • map
    • foreach
    • untable
    • contingency
    • xyseries
    • eventstats
    • streamstats
    • tstats
    • mstats
    • autoregress
    • stats+eval
    • eval indirect field reference
    • subsearch: query / search
    • CLI Commands
  • Other Resources
Links

Security Ninjutsu Part Four: Intermediate Techniques

Covers

Security Ninjutsu Part Four contains many different techniques at different levels. The techniques recommended for Level 1 fall under the "Intermediate Techniques" header in the slide deck, and include the following.

  • Common Information Model
  • Advanced eval
  • Multi-Value Fields
  • Stats on Stats
  • Formatting a Table
  • Multi-Scenario Alerts
  • Inline Comments
  • Tuning Techniques
  • Stats + Eval
  • Overriding ES Urgency / Severity / Risk
  • Common Apps
  • Risk
  • Subsearches
Links

Level Two


Revealing the Magic: The Life Cycle of a Splunk Search

Covers
  • How Splunk actually functions at an engineering level.
  • Index folder structure, bucket structure.
  • Bloom Filters
  • Lispy
  • Unindexed Fields vs Indexed Fields
  • walklex
  • Leading vs Trailing Wildcards
  • Transactions vs Stats
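
To give a feel for why Bloom filters matter to search, here is a toy sketch in Python. It is not Splunk's implementation: the class, the function `buckets_to_search`, the bucket names, and the sizes are all made up for illustration. The only idea it demonstrates is that a Bloom filter can say "this term is definitely not here," which lets a search skip a bucket without opening its index files.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not Splunk defaults.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, term):
        # Set every bit position derived from the term.
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # True only if every derived bit is set; one unset bit means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


def buckets_to_search(bucket_filters, term):
    """Return the (hypothetical) bucket names whose filter cannot rule out the term."""
    return [name for name, bf in bucket_filters.items() if bf.might_contain(term)]


if __name__ == "__main__":
    filters = {"bucket_a": BloomFilter(), "bucket_b": BloomFilter()}
    filters["bucket_b"].add("failed")
    filters["bucket_b"].add("password")
    # bucket_a's filter is empty, so it can be skipped for any term.
    print(buckets_to_search(filters, "failed"))
```

A filter that answers "maybe" still requires a real lookup, so the savings come entirely from the buckets that can be ruled out up front.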
Links

Deep Dive into Data Model Acceleration

Covers
  • A short review on how normal search works.
  • How that paradigm was leveraged for Data Model Acceleration.
  • Benefits and speed.
Links

What's New in Splunk Search 2018

Covers
  • Schema Accelerated Event Search -- enables raw event search (particularly drilldown) at data model acceleration speeds
  • Parallel Reduce -- enables high cardinality analytics
Links

Public version not yet recorded. Check back for details, or get announcements by following @davidveuve.


Observations and Recommendations on Splunk Performance

Covers
  • Indexing Pipeline -- how to improve
  • Search Pipeline
  • Search Types
  • Improving Performance of System
Links

Security Ninjutsu Part Four: Advanced Techniques

Covers

Security Ninjutsu Part Four contains many different techniques at different levels. The techniques recommended for Level 2 fall under the "Advanced Techniques" header in the slide deck, and include the following.

  • Summary Indexing
  • Lookup Caching
  • Confidence Checking
  • Managing Alert Fatigue
  • Transaction
  • First Time Seen Detection
  • Time Series Detection
  • Time Series * First Time Seen Detection
Links

Level Three


How splunkd works

Covers
  • splunkd
    • Pipelines
    • Processors
    • Queues
  • Inputs
    • File
    • Network
    • Scripts
    • HEC
    • S2S
    • And more!
  • Debugging
    • Metrics
    • Monitoring Console
Links

Security Ninjutsu Part Four: NINJA Techniques

Covers

Security Ninjutsu Part Four contains many different techniques at different levels. The techniques recommended for Level 3 fall under the "NINJA Techniques" header in the slide deck, and include the following.

  • tstats
  • Timestamps and Timestamps
  • Advanced Search Commands
  • metacharacteristics
  • Machine Learning - Numeric Time Series Clustering
  • General Approach to Analytics
Links