Part 1 – Splunk Streaming Data Pre-Processing Examples – Search & Replace

Search & Replace

Welcome to Part 1 of our examples series on Streaming Data Pre-Processing with Splunk.

In this example we are going to stream some text data events to Splunk, dynamically apply a regex filter to the raw incoming events, and replace any text that matches the regex pattern with either a character sequence or a hash.

  1. Download and install the Protocol Data Inputs App.
  2. Restart Splunk.
  3. Log in to Splunk and browse to
    Settings -> Data Inputs -> Protocol Data Inputs -> +Add New
  4. We are going to configure a new Protocol Input to listen for events via TCP on port 12358 (you can choose any port, I just like Fibonacci sequences).
  5. In order to perform the custom pre-processing, we need to declare a custom data handler to be applied to the incoming data. Conveniently, I have bundled a Search/Replace data handler with the latest release of the Protocol Data Inputs App. Source code is here.
  6. This is a generic Search/Replace handler that allows you to specify any regex to be matched, and then the matched groups will be replaced with either a character sequence or a hash. All of this is configurable via a JSON config string that can be declared in the config as shown above. A full config string example is as follows:

    Character replacement

    Hash replacement

  7. Specify your sourcetype and index for the processed data.
  8. There are several other performance tuning parameters available, but for this simple example we do not need to worry about them. Save your new config and do a quick check that your TCP port has been opened successfully:
    netstat -anlp | grep 12358
  9. Now we are ready to stream some events over TCP to Splunk. You can test this very simply by using the netcat (nc) program:

    echo -n "Hello abc def" | nc localhost 12358

  10. As you can see, the raw text event "Hello abc def" has been regex filtered and pre-processed into "Hello *** def".
  11. We could also change this to replace the matched pattern with a hash.
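To make the idea concrete, here is a minimal Python sketch of the same search/replace logic. This is not the bundled data handler (that ships with the Protocol Data Inputs App); the function name `search_replace`, its parameters and the choice of SHA-256 for the hash are all assumptions made purely for illustration.

```python
import hashlib
import re


def search_replace(event, pattern, replacement=None, hash_algorithm="sha256"):
    """Replace every regex match in `event`.

    If `replacement` is given, each match becomes that character sequence.
    Otherwise each match becomes the hex digest of the matched text.
    (Hypothetical helper for illustration, not the PDI handler itself.)
    """
    regex = re.compile(pattern)

    def _substitute(match):
        if replacement is not None:
            return replacement
        digest = hashlib.new(hash_algorithm)
        digest.update(match.group(0).encode("utf-8"))
        return digest.hexdigest()

    # Passing a function to sub() means the replacement text is used
    # literally, with no backslash or group-reference interpretation.
    return regex.sub(_substitute, event)


masked = search_replace("Hello abc def", r"abc", replacement="***")
hashed = search_replace("Hello abc def", r"abc")
print(masked)  # Hello *** def
print(hashed)  # "Hello " + SHA-256 hex digest of "abc" + " def"
```

Hashing rather than masking keeps the value hidden while still letting you correlate events that contained the same original text.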

Of course, these are just trivial examples to make the idea easier to understand and hopefully get you thinking about creating your own custom data handlers for any pre-processing use cases you have. In reality your regex pattern would be more complex in order to search/replace fields such as

  • Credit Card Numbers
  • ID Numbers
  • Authentication Credentials
  • Any sensitive Personally Identifiable Information (PII)
  • etc.

Thanks for reading and stay tuned for Part 2 next week where we will demonstrate streaming binary data to Splunk in the form of Protobuf messages.

Building a streaming data preprocessing tier with Splunk

Several years ago I wrote a Splunk App called Protocol Data Inputs (PDI).

The purpose of this App is to provision an “on premise” data streaming layer on top of Splunk to receive any kind of data (binary or text) by way of several different protocols (TCP(s), UDP, HTTP(s), Websockets, SockJS).

It then provides an architecture that allows you to dynamically plug in and configure “custom data handlers” to perform any kind of data pre-processing that you require before forwarding the processed data on to Splunk indexes.

This was all built from the ground up with a non-blocking, asynchronous, event-driven architecture that is designed for speed and scale, implemented as a 100% native Splunk App that you simply install and deploy just like any other App into your Splunk infrastructure.

Here is a link to the original blog with more details.

Fast forward to today and this App has now been installed by numerous customers globally in their production environments, running robustly 24/7 at scale with some really exciting data use cases.

Pre-Processing Use Cases

Some of the data pre-processing use cases we have seen implemented with the PDI App include :

  • Indexing binary data … compressed data, encrypted data, custom industry protocols (aviation, finance/banking), images
  • Event aggregation/grouping, statistical pre-computations
  • Mask, encrypt, obfuscate fields in events
  • Dilute verbose events, keep the data you want and discard the data you don’t need to index
  • Reformat events on the fly … XML -> JSON, logs -> metrics
  • Re-route, copy or drop data
  • Wrap another data processing engine, such as Apache Flink, inside a custom data handler
  • Dynamically add a checksum to your data
  • Perform video and image analysis
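As a rough illustration of two of the use cases above (reformatting XML to JSON and adding a checksum), here is a small Python sketch. It is not taken from the PDI App; the function names, the flat one-level XML layout and the `checksum=` suffix format are assumptions chosen only to keep the example short.

```python
import hashlib
import json
import xml.etree.ElementTree as ET


def xml_event_to_json(xml_text):
    """Turn a flat XML event (one level of child elements) into JSON."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "") for child in root}
    return json.dumps(fields, sort_keys=True)


def add_checksum(event_text):
    """Append a SHA-256 checksum of the event text to the event."""
    checksum = hashlib.sha256(event_text.encode("utf-8")).hexdigest()
    return f"{event_text} checksum={checksum}"


json_event = xml_event_to_json(
    "<event><user>alice</user><action>login</action></event>"
)
print(json_event)  # {"action": "login", "user": "alice"}
print(add_checksum(json_event))
```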

In the example below, images (binary data) were uploaded directly to Splunk with the PDI App listening for HTTP POSTs. Each image was then pre-processed with a PDI data handler using AWS Rekognition and stored in AWS S3. The image analysis metadata returned from AWS Rekognition was then indexed in Splunk along with a link to the image’s location in AWS S3. You can then search for all images that are of a “beer glass” and render the results and image in a Splunk dashboard!


So what would a typical Splunk architecture look like using the PDI App deployed on Splunk Forwarders to create a streaming data pre-processing tier?


Here is a simple example to get you started that demonstrates streaming binary content (some gzip compressed text) to Splunk, pre-processing the payload (decompressing the binary data), and forwarding the processed results (text content) to be indexed in Splunk.
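The decompression step itself is simple. As a minimal Python sketch of what such a handler does to each payload (the function name `decompress_payload` and the UTF-8 default are assumptions for illustration, not the linked example's code):

```python
import gzip


def decompress_payload(payload: bytes, encoding: str = "utf-8") -> str:
    """Decompress a gzip payload and decode it to text for indexing."""
    return gzip.decompress(payload).decode(encoding)


# Simulate the binary content a client would stream in.
compressed = gzip.compress("Hello Splunk".encode("utf-8"))
print(decompress_payload(compressed))  # Hello Splunk
```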

Over the course of the next year we will regularly update this blog series to show you even more examples of custom data pre-processing use cases.

Requests are also more than welcome, just drop us a line.

Loading Custom Vendor MIBs into the SNMP Modular Input

The SNMP Modular Input ships by default with all the core SNMP MIBs.

However, if you want to use Custom Vendor MIBs then you have to convert them into Python modules yourself.

Don’t worry, it is actually pretty easy if you follow these steps accurately.

  1. Locate your custom vendor MIB file(s). For this example, let’s say it is called FOOBAR-MIB.txt.
  2. On a *nix operating system, we are going to run a command to convert the MIB file into a Python module.
  3. Install the smidump command. For example, on Ubuntu Linux it might be done with: sudo apt-get install smitools
  4. Install the libsmi2pysnmp script; you can just grab it from GitHub.
  5. The smidump command can be configured in /etc/smi.conf.
  6. In this file you will see a section for paths to MIB files.
  7. Put your FOOBAR-MIB.txt file in one of these path directories.
  8. MIB files also have dependencies to resolve. You can view these at the top of the MIB file in the IMPORTS section.
  9. A quick shortcut to dealing with dependencies is to just grab all these MIB files here.
  10. Then put them in one of your /etc/smi.conf path directories as well.
  11. Now you are ready to run the command
    1. smidump -f python FOOBAR-MIB | libsmi2pysnmp >
  12. Take the resulting Python module and copy it to $SPLUNK_HOME/etc/apps/snmp_ta/bin/mibs
  13. There is no need to bundle up your Python modules into an Egg. You can if you want, but it is perfectly fine just to copy the plain Python files.
  14. When you configure your SNMP input, simply declare your MIB(s) to be loaded.
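If you would rather list a MIB's dependencies than read the file by eye, a short Python sketch like the one below can pull the module names out of the IMPORTS section. The function name `mib_dependencies` and the sample MIB text are made up for illustration, and the comment stripping is simplified (it treats everything after "--" on a line as a comment).

```python
import re


def mib_dependencies(mib_text):
    """Return the MIB modules named after FROM in the IMPORTS section."""
    # Simplified comment handling: drop "--" through end of line.
    without_comments = re.sub(r"--.*", "", mib_text)
    # The IMPORTS clause runs from the keyword to the first semicolon.
    imports = re.search(r"\bIMPORTS\b(.*?);", without_comments, re.DOTALL)
    if imports is None:
        return []
    modules = re.findall(r"\bFROM\s+([A-Za-z][A-Za-z0-9-]*)", imports.group(1))
    # Preserve first-seen order while dropping duplicates.
    return list(dict.fromkeys(modules))


sample = """
FOOBAR-MIB DEFINITIONS ::= BEGIN
IMPORTS
    MODULE-IDENTITY, OBJECT-TYPE, enterprises
        FROM SNMPv2-SMI   -- core SMI definitions
    DisplayString
        FROM SNMPv2-TC;
END
"""
print(mib_dependencies(sample))  # ['SNMPv2-SMI', 'SNMPv2-TC']
```

Each name returned needs a matching MIB file in one of your /etc/smi.conf path directories before smidump can resolve it.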

Splunking SendGrid Statistics

We’ve been playing around with SendGrid quite a bit recently. If you are not familiar with SendGrid, it is an email service provider that offers a platform for sending emails, running email marketing campaigns and monitoring/tracking the status of your emails and campaigns. SendGrid also provides a great developer experience with a rich RESTful API for sending emails and getting access to all of your email statistics. It’s no surprise that SendGrid is part of the Twilio family, who have always championed fantastic developer experiences.

SendGrid provides its own dashboards and activity monitor UI, but all of this data is also available via a simple REST call. So we decided to hook into this data source and pull this data into our Splunk instance.

This is how we did it.

Step 1 : Identify the SendGrid REST endpoint that we want to get the email statistics from

This endpoint allows you to retrieve all of your global email statistics within a given date range.

Step 2 : Download the REST API Modular Input App from Splunkbase

Download here :

Step 3 : Configure a new REST stanza to poll the SendGrid Global Stats endpoint

We went with a polling frequency of 3600 seconds (once per hour).

Here is the config via the UI :

And the same config if you are editing inputs.conf directly :

Step 4 : Configure a custom response handler

We need to plug in a custom response handler that :

  • keeps track of the date range and persists the start date back to inputs.conf
  • splits out the polled JSON into individual events

This custom response handler gets added to rest_ta/bin/ and then declared in your configuration.
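The two jobs of that handler can be sketched in plain Python as follows. This is only an outline of the logic, not the handler we deployed: the function names are hypothetical, the response is assumed to be a JSON array with one object per day carrying a "date" field in YYYY-MM-DD form, and writing the new start date back to inputs.conf is left out. Advancing to the day after the latest date seen is also an assumption; you may prefer to re-poll the latest day if its statistics can still change.

```python
import json
from datetime import date, timedelta


def split_stats(raw_response):
    """Split a JSON array of daily stats into one JSON event per day."""
    days = json.loads(raw_response)
    return [json.dumps(day, sort_keys=True) for day in days]


def next_start_date(raw_response, current_start):
    """Return the start date (ISO format) to use on the next poll."""
    days = json.loads(raw_response)
    if not days:
        # Nothing new came back, so keep polling from the same date.
        return current_start
    latest = max(date.fromisoformat(day["date"]) for day in days)
    return (latest + timedelta(days=1)).isoformat()


# Made-up sample payload in the assumed shape.
sample = json.dumps([
    {"date": "2018-01-01", "stats": [{"metrics": {"delivered": 10}}]},
    {"date": "2018-01-02", "stats": [{"metrics": {"delivered": 7}}]},
])
events = split_stats(sample)
print(len(events))  # 2
print(next_start_date(sample, "2018-01-01"))  # 2018-01-03
```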

Step 5 : Write a Splunk Search to search over the JSON results

Step 6 : Chart the search results

Alexa Skill Development Package

Are you ready for your business to join the voice revolution?

Our founder created the first ever custom Alexa skill for a Big Data platform (open source and freely available) that has been shown at major industry events around the world including on the main stage in the keynote at AWS re:Invent 2017 by the AWS CTO.

Alexa for Business

Alexa for Business allows you to deploy a natural language voice interface privately across your enterprise. We want you to be able to leverage our proven vision and execution ability to create a production-grade custom Alexa skill for your business or technology platform.

  • Stay ahead of the technology curve with an interactive interface that is going to be as commonplace and ubiquitous as the keyboard and mouse.
  • Generate WOW Factor and A-Ha moments with captivating demos that outshine your competitors.
  • Open up new and exciting use cases for interacting with your platform, such as wearable Alexa devices in hands-free/eyes-free operating environments.
  • Provide accessibility to your platform where visual and/or pointing-based interfaces may not be practical, feasible or safe.
  • Ride the wave of industry momentum with a technology that can amplify your lead generation and marketing exposure.
  • Integrate voice-driven business intelligence answers and commands into your meetings and work environment.
  • Provide a unique and captivating experience for your customers to interact with your business or technology platform.

Special Pricing

We are kicking off 2018 by offering a special one-off pricing package to develop and support a custom Alexa skill for your technology platform.

Please contact us to discuss pricing and requirements further.

Alexa for Business, Private Skills and Splunk

I was very excited to hear of the availability of Alexa for Business at AWS re:Invent 2017.

When the freely available and open source Splunk Alexa Integration was created in early 2016, it employed a little hack to make the skill private to the user and not published as a public Alexa Skill, which is what you need when you are integrating with your own private Splunk environment.

This essentially entails running the Splunk Alexa skill under your own private development account.

Alexa for Business totally addresses this need with private skills.

I had an inkling this feature would be released by the Alexa team at some point, and it is now here.

So thank you Werner “Santa” Vogels and Team!

“… With Alexa for Business you can make these skills available to your shared devices and enrolled users without having to publish them to the Alexa Skills store…..”

“… When your skill is ready you can mark the skill as private, submit the skill, and then distribute it to your Alexa for Business account….”

We are here to help you with any setup and configuration of your Splunk Alexa environment that you require. We can also help with custom development using the developer extension hooks built into the integration (look for “create your own dynamic actions” in the docs). These hooks allow you to achieve essentially anything you can conceive of, above and beyond the core functionality of mapping your voice intents to underlying SPL searches.

So please contact us so we can get you talking to Splunk.

Merry Xmas everyone and happy Splunking with your voice.

So you need a Splunk App or Integration built? Here are 10 reasons to consider us.

We’re the most widely downloaded and production installed developers in Splunk’s history. Splunk field staff, partners, customers and the community have consumed the solutions we support on Splunkbase for nearly a decade in all verticals. I’ll let this speak for itself, it’s not just talk. We’re proven to have created solutions that are compelling and innovative, that there is a genuine demand for, that deliver real data value and that run reliably and robustly in production environments 24/7.

All work we do is overseen by Splunk’s former Worldwide Developer Evangelist who has years of internal experience in Splunk development, mentoring other customers & partners to build their successful solutions and was part of inventing and blazing the way for much of what is considered standard today in many aspects of Splunk development.
This translates to experience and hands-on attention to your solution that you simply won’t get elsewhere.

We are a 100% privately held company and well capitalized. We won’t leave you in the lurch for support after we have created your solution for you. We’re in this with you for the long haul.

We have offices and development/support staff spread over multiple time zones across Asia and the Pacific.

We are an official Splunk Partner (TAP) in the Partner+ program.

We believe in the tenet that working software is the primary measure of progress, not emails, meetings and PowerPoints. Working code. Throughout the development process, we keep you in complete visibility of the work in progress with your own dedicated demo/feedback environment; you are never black boxed out of the picture at any stage.

We have several support options available so that once we have delivered your solution you can choose a support package to get dedicated support from the BaboonBones team that best suits your requirements and budget.

Many of the Apps that we support on Splunkbase have formed the core of other successful apps over the years. What a great way to get a running start to creating your solution. Need a JVM based app? Then start with the JMX App to bring in your data. Reuse is your friend.

We have world-leading Splunk expertise in many domains and several of our creations have become de facto standards in these domains. Need a JVM based solution, an IoT solution, a Messaging based solution, or an integration with an SDK? We have the proven experience and expertise to guide you.

We love to chart new waters with challenging Splunk integrations that have a real WoW factor. This is what will set you apart and generate A-Ha moments. Anyone can create an Add-On to just pull in some data; we like to take the Splunk platform to new and exciting places. Look at some of this integration work for example.

We walk the talk. And we have the proven pedigree to create a best of breed Splunk solution for you. So please get in touch and let’s start talking about your requirements.

Get it on – Splunk Alexa Integration at Splunk Conf 17

Really great to see the Splunk Alexa integration front and centre at the registration area at Splunk Conf 17, allowing people to talk to data in Splunk. Cheers to the team for setting this up!

UPDATE: check out this blog on the Splunk blogs site.

Skate to where the puck is going to be. How will YOU interact with your data in the future?

Innovating with AWS and Splunk

Splunk Conf 17 is just around the corner and this year AWS is a Peta level sponsor.

We’ve built several Splunk solutions that integrate with AWS over the years. Check out some of these innovative ways to bring the two platforms together for an even greater experience with your data!

AWS Kinesis

We wrote the original App for indexing data in Splunk from Kinesis and it is used extensively to this day by Splunk customers.

What’s cool about this App is that it can handle text AND binary data in the Kinesis streams. What’s even cooler is that you can plug in your own custom handlers to preprocess the raw received data in the streams any way you want before indexing it in Splunk. Powerful stuff!

The App is free to download and use and if you need any support, please reach out to us.

AWS Alexa

We made it possible to talk and listen to Splunk using Alexa, check out this blog here.

The App is free to download and use and if you need any support, please reach out to us.

AWS Rekognition and S3

Upload images into Splunk, store them in S3, analyze the contents of the image using Rekognition and index all the juicy metadata in Splunk to build out your own imaging analysis use case.

Here is a screenshot of some work we did at Splunk where a photo was taken on a mobile phone and uploaded directly into Splunk.

AWS at Conf17

And of course, there is a whole load of other great AWS content at Splunk Conf 17 that you can check out. If I didn’t have a young baby nipping at my ankles, I’d be at many of these great sessions.

Do you need a custom integration performed with Splunk?

Whether it is with AWS or some other platform that you want to integrate with Splunk, please feel free to contact us so we can turn your ideas into working software!


Splunking Soldiers and First Responders

When I was an employee at Splunk, I was fortunate enough to be given the opportunity to work on some innovative military projects.

One such project was the Smart Soldier App, with contributions from Justin Boucher (a US Army Veteran) and Ramik Chopra.

This App is essentially a data correlation narrative that aggregates data from multiple sources to provide more effective, targeted and expedited triage for soldiers in the field, and to provide commanders with real-time insights into the condition and locations of soldiers and resources in the battle theatre:

  1. Soldier and Medic geolocation data and details
  2. Soldier real time health and condition metrics
  3. Soldier historical medical/patient information
  4. Soldier “smart suit” condition
  5. Geomapping based visualizations to determine the nearest medic and best treatment facility for an injured soldier
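To give a feel for the "nearest medic" calculation behind that last item, here is a small Python sketch using the haversine great-circle distance. It is not code from the Smart Soldier App; the function names, the dictionary layout and the coordinates are all invented for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_medic(soldier, medics):
    """Return the medic record closest to the soldier's (lat, lon)."""
    return min(
        medics,
        key=lambda m: haversine_km(soldier[0], soldier[1], m["lat"], m["lon"]),
    )


# Invented positions, purely for demonstration.
medics = [
    {"name": "Medic A", "lat": 34.1, "lon": 45.0},
    {"name": "Medic B", "lat": 35.0, "lon": 46.0},
]
print(nearest_medic((34.0, 45.0), medics)["name"])  # Medic A
```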

But why restrict this to just soldiers in the battlefield? You could take this use case as inspiration to apply to any type of first responder scenario. The possibilities in your data are limitless, so do reach out to us if you have any ideas that you want to collaborate on.

You can find out more about the Smart Soldier App in the links below :

Smart Soldier Video

Smart Soldier PDF

Art of the Possible