Search & Replace
Welcome to Part 1 of our examples series on Streaming Data Pre-Processing with Splunk.
In this example we are going to stream some text data events to Splunk, dynamically apply a regex filter to the raw incoming events, and replace any text that matches the regex pattern with either a character sequence or a hash.
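The core idea can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration, not the handler's actual source. The function name search_replace is made up. It reads the same JSON keys (regex_pattern, replace_type, replace_char, replace_hash_alg) used in the config examples in the Setup steps below. It also assumes two things about replacement: a "char" replacement writes one replace_char per matched character, and a "hash" replacement writes the hex digest of the matched text.

```python
import hashlib
import json
import re


def search_replace(event: str, config_json: str) -> str:
    """Apply a regex search/replace, driven by a JSON config string, to one raw event.

    Hypothetical sketch: the config keys mirror the handler's JSON config
    examples, but the replacement semantics and defaults are assumptions.
    """
    config = json.loads(config_json)
    pattern = re.compile(config["regex_pattern"])
    replace_type = config.get("replace_type", "char")  # assumed default

    def replace(match):
        text = match.group(0)  # the whole matched span
        if replace_type == "char":
            # Assumed: one replacement character per matched character.
            return config.get("replace_char", "*") * len(text)
        if replace_type == "hash":
            # Assumed: hex digest of the matched text. hashlib.new() accepts
            # names such as "md5" or "sha256", so lower-case the config value.
            alg = config.get("replace_hash_alg", "MD5").lower()
            return hashlib.new(alg, text.encode("utf-8")).hexdigest()
        raise ValueError(f"unknown replace_type: {replace_type!r}")

    return pattern.sub(replace, event)
```

For example, search_replace("Hello abc def", '{"regex_pattern":"abc","replace_type":"char","replace_char":"*"}') returns "Hello *** def". Switching to "replace_type":"hash" with "replace_hash_alg":"MD5" yields "Hello 900150983cd24fb0d6963f7d28e17f72 def".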
Setup
- Download and install the Protocol Data Inputs App.
- Restart Splunk.
- Log in to Splunk and browse to
Settings -> Data Inputs -> Protocol Data Inputs -> +Add New
- We are going to configure a new Protocol Input to listen for events via TCP on port 12358 (you can choose any port; I just like Fibonacci sequences).
- To perform the custom pre-processing, we need to declare a custom data handler to be applied to the incoming data. Conveniently, I have bundled a Search/Replace data handler with the latest release of the Protocol Data Inputs App. The source code is here.
- This is a generic Search/Replace handler that lets you specify any regex to be matched; the matched groups are then replaced with either a character sequence or a hash. All of this is configurable via a JSON config string declared in the input's configuration. Full config string examples are as follows:
Character replacement
{
"regex_pattern":"abc",
"replace_type":"char",
"replace_char":"#"
}
Hash replacement
{
"regex_pattern":"abc",
"replace_type":"hash",
"replace_hash_alg":"MD5"
}
- Specify your sourcetype and index for the processed data.
- There are several other performance tuning parameters available, but for this simple example we do not need to worry about them. Save your new config and do a quick check that your TCP port has been opened successfully:
netstat -anlp | grep 12358
- Now we are ready to stream some events over TCP to Splunk. You can test this very simply using the netcat (nc) program:
echo -n "Hello abc def" | nc localhost 12358
- As you can see, the raw text event "Hello abc def" has been regex filtered and pre-processed into "Hello *** def" (here with replace_char set to "*").
- We could also replace the matched pattern with a hash instead, by setting replace_type to "hash" as in the hash replacement config above.
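If netcat is not available, the same test event can be sent from a short Python script. This is a minimal sketch using only the standard socket module. The helper name send_event is hypothetical. Like the echo -n one-liner, it sends the text with no trailing newline and then closes the connection. How the input splits several events sent on one connection is not covered here, so the sketch sends a single event per connection.

```python
import socket


def send_event(text: str, host: str = "localhost", port: int = 12358,
               timeout: float = 5.0) -> None:
    """Send one raw event over TCP, similar to: echo -n "..." | nc host port.

    Opens a connection, writes the UTF-8 bytes with no trailing newline,
    then shuts down the write side so the listener sees end-of-stream.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(text.encode("utf-8"))
        # Signal that no more data will follow on this connection.
        sock.shutdown(socket.SHUT_WR)
```

With the new input listening on port 12358, calling send_event("Hello abc def") sends the same bytes as the netcat command.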
Of course, these are just trivial examples to make things easier to understand and, hopefully, to get you thinking about creating your own custom data handlers for any pre-processing use cases you have. In reality your regex pattern would be more complex in order to search/replace fields such as:
- Credit Card Numbers
- ID Numbers
- Authentication Credentials
- Any sensitive Personally Identifiable Information (PII)
- etc.
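As a slightly more realistic illustration, here is a hypothetical pattern for card-number-like strings: 16 digits, optionally grouped in fours by spaces or hyphens. The pattern, function names and masking behavior are assumptions for illustration, not a vetted PII filter. Real card numbers range from 13 to 19 digits, and production filters usually add a Luhn check to cut false positives. The pattern uses only syntax shared by Python's re module and Java's java.util.regex. Because the handler is configured through a JSON string, backslashes in the pattern must be escaped there, which json.dumps takes care of.

```python
import json
import re

# Hypothetical card-number pattern: four groups of four digits, each later
# group optionally preceded by a single space or hyphen, e.g.
# "4111 1111 1111 1111", "4111-1111-1111-1111" or "4111111111111111".
CARD_PATTERN = re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b")


def mask_cards(event: str, replace_char: str = "*") -> str:
    """Replace every character of each card-like match with replace_char."""
    return CARD_PATTERN.sub(lambda m: replace_char * len(m.group(0)), event)


def card_handler_config(replace_char: str = "*") -> str:
    """Build a JSON config string in the same shape as the char replacement example."""
    # json.dumps escapes each backslash, so \b in the pattern appears as \\b
    # in the resulting JSON text.
    return json.dumps({
        "regex_pattern": CARD_PATTERN.pattern,
        "replace_type": "char",
        "replace_char": replace_char,
    })
```

For instance, mask_cards("card=4111-1111-1111-1111 ok") masks all 19 characters of the card number, including the hyphens, and leaves short digit runs such as "1234" untouched.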
Thanks for reading and stay tuned for Part 2 next week where we will demonstrate streaming binary data to Splunk in the form of Protobuf messages.