Splunk Users Beware: How Small Sourcetype Errors Cause Big Data Problems!
By David Allen, Senior Splunk Consultant
Splunk excels at processing the data it is given, yet incorrect settings can lead to significant complications. Splunk users must proceed cautiously and consider the long-term implications, particularly when configuring sourcetype settings.
In this blog I will show how what appears to be a “very” small REGEX issue can cause one quarter of the events to disappear and flag many timestamp parsing issues.
Date Configuration Issues
This issue manifested itself as a Data Quality / Timestamp Parsing issue. It showed up at the beginning of every month, lasted until the 10th of the month, and coincided with a significant drop in the number of events over the same days.
Notice in the graph that follows that there are approximately 200,000 events per hour on Sunday, March 31. Then the count drops to fewer than 5,000 per hour at the start of April 1.

Now you might be thinking, “What is the big deal? This could be real data,” and that is true. But it is suspicious, and it also flagged many Timestamp Parsing issues. So first, let’s look at the LINE_BREAKER and see if it is set up correctly.
Here are the events on March 31, where the LINE_BREAKER is working correctly:

Here is a single event from April 1. Because the LINE_BREAKER does not match these events, many of them are merged into one.

Here is the original LINE_BREAKER regex. See how it does not work for events in the first days of April but works just fine for March 31. Why is this? Look closely at the regex and see if you can figure it out.

Did you catch the error? The error is with the whitespace “\s” as shown below…

A single “\s” allows exactly ONE whitespace character in front of the day of the month, which is correct when the day has two digits (10-31). When the day has only one digit (1-9), however, it is padded with an extra space, so the REGEX no longer matches and the LINE_BREAKER setting fails to break those events.
By allowing for more than 1 whitespace character in the REGEX below we can see the issue is fixed.
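To see the effect outside Splunk, here is a minimal Python sketch of LINE_BREAKER-style splitting, where the text matched by the first capture group is discarded and a new event starts right after it. The sample events, both patterns, and the break_events helper are hypothetical illustrations, not the original regex from this sourcetype. The sketch assumes syslog-style timestamps in which single-digit days are padded with a space.

```python
import re
from typing import List


def break_events(raw: str, line_breaker: str) -> List[str]:
    """Split raw text the way a LINE_BREAKER-style regex would.

    The text matched by the first capture group is thrown away, the
    current event ends before it, and the next event starts after it.
    """
    pattern = re.compile(line_breaker)
    events, start = [], 0
    for match in pattern.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    return [event for event in events if event]


# Hypothetical sample data: two-digit day vs. space-padded single-digit day.
march = "\n".join([
    "Mar 31 23:59:58 host app: first",
    "Mar 31 23:59:59 host app: second",
])
april = "\n".join([
    "Apr  1 00:00:01 host app: third",
    "Apr  1 00:00:02 host app: fourth",
])

# Hypothetical patterns: STRICT allows exactly one whitespace character
# before the day, RELAXED allows one or more.
STRICT = r"([\r\n]+)\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2}"
RELAXED = r"([\r\n]+)\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}"

for label, pattern in (("strict", STRICT), ("relaxed", RELAXED)):
    print(label, "March:", len(break_events(march, pattern)),
          "April:", len(break_events(april, pattern)))
```

With the strict pattern the two April lines stay merged into one event, while the relaxed pattern splits both months correctly. Running a quick check like this against samples from both the middle and the start of a month can help catch the problem before the sourcetype goes live.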

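Now we have a valid LINE_BREAKER REGEX. Let’s look at why the Timestamp Parsing issue showed up on the Data Quality screen.
The Splunk time format documentation shows that there are two settings for the day of the month: %d, which expects a leading zero (01-31), and %e, which expects a leading space in place of the zero (1-31).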

In our example, the setting was mistakenly set to %d instead of %e. One might think such an error would be obvious upon reviewing the data. However, the initial date setting was probably chosen mid-month from a limited dataset, where every day had two digits and everything appeared to work correctly during ingestion. That kind of spot check holds up in most cases, around 99% of the time. Waiting until the start of the month to observe date behavior might not always be practical, but analyzing events spanning an entire month should provide enough data to select the correct setting.
Putting it all together, here is the original sourcetype definition with the incorrect LINE_BREAKER and TIME_FORMAT:
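One way to act on this is to scan a sample that covers a full month and look at how single-digit days are written. The Python sketch below does that under stated assumptions: each event starts with a three-letter month followed by the day and a space, and suggest_day_directive is a hypothetical helper, not a Splunk feature. It returns None when the sample contains no single-digit days or mixes padding styles, which is exactly the mid-month blind spot described above.

```python
import re
from typing import Iterable, Optional

# Assumption: events begin like "Apr  1 00:00:01" or "Apr 01 00:00:01".
DAY_AFTER_MONTH = re.compile(r"^[A-Za-z]{3} ([ 0]?\d{1,2}) ")


def suggest_day_directive(samples: Iterable[str]) -> Optional[str]:
    """Suggest %d or %e from the padding seen on single-digit days."""
    padding = set()
    for line in samples:
        match = DAY_AFTER_MONTH.match(line)
        if not match:
            continue
        day = match.group(1)
        if day.startswith(" "):
            padding.add("space")   # e.g. " 1" suggests %e
        elif day.startswith("0"):
            padding.add("zero")    # e.g. "01" suggests %d
        elif len(day) == 1:
            padding.add("none")    # e.g. "1" with no padding at all
    if padding == {"space"}:
        return "%e"
    if padding == {"zero"}:
        return "%d"
    # No single-digit days seen, or inconsistent padding: cannot decide.
    return None


mid_month_only = ["Mar 15 10:00:00 host app: a", "Mar 31 23:59:59 host app: b"]
full_month = mid_month_only + ["Apr  1 00:00:01 host app: c"]

print(suggest_day_directive(mid_month_only))
print(suggest_day_directive(full_month))
```

The mid-month sample gives no answer because it contains no single-digit days, while the sample that reaches into April points to %e.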

Here is the corrected sourcetype definition. Notice the %d was changed to %e and the whitespace REGEX in front of the date has been changed to allow for more than 1 character.

The above changes were implemented on April 3rd, and you can see below the event count returning to roughly 200,000, the level seen before the drop.

Remember the April 1 event that had many events clumped together? Here is what the events look like on April 4th.

Hour Configuration
Similarly, incorrect configuration of the hour setting can impact the LINE_BREAKER if the hour is part of the LINE_BREAKER regex. Even if events are correctly segmented, an incorrect hour setting in the TIME_FORMAT can prevent Splunk from identifying the timestamp within the event, resulting in a timestamp parsing error. Splunk will then resort to alternative methods to calculate the timestamp.
To set up the hour correctly in the TIME_FORMAT setting, Splunk has the following options available: %H and %k for the 24-hour clock, %I for the 12-hour clock, and %p for AM or PM.

You will notice that %H and %k for the hour mirror %d and %e for the date. So if you ingest events only in the afternoon (hours 12-23) and never check how the single-digit morning hours are written, you could run into the same problem.
%I is always a 12-hour clock. It can optionally take a leading 0 but never a leading space, and it always needs %p to signify AM or PM. Never use %H or %k with %p.
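The two rules above are easy to check mechanically before a TIME_FORMAT is deployed. The following Python sketch is a hypothetical lint helper (lint_time_format is not part of Splunk) that flags a 24-hour directive combined with %p and a 12-hour %I used without %p. It is deliberately simple and does not handle escaped percent signs (%%).

```python
from typing import List


def lint_time_format(time_format: str) -> List[str]:
    """Return warnings for hour directives that contradict each other."""
    warnings = []
    has_24_hour = "%H" in time_format or "%k" in time_format
    has_12_hour = "%I" in time_format
    has_am_pm = "%p" in time_format

    if has_24_hour and has_am_pm:
        warnings.append("%H or %k is a 24-hour clock and should not be used with %p")
    if has_12_hour and not has_am_pm:
        warnings.append("%I is a 12-hour clock and needs %p to signify AM or PM")
    return warnings


# Hypothetical TIME_FORMAT values for illustration.
for candidate in ("%b %e %H:%M:%S", "%b %e %H:%M:%S %p", "%b %e %I:%M:%S"):
    print(candidate, "->", lint_time_format(candidate) or "ok")
```

A check like this catches only the directive mismatch. Whether the hour is zero-padded or space-padded still has to be confirmed against sample events from the single-digit hours.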
Conclusion
This blog has illuminated the significant repercussions of minor configuration errors within Splunk, particularly in relation to sourcetype settings. The example provided vividly illustrates how overlooking nuances in the LINE_BREAKER regex and TIME_FORMAT settings can lead to substantial disruptions in data interpretation and analysis. This was evident from the drastic reduction in event counts at the beginning of each month, directly linked to erroneous configurations. By promptly addressing these issues, such as adjusting the LINE_BREAKER regex to accommodate varying date formats and correcting the TIME_FORMAT setting, the integrity of data processing was swiftly restored. This scenario underscores the critical importance of meticulous configuration management in Splunk deployments, emphasizing the need for thorough testing and validation to ensure consistent and reliable data insights.