Routing PII Data to Multiple Indexes

By Eric Levy, Splunk Consultant & Zubair Rauf, Team Lead, Cybersecurity Engineering

As Splunk consultants, we often come across customers who log personally identifiable information (PII) on their systems and need that data indexed into Splunk for various use cases. This data can be part of, but is not limited to, sales logs, health records, and many other types of log files. Various teams then build analytics on this data for their daily business needs, and those teams may or may not require access to the PII itself.

Oftentimes, multiple teams interact with the same data, but some teams need the sensitive portions masked or removed. In this blog post, we will walk through a use case involving logs from a sales system that records customers’ credit card information when a sale is processed and generates error logs when something doesn’t go as planned.

We used a custom-built script to generate the sample logs shown below. The script uses random words from the dictionary for its error messages:

2024-01-11 11:22:38: [TransactionID=1259] [CreditCardNumber=1516-3643-1843-5285] [TransactionAmount=265.81] [TransactionStatus=0]
2024-01-11 11:22:41: [TransactionID=1606] [CreditCardNumber=3301-5491-1216-2042] [TransactionAmount=49.29] [TransactionStatus=0]
2024-01-11 11:22:44: [TransactionID=1814] [CreditCardNumber=8894-10012-6514-9377] [TransactionAmount=878.50] [TransactionStatus=1]
2024-01-11 11:22:48: [ErrorMessage=piscian iodine tussive intenability alody trypanosomatous hypabyssal supervast splashing japishness ] [ErrorStackTrace=piscian iodine tussive intenability alody trypanosomatous ]
2024-01-11 11:22:51: [TransactionID=1735] [CreditCardNumber=2525-2750-4478-8785] [TransactionAmount=275.86] [TransactionStatus=1]
2024-01-11 11:22:55: [ErrorMessage=monadelphous triakistetrahedral slippery belage pungapung pustule leucophanite mantispid spiritualty misdetermine ] [ErrorStackTrace=monadelphous triakistetrahedral slippery belage pungapung pustule leucophanite mantispid spiritualty ]
2024-01-11 11:22:58: [TransactionID=1470] [CreditCardNumber=9384-8198-3585-2396] [TransactionAmount=370.37] [TransactionStatus=0]
2024-01-11 11:23:02: [ErrorMessage=Sunil Gaditan piaster pudicity tarkeean Juan ] [ErrorStackTrace=Sunil Gaditan piaster pudicity tarkeean Juan Sulafat trowelful ]

We will explore two methods to filter and route the same data into two different indexes, using Splunk’s props.conf and transforms.conf configuration files to transform the data and achieve the desired results.

Splunk’s props.conf file allows Splunk admins to configure how incoming data is parsed: how to break events in a log stream, parse timestamps, apply transforms to the data, and create fields. The rules defined in transforms.conf are applied via the props configuration.

Splunk’s transforms.conf file allows Splunk admins to configure rules for transforming data. It allows users to mask data, create fields at index time, rewrite fields, and configure lookups.

The props.conf and transforms.conf configurations reside on search heads, indexers, and heavy forwarders, as well as universal forwarders in some cases. More details about Splunk’s props can be found in my colleague’s blog post, Data Onboarding in Splunk.
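As a minimal, hypothetical illustration of how the two files work together (my_sourcetype and my_transform are placeholder names, not part of this use case), a props.conf stanza references a transforms.conf stanza by name through a TRANSFORMS- setting:

###props.conf
[my_sourcetype]
TRANSFORMS-example = my_transform

###transforms.conf
[my_transform]
REGEX = pattern_to_match
DEST_KEY = _raw
FORMAT = replacement_text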

Method 1 – Mask PII and send all data to a separate index

The first method demonstrates how to send all sales log data (transactions and error logs) to a separate index, where teams that only need the masked version of the data can work with it.

The unmasked sales data is ingested into the private_index with the sourcetype named private_sourcetype (for ease of understanding). The configurations can be deployed in a new TA used for parsing, or placed directly in the $SPLUNK_HOME/etc/system/local/props.conf and transforms.conf files.
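For example, if the configs are packaged in a TA (TA_sales_parsing is a placeholder name), the files would live at:

$SPLUNK_HOME/etc/apps/TA_sales_parsing/local/props.conf
$SPLUNK_HOME/etc/apps/TA_sales_parsing/local/transforms.conf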

With this method, we will first clone private_sourcetype into a new sourcetype called public_sourcetype, mask the PII in the clone, and then route it to the public_index.

Parsing for both private_sourcetype and public_sourcetype is handled by the props.conf settings shown later in this post.

To start, in our transforms.conf, we define the transform stanza that will clone our sourcetype. The cloned sourcetype will perform all remaining transformations.

###transforms.conf
[clone_sourcetype]
CLONE_SOURCETYPE = public_sourcetype
REGEX = .
DEST_KEY = _MetaData:Index
FORMAT = public_index
WRITE_META = true

This stanza uses the CLONE_SOURCETYPE setting, which is the “secret sauce” of this trick. CLONE_SOURCETYPE duplicates the matching data into another sourcetype, while the original events continue on to the private_index untouched. The remaining settings in the stanza apply to the cloned copy: we set DEST_KEY to _MetaData:Index to modify the index, use a REGEX value of . so that every event matches, and set FORMAT to the name of the new index, in this case public_index (since it will house our masked data).

The props.conf below shows how the transform is applied to the sourcetype. The props for private_sourcetype use the Super Six recommended settings to parse the logs as they are ingested into Splunk.

###props.conf
[private_sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRUNCATE = 200
TRANSFORMS-clone_sourcetype = clone_sourcetype

The second transform stanza, included in the same transforms.conf file, will mask the data before it is indexed into the new public_index so that it can be consumed by the right people without exposing PII.

###transforms.conf
[mask_credit_card_info]
REGEX = ^(.*?)CreditCardNumber=\d{4,5}-\d{4,5}-\d{4,5}-\d{4,5}(.*)$
FORMAT = $1CreditCardNumber=XXXX-XXXX-XXXX-XXXX$2
DEST_KEY = _raw

The syntax is very similar to how we replaced the index name for the cloned data: we specify a REGEX targeting the events we want to change (in this case, anything containing a credit card number), apply the FORMAT to replace just the portion of the event we want to mask (hence the captured groups $1 and $2 surrounding the replacement), and set DEST_KEY to _raw so the masked version is written back to the raw event.
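For illustration, here is how the first sample transaction from above would look before and after the mask transform is applied:

Before: 2024-01-11 11:22:38: [TransactionID=1259] [CreditCardNumber=1516-3643-1843-5285] [TransactionAmount=265.81] [TransactionStatus=0]
After:  2024-01-11 11:22:38: [TransactionID=1259] [CreditCardNumber=XXXX-XXXX-XXXX-XXXX] [TransactionAmount=265.81] [TransactionStatus=0]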

To apply this transform, we add the props.conf settings shown below to the public_sourcetype stanza:

###props.conf
[public_sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRUNCATE = 200
TRANSFORMS-mask_credit_card_info = mask_credit_card_info

When all these props and transforms are applied, the private_index should show the unmasked data and the public_index should show the masked data.
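A quick sanity-check search along these lines (using the index and sourcetype names configured above) can confirm the routing and masking worked; the match() check is just one way to spot masked events:

index=private_index OR index=public_index
| eval masked=if(match(_raw, "XXXX-XXXX-XXXX-XXXX"), "yes", "no")
| stats count by index sourcetype masked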

Method 2 – Filter and route error logs to a separate index

With this method, we will explore how to filter the error logs and route them to the public_index. Since the PII-bearing transaction events are never routed to the public_index, no masking is needed.

To start, as we did in method 1, we will clone the sourcetype, this time to sales_error_log (to make the data easy to identify), using the transforms.conf config file:

###transforms.conf
[clone_sourcetype]
CLONE_SOURCETYPE = sales_error_log
REGEX = ^.*ErrorMessage=
DEST_KEY = _MetaData:Index
FORMAT = public_index
WRITE_META = true

In the clone_sourcetype transforms stanza above, we use the CLONE_SOURCETYPE setting to specify a name for the new sourcetype. As we only want to clone the error message events, we specify a REGEX that matches that event format. We then set DEST_KEY and FORMAT to route the duplicated events to the public_index.
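With the sample data shown earlier, only the three error events would match this transform and be cloned into the public_index, for example:

2024-01-11 11:22:48: [ErrorMessage=piscian iodine tussive intenability alody trypanosomatous hypabyssal supervast splashing japishness ] [ErrorStackTrace=piscian iodine tussive intenability alody trypanosomatous ]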

We apply this transform in props.conf to the private_sourcetype, and also create a new stanza for sales_error_log so that Splunk can parse the duplicated logs correctly as well:

###props.conf
[private_sourcetype]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRUNCATE = 200
TRANSFORMS-clone_private_sourcetype = clone_sourcetype

[sales_error_log]
LINE_BREAKER = ([\r\n]+)
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
TRUNCATE = 200

With these props and transforms settings, we should be able to duplicate the error logs and send the duplicated copy to the public_index for consumption.
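As a final sanity check, a simple search such as the following (using the names configured above) should return only the error events in the public index:

index=public_index sourcetype=sales_error_log
| table _time _raw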

Here we’ve outlined two ways you can route sensitive PII data to multiple indexes using props and transforms, without duplicating anything at the source. The first puts the same data in both indexes, redacting the sensitive information (in this case, a credit card number) in the public copy; the second separates regular events from error messages. Both approaches show the flexibility of Splunk’s data management configurations and how they can provide tailored solutions for a wide array of use cases.

Find other Splunk blogs here, Splunk offerings here, and always feel free to reach out if you have any questions. Happy Splunking!