Running Searches Using Splunk’s REST API 

Part One

By Jon Walthour, Senior Technical Architect

One of the things I like best about Splunk is its flexibility. Splunk is like Legos; you can build just about anything you want with it. With that flexibility comes the ability to do just about anything in Splunk outside the web UI and command-line interfaces via Splunk’s REST API. If I cannot do something natively, I can always write a custom endpoint in Splunk to make it possible. Today, let me show you a common and straightforward way to use Splunk’s REST API to run a search and retrieve the results. I’ll show you two approaches to doing this: via cURL, a command-line tool for sending and receiving data with remote systems via URLs, and via Python, using the Splunk Python SDK.

Generally, we will hit the Splunk REST endpoint for running search jobs, “/services/search/v2/jobs”. You could also use “/services/search/jobs,” but this endpoint is deprecated and will eventually no longer work. We’ll pass in some credentials, either a username and password or an OAuth token created in Splunk via “Settings” > “Tokens” (note, this is not a HEC token). Finally, we’ll pass in our search via a POST to the REST endpoint. 

Four Modes

There are four modes in which we can execute these searches. The first three are execution modes, set with the “exec_mode” parameter, and the fourth is an appendage to our endpoint. They are “normal” mode, “blocking” mode, “oneshot” mode, and “export” mode.

A normal search runs asynchronously. It returns a search job immediately, and you poll the job to determine its status. You can retrieve the results when the search has finished, and you can also preview the results if “preview” is enabled. Normal mode works with real-time searches.

A blocking search runs synchronously. It returns a search job once the search has finished, so there is no need to poll for status. Blocking mode doesn’t work with real-time searches.

A oneshot search is a blocking search that runs immediately. Instead of returning a search job, this mode returns the search results themselves once the search has completed.

Normal and blocking searches work with the search job scheduler, create search IDs (SIDs), and save results on the search head that can be retrieved later upon request. Oneshot searches run immediately without the job scheduler and stream the results back as soon as they are ready. The results are not saved on the search head, so if the stream is interrupted for any reason, they are not recoverable without rerunning the search.
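To make the split between the three execution modes and the “export” endpoint concrete, here is a minimal Python sketch. The function name “plan_search” is hypothetical, and leaving “exec_mode” off the export request is an assumption made for illustration; it only maps a mode name to an endpoint path and extra POST fields and does not contact Splunk.

```python
from typing import Dict, Tuple

JOBS_ENDPOINT = "/services/search/v2/jobs"


def plan_search(mode: str) -> Tuple[str, Dict[str, str]]:
    """Return (endpoint path, extra POST fields) for one of the four modes.

    Hypothetical helper for illustration; it does not call Splunk.
    """
    if mode in ("normal", "blocking", "oneshot"):
        # The three execution modes travel as the exec_mode data parameter.
        return JOBS_ENDPOINT, {"exec_mode": mode}
    if mode == "export":
        # Export is chosen by appending to the endpoint, not by a parameter.
        return JOBS_ENDPOINT + "/export", {}
    raise ValueError(f"unknown search mode: {mode!r}")


for mode in ("normal", "blocking", "oneshot", "export"):
    print(mode, plan_search(mode))
```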

Export Mode

The fourth mode, “export” mode, is a mode of running a one-shot search so its output is formatted for easy use in other contexts. What I mean is that a normal, blocking, or one-shot search returns extra fields along with the results to detail such things as the indexer and index bucket the event came from, the time the event was received by that indexer, and the subseconds of the event’s timestamp. Export search results do not have these extra metadata fields, only the result’s serial number (order in the result set), the time of the event, the event’s host, source and sourcetype, the index, and indexer from which the event came along with the event itself. Not all these fields would be present in a transforming search, like one that uses the “stats” command. 

So, let’s start with a basic search run via curl–but first, a few notes on the structure of these REST calls. 

You must authenticate to a Splunk search head to use these REST endpoints, with either a username and password or a Splunk OAuth token. While you can technically run a search on an indexer, you will only get results from the index buckets stored on that indexer. So, to get complete results, authenticate to a search head connected to your indexers. Finally, you authenticate to Splunk’s management port, port 8089 by default. This is the port Splunk listens on for these sorts of communications. Don’t try to use the web UI port, which is usually port 8000; it won’t work.
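As a rough illustration of the two points above (management port and credentials), the following Python sketch builds the URL and the HTTP Basic authentication header that curl’s “-u” option would send. The helper names, the host name, and the credentials are all made up for the example, and nothing is sent over the network.

```python
import base64
from typing import Dict


def management_url(host: str, path: str, port: int = 8089) -> str:
    """Build a URL against the management port (8089 unless changed)."""
    return f"https://{host}:{port}{path}"


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build the HTTP Basic header equivalent to curl's -u option."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder host and credentials, for illustration only.
print(management_url("splunk-sh.example.com", "/services/search/v2/jobs"))
print(basic_auth_header("admin", "changeme"))
```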

curl -s -k -u {username}:{password} https://{Splunk search head}:{management port}/services/search/v2/jobs \
    -d search="search index=_internal sourcetype=splunkd | head 5" \
    -d earliest_time="-7d" \
    -d latest_time="now" \
    -d namespace="search" \
    -X POST
    

Now, let’s deconstruct our curl command. All these elements will be present in any approach you use to run a search on Splunk. Note that all these elements are passed as data elements in the HTTP request. To learn more about any of them, consult Splunk’s online documentation. 

– “search” – the content of the search to be run by Splunk 

– “earliest_time” – the earliest time a search can look back to retrieve event data. If this is not included, Splunk defaults to the beginning of Epoch time, January 1, 1970, 12:00:00 AM UTC. 

– “latest_time” – the latest time a search can look to retrieve event data. If this parameter is not included, Splunk defaults to “now,” which is the time the search is run. 

– “namespace” – the app context in which the search is to be run 
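For readers working outside curl, the four data elements listed above can be assembled into a form-encoded POST body with Python’s standard library. This is only a sketch: “build_search_body” is a hypothetical helper, and its defaults simply mirror the values used in this article.

```python
from urllib.parse import parse_qs, urlencode


def build_search_body(search: str,
                      earliest_time: str = "-7d",
                      latest_time: str = "now",
                      namespace: str = "search") -> str:
    """Form-encode the data elements described above (hypothetical helper)."""
    fields = {
        "search": search,                # the search to run
        "earliest_time": earliest_time,  # how far back to look
        "latest_time": latest_time,      # where to stop looking
        "namespace": namespace,          # app context for the search
    }
    # urlencode escapes spaces, pipes, and equals signs in the search string.
    return urlencode(fields)


body = build_search_body("search index=_internal sourcetype=splunkd | head 5")
print(body)
# Decoding the body gives back the original values.
print(parse_qs(body)["search"][0])
```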

This curl command submits the search to Splunk and returns a search ID (SID). To check on the search, we’d run a second curl command hitting the same REST endpoint with the SID appended to it. This returns details about the search along with its status, whether done or not, in a field called “dispatchState.”

curl -s -k -u {username}:{password} https://{Splunk search head}:{management port}/services/search/v2/jobs/{search id} | grep dispatchState
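Before the status call can be made, the SID has to be pulled out of the response to the first request. The sketch below assumes the default XML output wraps the SID in a “sid” element directly under the root; the sample response and SID value are made up for illustration, so check them against what your search head actually returns.

```python
import xml.etree.ElementTree as ET

# Illustrative response body; the SID value here is invented.
SAMPLE_RESPONSE = "<response><sid>1700000000.123</sid></response>"


def extract_sid(xml_text: str) -> str:
    """Return the text of the <sid> element directly under the root."""
    root = ET.fromstring(xml_text)
    sid = root.findtext("sid")
    if not sid or not sid.strip():
        raise ValueError("no <sid> element found in response")
    return sid.strip()


print(extract_sid(SAMPLE_RESPONSE))
```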

The value of “dispatchState” will be one of the following: QUEUED, PARSING, RUNNING, FINALIZING, DONE, PAUSE, INTERNAL_CANCEL, USER_CANCEL, BAD_INPUT_CANCEL, QUIT, FAILED. We’re looking for “DONE”. Once in that “DONE” state, we run a third curl command to retrieve the results stored on the search head. 

curl -s -k -u {username}:{password} https://{Splunk search head}:{management port}/services/search/v2/jobs/{search id}/results \
    -d output_mode="json"
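The wait between submitting the search and fetching its results can be sketched as a polling loop. In this example “get_state” is a hypothetical callable standing in for the status request, and treating QUEUED, PARSING, RUNNING, FINALIZING, and PAUSE as “keep waiting” while every other non-DONE state counts as a stop is an assumption, not documented Splunk behavior.

```python
import time
from typing import Callable

# Assumption: these states mean the job may still reach DONE.
WAITING_STATES = {"QUEUED", "PARSING", "RUNNING", "FINALIZING", "PAUSE"}


def wait_for_done(get_state: Callable[[], str],
                  interval: float = 2.0,
                  timeout: float = 300.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until it returns DONE, a stop state, or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == "DONE":
            return state
        if state not in WAITING_STATES:
            # FAILED, QUIT, and the *_CANCEL states land here.
            raise RuntimeError(f"search stopped in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"search still {state} after {timeout} seconds")
        sleep(interval)


# Demo with a canned sequence of states instead of a real status request.
fake_states = iter(["QUEUED", "RUNNING", "DONE"])
print(wait_for_done(lambda: next(fake_states), sleep=lambda seconds: None))
```

Passing the sleep function in as a parameter keeps the loop easy to exercise without real delays.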

The “output_mode” parameter tells Splunk how to format the output. Common values are “xml” (the default), “json,” and “csv.”
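With “json” as the output mode, the response body can be loaded straight into Python. The sketch below assumes the rows sit in a top-level “results” list; the sample payload is invented for illustration and real responses carry additional keys.

```python
import json
from typing import Any, Dict, List

# Illustrative payload; field values are made up.
SAMPLE_PAYLOAD = json.dumps({
    "preview": False,
    "results": [
        {"host": "sh1", "sourcetype": "splunkd", "_raw": "first event"},
        {"host": "sh1", "sourcetype": "splunkd", "_raw": "second event"},
    ],
})


def rows_from_results(payload: str) -> List[Dict[str, Any]]:
    """Return the list of result rows, or an empty list if none are present."""
    document = json.loads(payload)
    return document.get("results", [])


for row in rows_from_results(SAMPLE_PAYLOAD):
    print(row["host"], row["_raw"])
```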

When the results stream back, they arrive as one long stream of structured JSON. If the layout of these results matters to you, this is where “export” mode comes in. Again, “export” is not a data parameter; it is appended to the endpoint. With the “export” endpoint, results are output one event per line, which is especially handy with the “json” and “csv” output modes. Additionally, I usually run these “export” searches with the “oneshot” exec mode. This runs the search immediately, outside the Splunk job scheduler. However, this also means no results are stored on the search head. So, if something happens to interrupt the search, it will need to be rerun.

curl -s -k -u {username}:{password} https://{Splunk search head}:{management port}/services/search/v2/jobs/export \
    -d search="search index=_internal sourcetype=splunkd | head 5" \
    -d exec_mode="oneshot" \
    -d earliest_time="-7d" \
    -d latest_time="now" \
    -d output_mode="json" \
    -X POST
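Because the export layout is one event per line, the output lends itself to line-by-line processing. The following sketch assumes each line is a standalone JSON object that holds the event under a “result” key, and that lines without that key can be skipped; the sample lines are invented for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Illustrative export output: one JSON object per line.
SAMPLE_LINES = [
    '{"preview": false, "offset": 0, "result": {"host": "sh1", "_raw": "first"}}',
    '{"preview": false, "offset": 1, "result": {"host": "sh1", "_raw": "second"}}',
    '',
    '{"preview": false, "messages": []}',
]


def iter_export_results(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield the event from each line that carries a "result" object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if "result" in record:
            yield record["result"]


for event in iter_export_results(SAMPLE_LINES):
    print(event["_raw"])
```

Since the function accepts any iterable of strings, it can be fed a file object or a streamed response body without first loading everything into memory.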
     

Finally, let me cover Splunk’s OAuth tokens in this first part of the article. There are circumstances where you don’t want to store a password in clear text, where a bad actor could find it and use it to log into the Splunk UI. You could instead use a token. Creating and managing Splunk OAuth tokens is beyond the scope of this article, but you can use one in place of the username and password when constructing the curl command. The token goes in a header of the HTTP request. Using a token, called a “Bearer” token, our example search would look like this:

curl -s -k https://{Splunk search head}:{management port}/services/search/v2/jobs \
    -H "Authorization: Bearer {OAuth token}" \
    -d search="search index=_internal sourcetype=splunkd | head 5" \
    -d earliest_time="-7d" \
    -d namespace="search" \
    -X POST
    

Notice that we no longer pass in the username/password with the “-u” option. Rather, we have a new header parameter passing in the Bearer token, that long 442-character string, to Splunk for authentication. With Splunk’s REST API, you can always use OAuth Bearer tokens instead of username/password combinations. 
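The same token-based request can be prepared with Python’s standard library. This is a sketch only: “build_token_request”, the host name, and the token value are placeholders, and the request object is built but never sent, so the example runs without a Splunk server.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, token: str, search: str) -> Request:
    """Prepare (but do not send) a search POST authenticated with a Bearer token."""
    body = urlencode({
        "search": search,
        "earliest_time": "-7d",
        "namespace": "search",
    }).encode("ascii")
    request = Request(base_url + "/services/search/v2/jobs", data=body, method="POST")
    # The token replaces the username/password pair entirely.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder host and token, for illustration only.
req = build_token_request(
    "https://splunk-sh.example.com:8089",
    "PLACEHOLDER_TOKEN",
    "search index=_internal sourcetype=splunkd | head 5",
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

Sending it would be a matter of passing the request to urllib.request.urlopen, with certificate handling set up to match your environment.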

That’s it for this month. In the coming months, Part 2 of this blog article will explain how to use Splunk’s Python SDK to do all this and more with the Splunk REST API. 
