Running Searches Using Splunk’s REST API 

Part Two

By Jon Walthour, Senior Technical Architect

In Part One of this blog, I discussed Splunk’s REST API and how it can be used to run searches against Splunk event data, and I demonstrated how to interact with the API using the Linux cURL utility. In Part Two, I’ll dive into how to run searches via the REST API using Python.

For interacting with Splunk through Python, Splunk provides an SDK that abstracts some of the more basic methods for Splunk interaction. Mind you, the SDK is not necessary to successfully interact with Splunk via REST and Python, but it does make that interaction much easier.

Setting Up Your Development Environment

So, let’s first set up our development environment. Using your git tool of choice, clone the Splunk Python SDK from here: https://github.com/splunk/splunk-sdk-python.git. As of the current 2.0.1 release, you’ll also need to install the Python “deprecation” module into your Python libraries if you don’t already have it. Once you’ve done those two things, you can copy the “splunklib” directory and its contents out of the “splunk-sdk-python” folder, as that is all you need from it. What I do on a Linux host is run the following in an empty directory:

git clone https://github.com/splunk/splunk-sdk-python.git 
cp -R splunk-sdk-python/splunklib . 
pip install deprecation 

and I’m ready to go.

Now, in our new development context, open a new Python (.py) file. We’ll need to start by importing two modules from the Splunk Python SDK’s “splunklib” package: client and results.

import sys 
import splunklib.client as client 
import splunklib.results as results 
from time import sleep 

The “client” module provides objects and collections for authenticating to and interacting with a Splunk instance over its management port (default port 8089). The “results” module provides a reader for streaming results in CSV, JSON or XML from the Splunk search head to a client’s environment. For running ad hoc searches, these are the only two modules we’ll need.

Once we’ve got all the Python we’ll need imported, we start by authenticating to our Splunk search head by creating a “service” object using the “client” module. As we did when using curl in Part 1 of this blog, we can authenticate with either a username and password or with an OAuth token. The username/password formulation of the “service” object creation looks like this:

service = client.connect(host='{Splunk search head}', port={management port}, username='{username}', password='{password}') 

For OAuth token authentication, we utilize the parameter “splunkToken” like this:

service = client.connect(host='{Splunk search head}', port={management port}, splunkToken='{OAuth token}') 

Going forward, this service object will be our portal into the Splunk search head. Next, we want to get a reference to the “jobs” collection in the “client” module, create a variable to hold the contents of our search string, and set up a couple of dictionaries to pass in parameters for our search and for retrieving the output of the search, like so:

jobs = service.jobs 
search = 'search index=_internal sourcetype=splunkd | head 5' 
kwargs_search = {'exec_mode': 'normal'} 
kwargs_results = {'output_mode': 'json', 
  'count': 0   # 0 means return all results, not zero results 
  } 

Executing A Search In Splunk

As we discussed in Part 1, there are four modes for executing a search in Splunk: normal, blocking, oneshot and export. A “normal” search is submitted to Splunk for execution, the search is run asynchronously by Splunk, and the results are stored on the search head for our later retrieval, labelled by a “search ID” or “SID”. So, when coding for a “normal” search, we submit the job and Splunk returns a SID, which is stored in our instance of the Job class. As with our curl example in Part 1, our code polls Splunk regularly, waiting for the job to reach the “DONE” state so we can retrieve the results. While the search job is being completed, our code sits in a “while” loop; when the job is done, our code continues on to retrieve the results. That would all look like this:

job = jobs.create(search, **kwargs_search) 

print("\nWaiting for the search to finish...") 

while True: 
  while not job.is_ready(): 
    pass 
  if job["isDone"] == "1": 
    print("\nDone!\n") 
    break 
  sleep(2) 
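If you find yourself writing this polling loop often, it can be factored into a small helper. This is a hedged sketch, not part of the SDK: the “wait_for_job” function and its “progress” callback are names I’ve made up, and the sketch only assumes what splunklib’s Job class provides — is_ready(), refresh(), and dictionary-style access to “isDone” and “doneProgress”:

```python
from time import sleep

def wait_for_job(job, poll_interval=2, progress=None):
    """Poll a search job until it reports done; optionally report progress.

    `job` is assumed to behave like splunklib.client.Job: is_ready(),
    refresh(), and dict-style access to "isDone" and "doneProgress".
    """
    # First wait until the job is queryable at all
    while not job.is_ready():
        sleep(poll_interval)
    # Then wait for completion, refreshing the cached job state each pass
    while job["isDone"] != "1":
        if progress is not None:
            progress(float(job["doneProgress"]))
        sleep(poll_interval)
        job.refresh()
    return job
```

With something like this in place, the polling loop above collapses to a single wait_for_job(job) call before retrieving results.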

Once the search has completed, we can retrieve the results in XML, JSON or CSV format. For JSON output, which is my personal preference for its ease of use, we instantiate a special class in the results module called “JSONResultsReader” like this:

result_stream = job.results(**kwargs_results) 
for result in results.JSONResultsReader(result_stream): 
    print(result) 

To output results in XML or CSV, we read directly from the stream returned by “job.results”:

kwargs_results = {'output_mode': 'xml', 
  'count': 0 
  } 

result_stream = job.results(**kwargs_results) 
print(result_stream.read()) 

or 

kwargs_results = {'output_mode': 'csv', 
  'count': 0 
  } 

result_stream = job.results(**kwargs_results) 
print(result_stream.read()) 

That’s the basic flow of running a Splunk search in Python and returning results. There are a couple of helpful caveats to consider, though. We’ve only talked about the “normal” execution mode so far, but in many circumstances, “blocking” mode or “oneshot” mode can prove a better choice, as the job-creation call won’t return until the search has completed. The “oneshot” approach can be more efficient here, since it makes only one round trip to the server to run the search and return the results. By contrast, with a “blocking” search, you still have to retrieve the results after the job completes. However, with a “blocking” search, the results are stored on the search head and can be retrieved again and again for the life of the search; if a oneshot search fails, you have to resubmit it. Oneshot also doesn’t handle large result sets well, due to its making only one trip. Therefore, use a “blocking” search for searches that could return a larger result set, and use oneshot only when you expect to return a few results. A oneshot search would look like this:

search = 'search index=_internal sourcetype=splunkd | head 5' 
kwargs_results = {'output_mode': 'json', 
  'count': 0 
  } 

oneshot_results = results.JSONResultsReader(service.jobs.oneshot(search, **kwargs_results)) 
for result in oneshot_results: 
    print(result) 

For searches you expect to return more than a few results, I’d recommend using a blocking search, where the job-creation call doesn’t return until the search has been scheduled and run. That would look like this:

import sys 
import splunklib.client as client 
import splunklib.results as results 
from time import sleep 
service = client.connect(host='{Splunk search head}', port={management port}, username='{username}', password='{password}') 
# or use an OAuth token like this: 
# service = client.connect(host='{Splunk search head}', port={management port}, splunkToken='{OAuth token}') 
jobs = service.jobs 
search = 'search index=_internal sourcetype=splunkd | head 5' 
kwargs_search = {'exec_mode': 'blocking'} 
kwargs_results = {'output_mode': 'json', 
  'count': 0 
  } 
job = jobs.create(search, **kwargs_search) 
result_stream = job.results(**kwargs_results) 
for result in results.JSONResultsReader(result_stream): 
    print(result) 

Using Pagination for Large Result Sets

One final technique to review: pagination. Sometimes you want to run a search and retrieve the results in batches. This is especially useful for very large result sets where you want to avoid filling up buffers. To do this in the results-retrieval portion of a normal or blocking search, you would add an offset to the results parameters dictionary like this:

numResults = int(job["resultCount"]) 
offset = 0 
count = 10 

while offset < numResults: 
  kwargs_results = {'output_mode': 'json', 
   'count': count, 
   'offset': offset 
   } 

  result_stream = job.results(**kwargs_results) 
  for result in results.JSONResultsReader(result_stream): 
    print(result) 

  # Increase offset by count 
  offset += count 
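The batching arithmetic itself is independent of Splunk and easy to verify on its own. As a standalone sketch (the “page_windows” helper is hypothetical, not part of the SDK), it yields the (offset, count) pairs the loop above walks through:

```python
def page_windows(total, page_size):
    """Yield (offset, count) pairs covering `total` results in batches.

    The final window may ask for a full `page_size`; Splunk simply
    returns whatever results remain past the offset.
    """
    offset = 0
    while offset < total:
        yield offset, page_size
        offset += page_size

# For 25 results in batches of 10: offsets 0, 10, 20
print(list(page_windows(25, 10)))  # → [(0, 10), (10, 10), (20, 10)]
```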

There you have it: how to interact with Splunk’s REST API to run searches through the Linux command line with cURL and via Python with the Splunk SDK for Python. These are by no means the only ways to interact with Splunk in code. There are also Splunk SDKs for Java, JavaScript, Go, Node.js, C#, PHP, Ruby, and iOS and Android Xamarin for Splunk Mint (mobile intelligence). If you have any questions, comments or suggestions, I’d love to hear from you at jwalthour@tekstream. If my words here have helped you to see the expertise the whole TekStream team brings to Splunk, reach out to us via the form provided below.

Read more about TekStream Splunk services.

Missed Part 1? Read it here.