Containerization and Splunk: How Docker and Splunk Work Together
By: Karl Cepull | Director, Operational Intelligence and Managed Services
Note: Much of the information in this blog post was also presented as a TekTalk, including a live demo of Splunk running in Docker, and how to use Splunk to ingest Docker logs. Please see the recording of the TekTalk at http://go.tekstream.com/l/54832/2017-03-30/bknxhd.
You’ve heard of Splunk, and maybe used it. You’ve heard of Docker, and maybe used it, too. But have you tried using them together? It can be a powerful combination when you do!
But first, let’s review what Splunk and Docker are, just to set a baseline. Also, learn more about TekStream’s Splunk Services.
What is Splunk?
Splunk is the industry-leading solution for turning “digital exhaust” into business value. “Digital exhaust” refers to the almost unlimited amount of data being output by just about every digital device in the world today, such as application and web servers, databases, security and access devices, networking equipment, and even your mobile devices.
Usually, this data is in the form of log files. And, due to the volume being produced, it usually just sits on a hard drive somewhere until it expires and is deleted. It is only looked at when something goes wrong, and requires a lot of digging and searching to find anything useful.
Splunk changes all of that. It ingests those log files in near-real-time, and provides a “Google-like” search interface, making it extremely easy to search large amounts of data quickly. It also can correlate the information in myriad log files, allowing for an easier analysis of the bigger picture. Finally, it has a plethora of visualization and alerting options, allowing you to create rich reports and dashboards to view the information, and generate various alerts when specific conditions are met.
What is Docker?
Docker is also an industry leader. It is a container manager that allows you to run multiple applications (containers) side by side in isolation, without the overhead of creating multiple virtual machines (VMs) to do so. Containers give you the ability to “build once, run anywhere,” as Docker containers are designed to run on any host that can run Docker. Containers can also be distributed as whole “images,” making it easy to deploy applications and microservices.
Why use Splunk and Docker Together?
While there are many ways that you can use Splunk and Docker, there are two main configurations that we will address.
Using Docker to run Splunk in a Container
Running Splunk as a container in Docker has a lot of advantages. You can create an image that has Splunk pre-configured, which makes it easy to fire up an instance for testing, a proof-of-concept, or other needs. In fact, Splunk even has pre-configured images of Splunk Enterprise and the Universal Forwarder available in the Docker Hub for you to download!
Using Splunk to Monitor a Docker Container
In this configuration, one or more Docker containers are configured to send their logs and other operational information to a Splunk instance (which can also be running in another container, if desired!). Splunk has a free app for Docker that provides out-of-the-box dashboards and reports that show a variety of useful information about the events and health of the Docker containers being monitored, which provides value without having to customize a thing. If you are also using Splunk to ingest log information from the applications and services running inside of the containers, you can then correlate that information with that from the container itself to provide even more visibility and value.
Our Demo Environment
To showcase both of the above use cases, Splunk has a repository on GitHub that was used at their .conf2016 event in September of 2016. You can download it and follow the instructions to create a set of Docker containers that demonstrates both running Splunk in a container and using Splunk to monitor a Docker container.
If you download and follow their instructions, what you build and run ends up looking like the following:
There are 5 containers that are built as part of the demo. The ‘wordpress’ and ‘wordpress_db’ containers are sample applications that you might typically run in Docker, and are instances of publicly-available images from the Docker Hub. Splunk Enterprise is running in a container as well, as is an instance of the Splunk Universal Forwarder. Finally, the container named “my_app” is running a custom app that provides a simple web page, and also generates some fake log data so there is something in Splunk to search.
The WordPress database logs are written to a shared volume (think of it as a shared drive), and the Splunk Universal Forwarder ingests the logs from that volume using a normal “monitor” input. This shows one way to ingest logs without having to install the UF in the container running the app.
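For reference, that “monitor” input is just an ordinary file monitor defined on the Universal Forwarder. Here is a minimal sketch of what it might look like in inputs.conf (the path, index, and sourcetype below are placeholders, not necessarily the exact values used in the demo repository):

# inputs.conf on the Universal Forwarder (placeholder path and metadata)
[monitor:///shared-volume/wordpress_db/logs]
index = main
sourcetype = mysql_log
disabled = false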
The HTTP Event Collector (HEC) is also running on the ‘splunk’ container, and is used to receive events generated by the ‘my_app’ application. This shows another way to ingest logs without using a UF.
Finally, HEC is also used to ingest events about the ‘wordpress’ and ‘wordpress_db’ containers themselves.
If you would like to see a demo of the above in action, please take a look at the recording of our TekTalk, which is available at http://go.tekstream.com/l/54832/2017-03-30/bknxhd.
Here is a screenshot of one of the dashboards in the Docker app, showing statistics about the running containers, to whet your appetite.
How Does it Work?
Running Splunk in a Container
Running Splunk in a container is actually fairly easy! As mentioned above, Splunk has pre-configured images available for you to download from the Docker Hub (a public repository of Docker images).
There are 4 images of interest – two for Splunk Enterprise (a full installation of Splunk that can be used as an indexer, search head, etc.), and two for the Universal Forwarder. For each type of Splunk (Enterprise vs Universal Forwarder), there is an image that just has the base code, and an image that also contains the Docker app.
Here’s a table showing the image name and details about each one:
| Image Name | Description |
|---|---|
| splunk/splunk:6.5.2 | The base installation of Splunk Enterprise v6.5.2 (the current version available as of this writing). |
| splunk/splunk:6.5.2-monitor (also tagged splunk/splunk:latest) | The base installation of Splunk Enterprise v6.5.2, with the Docker app also installed. |
| splunk/universalforwarder:6.5.2 (also tagged splunk/universalforwarder:latest) | The base installation of the Splunk Universal Forwarder v6.5.2. |
| splunk/universalforwarder:6.5.2-monitor | The base installation of the Splunk Universal Forwarder v6.5.2, with the Docker app also installed. |
Get the image(s):
- If you haven’t already, download a copy of Docker and install it on your system, and make sure it is running.
- Next, create an account at the Docker Hub – you’ll need that in a bit.
- From a command shell, log in to Docker Hub with the account you just created, using the following command:
docker login
- Now, download the appropriate image (from the list above) using the following command:
docker pull <imagename>
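For example, to grab the Splunk Enterprise image that already includes the Docker app (from the table above):

docker pull splunk/splunk:6.5.2-monitor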
Start the Container:
To run Splunk Enterprise in a Docker container, use the following command:
docker run -d \
  --name splunk \
  -e "SPLUNK_START_ARGS=--accept-license" \
  -e "SPLUNK_USER=root" \
  -p "8000:8000" \
  splunk/splunk
To run the Universal Forwarder in a Docker container, use the following command:
docker run -d \
  --name splunkuniversalforwarder \
  --env SPLUNK_START_ARGS=--accept-license \
  --env SPLUNK_FORWARD_SERVER=splunk_ip:9997 \
  --env SPLUNK_USER=root \
  splunk/universalforwarder
In both cases, the "docker run" command tells Docker to create and run an instance of a given image (the "splunk/splunk" and "splunk/universalforwarder" images here). The "-d" parameter tells it to run as a daemon (in the background). The "-e" (or "--env") parameters set various environment variables that are passed to the application in the container (more below), and the "-p" parameter tells Docker to map host port 8000 to port 8000 in the container. (This is so we can go to http://localhost:8000 on the host machine to get to the Splunk web interface.)
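Once the container is up, a couple of standard Docker commands will confirm it is running and let you watch Splunk's startup output before you browse to http://localhost:8000:

# Confirm the 'splunk' container is running and that port 8000 is mapped
docker ps --filter name=splunk

# Follow the container's output while Splunk finishes starting up
docker logs -f splunk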
So, what are those “-e” values? Below is a table showing the various environment variables that can be passed to the Splunk image, and what they do. If a variable only applies to Splunk Enterprise, it is noted.
| Environment Variable | Description | How Used |
|---|---|---|
| SPLUNK_USER | User to run Splunk as. Defaults to ‘root’. | |
| SPLUNK_BEFORE_START_CMD, SPLUNK_BEFORE_START_CMD_n | Splunk command(s) to execute prior to starting Splunk. ‘n’ is 1 to 30. The non-suffixed command is executed first, followed by the suffixed commands in order (with no gaps in the numbering). | ./bin/splunk <SPLUNK_BEFORE_START_CMD[_n]> |
| SPLUNK_START_ARGS | Arguments to the Splunk ‘start’ command. | ./bin/splunk start <SPLUNK_START_ARGS> |
| SPLUNK_ENABLE_DEPLOY_SERVER | If ‘true’, enables the deployment server function. (Splunk Enterprise only.) | |
| SPLUNK_DEPLOYMENT_SERVER | Deployment server to point this instance to. | ./bin/splunk set deploy-poll <SPLUNK_DEPLOYMENT_SERVER> |
| SPLUNK_ENABLE_LISTEN, SPLUNK_ENABLE_LISTEN_ARGS | The port (and optional arguments) for Splunk to listen on. (Splunk Enterprise only.) | ./bin/splunk enable listen <SPLUNK_ENABLE_LISTEN> <SPLUNK_ENABLE_LISTEN_ARGS> |
| SPLUNK_FORWARD_SERVER, SPLUNK_FORWARD_SERVER_n, SPLUNK_FORWARD_SERVER_ARGS, SPLUNK_FORWARD_SERVER_ARGS_n | One or more Splunk servers to forward events to, with optional arguments. ‘n’ is 1 to 10. | ./bin/splunk add forward-server <SPLUNK_FORWARD_SERVER[_n]> <SPLUNK_FORWARD_SERVER_ARGS[_n]> |
| SPLUNK_ADD, SPLUNK_ADD_n | Any monitors to set up. ‘n’ is 1 to 30. | ./bin/splunk add <SPLUNK_ADD[_n]> |
| SPLUNK_CMD, SPLUNK_CMD_n | Any additional Splunk commands to run after Splunk is started. ‘n’ is 1 to 30. | ./bin/splunk <SPLUNK_CMD[_n]> |
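Putting a few of these together, here is a sketch of a Universal Forwarder container that forwards to an indexer and also sets up a file monitor via SPLUNK_ADD (the indexer address and monitored path are placeholders):

docker run -d \
  --name splunkuf \
  --env SPLUNK_START_ARGS=--accept-license \
  --env SPLUNK_USER=root \
  --env SPLUNK_FORWARD_SERVER=splunk_ip:9997 \
  --env "SPLUNK_ADD=monitor /var/log/myapp" \
  splunk/universalforwarder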
Splunking a Docker Container
There are 2 main parts to setting up your environment to Splunk a Docker container. First, we need to set up Splunk to listen for events using the HTTP Event Collector. Second, we need to tell Docker to send its container logs and events to Splunk.
Setting up the HTTP Event Collector
The HTTP Event Collector (HEC) is a listener in Splunk that provides for an HTTP(S)-based URL that any process or application can POST an event to. (For more information, see our upcoming TekTalk and blog post on the HTTP Event Collector coming in June 2017.) To enable and configure HEC, do the following:
- From the Splunk web UI on the Splunk instance you want HEC to listen on, go to Settings | Data inputs | HTTP Event Collector.
- In the top right corner, click the Global Settings button to display the Edit Global Settings dialog. Usually, these settings do not need to be changed. However, this is where you can set what the default sourcetype and index are for events, whether to forward events to another Splunk instance (e.g. if you were running HEC on a “heavy forwarder”), and the port to listen on (default of 8088 using SSL). Click Save when done.
- Next, we need to create a token. Any application connecting to HEC to deliver an event must pass a valid token to the HEC listener. This token not only authenticates the sender as valid, but also ties it to settings, such as the sourcetype and index to use for the event. Click New Token to bring up the wizard.
- On the Select Source panel of the wizard, give the token a name and optional description. If desired, specify the default source name to use if not specified in an event. You can also set a specific output group to forward events to. Click Next when done.
- On the Input Settings panel, you can select (or create) a default sourcetype to use for events that don’t specify one. Perhaps one of the most important options is on this screen – selecting a list of allowed indexes. If specified, events using this token can only be written to one of the listed indexes; if an event specifies an index that is not on the list, the event is dropped. You can also set a default index to use if an individual event does not specify one.
- Click Review when done with the Input Settings panel. Review your choices, then click Submit when done to create the token.
- The generated token value will then be shown to you. You will use this later when configuring the output destination for the Docker containers. (You can find this value later in the list of HTTP Event Collector tokens.)
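Before wiring up Docker, you can sanity-check the token by posting a test event to HEC by hand. This example assumes HEC is listening on localhost with the default port and SSL settings (the -k flag skips certificate validation for Splunk's default self-signed certificate); substitute your own token:

curl -k https://localhost:8088/services/collector \
  -H "Authorization: Splunk <your-HEC-token>" \
  -d '{"event": "HEC test event", "sourcetype": "manual_test"}'

If everything is configured correctly, Splunk responds with a "Success" acknowledgement and the event becomes searchable in the token's default index.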
Configuring Docker to send to Splunk
Now that Splunk is set up to receive event information using HEC, let’s see how to tell Docker to send data to Splunk. You do this by telling Docker to use the ‘splunk’ logging driver, which is built into Docker starting with version 1.10, and by passing required and optional “log-opt” name/value pairs that tell Docker how to connect to Splunk.
The various “log-opt” values for the Splunk logging driver are:
| ‘log-opt’ Argument | Required? | Description |
|---|---|---|
| splunk-token | Yes | Splunk HTTP Event Collector token |
| splunk-url | Yes | URL and port of the HTTP Event Collector, e.g. https://your.splunkserver.com:8088 |
| splunk-source | No | Source name to use for all events |
| splunk-sourcetype | No | Sourcetype of events |
| splunk-index | No | Index for events |
| splunk-format | No | Message format. One of “inline”, “json”, or “raw”. Defaults to “inline”. |
| labels / env | No | Docker container labels and/or environment variables to include with the event |
In addition to the above “log-opt” variables, there are environment variables you can set to control advanced settings of the Splunk logging driver. See the Splunk logging driver page on the Docker Docs site for more information.
Splunking Every Container
To tell Docker to send the container information for all containers, specify the Splunk logging driver and log-opts when you start up the Docker daemon. This can be done in a variety of ways, but below are two common ones.
- If you start Docker from the command line using the ‘dockerd’ command, specify the "--log-driver=splunk" option, like this:
dockerd --log-driver=splunk \
  --log-opt splunk-token=4222EA8B-D060-4FEE-8B00-40C545760B64 \
  --log-opt splunk-url=https://localhost:8088 \
  --log-opt splunk-format=json
- If you use a GUI to start Docker, or don’t want to have to remember to specify the log-driver and log-opt values, you can create (or edit) the daemon.json configuration file for Docker. (See the Docker docs for information on where this file is located for your environment.) A sample daemon.json looks like this:
{
  "log-driver": "splunk",
  "log-opts": {
    "splunk-token": "4222EA8B-D060-4FEE-8B00-40C545760B64",
    "splunk-url": "https://localhost:8088",
    "splunk-format": "json"
  }
}
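After creating or editing daemon.json, restart the Docker daemon so the new logging defaults take effect. The exact restart mechanism depends on your platform; on a typical systemd-based Linux host it looks something like this:

sudo systemctl restart docker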
Either of the above options will tell Docker to send the container information for ALL containers to the specified Splunk server on the localhost port 8088 over https, using the HEC token that you created above. In addition, we have also overridden the default event format of “inline”, telling Docker to instead send the events in JSON format, if possible.
Splunking a Specific Docker Container
Instead of sending the container events for ALL containers to Splunk, you can also tell Docker to just send the container events for the containers you want. This is done by specifying the log-driver and log-opt values as parameters to the “docker run” command. An example is below.
docker run --log-driver=splunk \
  --log-opt splunk-token=176FCEBF-4CF5-4EDF-91BC-703796522D20 \
  --log-opt splunk-url=https://splunkhost:8088 \
  --log-opt splunk-capath=/path/to/cert/cacert.pem \
  --log-opt splunk-caname=SplunkServerDefaultCert \
  --log-opt tag="{{.Name}}/{{.FullID}}" \
  --log-opt labels=location \
  --log-opt env=TEST \
  --env "TEST=false" \
  --label location=west \
  your/application
The above example shows how to set and pass environment variables (“TEST”) and/or container labels (“location”) on each event sent to Splunk. It also shows how you can use Docker’s template markup to set a tag on each event containing the container name and the full container ID.
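Once events are flowing, those extra values become searchable fields, which is the whole point of passing them. The exact field names depend on the splunk-format you chose and on how Splunk parses the resulting JSON (and this sketch assumes the token’s default index is main), so treat it as a starting point and adjust it to match what you actually see in your events:

index=main location=west TEST=false
| stats count by tag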
Hints and Tips
Running Splunk in a Container
- As of this writing, running Splunk in a Docker container has not been certified, and is unsupported. That doesn’t mean you can’t get support, just that if the problem is found to be related to running Splunk in the container, you may be on your own. However, Splunk has plans to support running in a container in the near future, so stay tuned!
- One of the advantages of running things in containers is that containers can be started and stopped quickly and easily, which can provide scalability by starting up more instances of an image when load demands it and shutting them down when load subsides. A production Splunk environment, however, is not really suited to this type of activity. For example, spinning up or shutting down an additional indexer in response to load isn’t easy – it needs to be part of a cluster, and clusters don’t like their members going up and down.
- Whether running natively, in a VM, or in a container, Splunk has certain minimum resource needs (e.g. CPU, memory, etc.). By default, when running in a container, these resources are shared by all containers. It is possible to specify the maximum amount of CPU and memory a container can use, but not the minimum, so you could end up starving your Splunk containers (one partial workaround is sketched below).
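The workaround is to cap the noisier non-Splunk containers instead, so they cannot crowd Splunk out. A hypothetical example (the --cpus flag requires Docker 1.13 or later; the limits shown are arbitrary):

# Cap a neighboring container's CPU and memory (hard limits, not reservations)
docker run -d --cpus="1.0" --memory="2g" your/application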
Splunking a Docker Container
- Definitely use the Docker app from Splunk! This provides out-of-the-box dashboards and reports that you can take advantage of immediately. (Hint: use the “splunk/splunk:6.5.2-monitor” image.)
- Use labels, tags, and environment variables passed with events to enhance the event itself. This will allow you to perform searches that filter on these values.
- Note that some scheduling tools for containers don’t have the ability to specify a log-driver or log-opts. There are workarounds for this, however.
Additional Resources
Below is a list of some web pages that I’ve found valuable when using Splunk and Docker together.
- http://www.splunk.com/containers – Info on using Splunk with container environments
- https://hub.docker.com/r/splunk/splunk/ – Docker image for Splunk Enterprise
- https://conf.splunk.com/files/2016/slides/how-to-run-splunk-as-a-docker-image.pdf – Slides from the .conf2016 presentation entitled “How to run Splunk as a Docker Image?”
- https://hub.docker.com/r/splunk/universalforwarder/ – Docker image for Splunk Universal Forwarder
- https://docs.docker.com/get-started/ – Docker tutorial
- https://github.com/splunk/docker-gettingstarted-conf2016 – Files to build the demo environment mentioned above
- https://docs.docker.com/engine/admin/logging/splunk/ – Splunk logging driver details
- http://docs.splunk.com/Documentation/Splunk/6.5.3/Data/UsetheHTTPEventCollector – Information on the HTTP Event Collector
- The companion TekTalk on Splunk and Docker at http://go.tekstream.com/l/54832/2017-03-30/bknxhd
Happy Splunking!
Have more questions? Contact us today!