Logstash is a log processing system. You typically run it on all your servers, configured to process log files from all the apps running on the server.
Processing is done with pipelines - you specify the input file(s) to read, any parsing you want to do to the log entries, and the output destination to store logs.
Elasticsearch is one of the output destinations, so you can run Logstash to collect all your logs and store them centrally.
Pipeline input plugins - configuration for file input
Filter plugins - configuration for CSV and grok
Output plugins - configuration for console stdout and Elasticsearch
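Every pipeline uses the same three-block structure. Here's a bare sketch just to show the shape - the real configs for this lab are in labs/logstash/pipeline-config:

input {
  # where log entries come from - e.g. the file input plugin
}

filter {
  # how raw lines get parsed into fields - e.g. the csv or grok filters
}

output {
  # where processed entries are sent - e.g. stdout or elasticsearch
}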
Logstash is a Java application - we'll run it alongside Elasticsearch, in containers:
Start Elasticsearch and Logstash, and check the Logstash logs:
docker-compose -f labs/logstash/compose.yml up -d
docker logs obsfun_logstash_1 -f
The -f flag means Docker will follow the logs, watching the container and printing any new entries. When you see a log like this, you can exit the follow command with Ctrl-C:
[2021-07-18T19:43:19,236][INFO ][logstash.agent ]Pipelines running {:count=>1, :running_pipelines=>[:"fulfilment-csv"], :non_running_pipelines=>[]}
This output says Logstash is configured with one processing pipeline, called fulfilment-csv. The setup for that pipeline is here:
📋 What do you think that pipeline is doing?
There are three parts to the pipeline (sketched below):
the input block sets Logstash to look for files in the /data folder, which start with the filename fulfilment- and end with the extension .csv
the filter block sets up parsing of the file, reading the input lines as comma-separated values (CSV) and outputting named fields
the output block writes all the processed log lines to the console (standard out), using a format based on the Ruby language
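A minimal sketch of what that configuration might look like - the file glob here is an assumption, but the column names match the fields in the output shown later:

input {
  file {
    path => "/data/fulfilment-*.csv"    # assumed glob for the fulfilment- prefix and .csv extension
  }
}

filter {
  csv {
    separator => ","
    columns => ["timestamp", "source", "level", "message"]   # names the fields parsed from each line
  }
}

output {
  stdout {
    codec => rubydebug    # Ruby-style console output
  }
}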
The Logstash container is sitting and watching the data folder for any incoming CSV files.
This data file has a version of the fulfilment processor logs written in the CSV format expected by the Logstash pipeline:
The Docker setup mounts a local folder into the /data path in the Logstash container. When we copy files locally, they'll appear inside the container.
Now copy the CSV file into the mounted folder for the container:
cp data/fulfilment-20210707.csv labs/logstash/data/
Logstash will see the new file and process it.
All the log entries will be written to the container console, which is useful for checking your pipeline formatting. Follow the logs to see it:
docker logs obsfun_logstash_1 -f
# Ctrl-C when the processing is finished
📋 What is the output of the processing pipeline?
Every log line has been parsed from CSV into JSON. The fields have been set using the names in the filter block of the pipeline, and some additional fields have been added.
This input line:
2021-07-07T12:28:44Z,9A687539A1D7,ERROR,Fulfilment errored! Request ID: 37868708. Error message: document service unavailable.
Produces this output:
{
"@version" => "1",
"source" => "9A687539A1D7",
"timestamp" => "2021-07-07T12:28:44Z",
"host" => "d22c222ad09a",
"path" => "/data/fulfilment-20210707.csv",
"message" => "Fulfilment errored! Request ID: 37868708. Error message: document service unavailable.",
"@timestamp" => 2021-07-18T19:55:49.798Z,
"level" => "ERROR"
}
The output from Logstash is structured data, which is a good format for indexing in Elasticsearch.
Logstash has built-in support for Elasticsearch. This pipeline configuration is based on the previous one, but writes to Elasticsearch: it changes the output block to use Elasticsearch, with the target host name matching the Elasticsearch container name; it also extends the filter block to load the timestamp from the log entry.
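As a rough sketch, the changed blocks might look something like this - the host name and date format are assumptions based on the lab setup:

filter {
  csv {
    separator => ","
    columns => ["timestamp", "source", "level", "message"]
  }
  date {
    match => ["timestamp", "ISO8601"]   # sets @timestamp from the log entry instead of the processing time
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # assumption: the Elasticsearch container is named "elasticsearch"
    index => "logstash-%{+YYYY.MM.dd}"       # date-based index name, e.g. logstash-2021.07.07
  }
}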
Overwrite the pipeline configuration the Logstash container is using, to load the Elasticsearch pipeline:
cp labs/logstash/pipeline-config/fulfilment-csv-to-es.yml labs/logstash/config/pipelines.yml
Check the logs and you'll see Logstash loading the new configuration:
docker logs obsfun_logstash_1 -n 2
The running pipelines log should now state
:running_pipelines=>[:"fulfilment-csv-to-es"]
📋 Copy the CSV file to the container's data directory again, and check the indices CAT API in Elasticsearch to see where the data has been loaded.
You can repeat the previous copy command to reload the same CSV file:
cp data/fulfilment-20210707.csv labs/logstash/data/
Elasticsearch is listening on the standard port:
curl localhost:9200/_cat/indices?v
You should see an index called logstash-2021.07.07 with a docs.count of 86.
Logstash loads the timestamp from the log entries in the CSV file, and stores them in an Elasticsearch index with the date in the index name.
You can query that index to find all the error logs:
curl 'localhost:9200/logstash-2021.07.07/_search?q=level:ERROR&pretty'
The Elasticsearch documents contain all the original data from the CSV logs, together with the extra fields set by Logstash.
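Because the fields are indexed individually, you can also combine them in one query - for example, errors mentioning an unavailable service (a hypothetical query; adjust the terms to suit):

curl 'localhost:9200/logstash-2021.07.07/_search?q=level:ERROR%20AND%20message:unavailable&pretty'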
So far we've just used a single pipeline with Logstash, but it can be configured to run multiple pipelines to process different logs.
Logstash can parse CSV files and other formats - and it has built-in processing for semi-structured data with known formats.
This pipeline parses Apache web server access logs:
📋 What do you think this pipeline is doing?
There are the same three parts to the pipeline (sketched below):
the input block looks for files in the /data directory, starting with the name apache_logs
the filter block sets up parsing of the file, using the grok processor which applies Regular Expressions. This uses a named Regular Expression for Apache logs which is built into Logstash; it also extracts the timestamp from the log entry
the output block writes to Elasticsearch, using an index name which begins apache- and ends with the date of the log entry
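A rough sketch of the filter and output blocks for that pipeline - the grok pattern name is the built-in one for Apache access logs, but the host and index details are assumptions:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }    # built-in pattern for Apache access log lines
  }
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]    # e.g. 17/Jun/2021:10:05:43 +0000
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # assumption about the container name
    index => "apache-%{+YYYY.MM.dd}"         # index name begins apache- and ends with the log date
  }
}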
Overwrite the Logstash pipeline configuration to load both the CSV and Apache pipelines:
cp labs/logstash/pipeline-config/all-to-es.yml labs/logstash/config/pipelines.yml
Check the most recent container logs to confirm both pipelines are running:
docker logs obsfun_logstash_1 -n 2
You should see two pipelines in the logs
:running_pipelines=>[:"fulfilment-csv-to-es", :"apache-to-es"]
Now Logstash is watching for Apache log files as well as fulfilment processor CSV logs.
Copy in a small sample of logs to the Logstash data directory:
cp data/apache_logs-small labs/logstash/data/
📋 Check the name of the Apache index in Elasticsearch. Query that index to retrieve any document - how is the data formatted?
List all the indices:
curl localhost:9200/_cat/indices?v=true
You should see an index apache-2021.06.17 with 20 documents.
You can call the search API with no query and a size of 1 to return a single document:
curl 'localhost:9200/apache-2021.06.17/_search?size=1&pretty'
The semi-structured logs have been parsed into structured JSON.
This source:
83.149.9.216 - - [17/Jun/2021:10:05:43 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-dashboard3.png HTTP/1.1" 200 171717 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"
Produces this output:
{
"@version":"1",
"auth":"-",
"host":"d22c222ad09a",
"ident":"-",
"verb":"GET",
"request":"/presentations/logstash-monitorama-2013/images/kibana-dashboard3.png",
"bytes":"171717",
"timestamp":"17/Jun/2021:10:05:43 +0000",
"path":"/data/apache_logs-small",
"clientip":"83.149.9.216",
"httpversion":"1.1",
"response":"200",
"agent":"/"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36/"",
"referrer":"/"http://semicomplete.com/presentations/logstash-monitorama-2013//"",
"@timestamp":"2021-06-17T10:05:43.000Z"
}
Much more useful - we can write queries for specific client IP addresses or browsers, looking for certain paths and response status codes.
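For example, these hypothetical queries look for failed responses and for requests from one client - adjust the index name to match what you see in the CAT API:

curl 'localhost:9200/apache-2021.06.17/_search?q=response:404&pretty'
curl 'localhost:9200/apache-2021.06.17/_search?q=clientip:83.149.9.216&pretty'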
Logstash has lots of configuration options for parsing incoming data into structured fields. That gives you searchable logs, but the processing can be expensive - producing structured logs from your apps in the first place is a better option.
Regular Expressions can only get you so far. Sometimes the incoming data doesn't have the format you expect, and then Logstash can't process it.
There's a much bigger Apache log file in data/apache_logs-2021-05. Copy that into the Logstash data directory and see what happens with the processing.
One entry in the 5000-line log file is malformed - can you find the problem?
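One hint, not the answer: when a grok match fails, Logstash tags the event with _grokparsefailure, so you can search for that tag to find entries that didn't parse (the wildcard index name here is an assumption):

curl 'localhost:9200/apache-*/_search?q=tags:_grokparsefailure&pretty'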
Cleanup by removing all containers:
docker rm -f $(docker ps -aq)