Elasticsearch is a document database - it uses JSON as the data format. Documents are stored in collections called indices, and the data schema is flexible so not all documents in an index need to have the same fields.
You can use Elasticsearch as a generic data store, but it's particularly well suited to storing logs because it gives you lots of advanced querying features.
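As an example of that flexible schema, these two hypothetical documents could live in the same index, even though only one of them has a requestId field:

{ "level": "INFO", "message": "Fulfilment requested", "requestId": 21304897 }
{ "level": "DEBUG", "message": "Cache refreshed" }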
Elasticsearch is a Java application. The licensing model is a bit involved, but up to version 7.10 there's an OSS build published under an open-source (Apache 2.0) licence.
compose.yml sets up Elasticsearch to run in a container, publishing port 9200 which is the default port for the HTTP API.
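A minimal sketch of what that compose file might contain - the image tag and settings here are assumptions, so check the real file in the repo:

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
    environment:
      - discovery.type=single-node   # skip cluster discovery for a single lab node
    ports:
      - "9200:9200"                  # HTTP API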
Start the container:
docker-compose -f labs/elasticsearch/compose.yml up -d
Check the logs and you'll see Elasticsearch starting up:
docker logs obsfun_elasticsearch_1
These are semi-structured logs.
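The exact format depends on the logging config, but a semi-structured line typically looks something like this - the timestamp, node name and message here are illustrative:

[2021-01-13T10:15:23,456][INFO ][o.e.n.Node] [68d8e3d046c4] started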
We'll use curl to make HTTP requests - if you're using Windows, run this script to use the correct curl version:
# only for Windows - enable scripts:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope Process
# then run:
. ./scripts/windows-tools.ps1
Now make a simple call to the Elasticsearch API:
curl localhost:9200
📋 There's some basic info in the API response. What version are we running, and what is the cluster name?
The API response looks like this:
{
  "name" : "68d8e3d046c4",
  "cluster_name" : "elkstack",
  "cluster_uuid" : "9yypBMAjRNC0hjMkr-FrEw",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
The version number and cluster name are set in the Docker image - 7.10.2 and elkstack.
The Elasticsearch API has a full feature set for administering the cluster and for working with documents.
Indexing is how you store data in Elasticsearch. There are client libraries for all the major languages, so you can integrate Elasticsearch with your application.
We'll use the REST API in these exercises - start by inserting a document into a new index:
Index the document using an HTTP POST request:
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/logs/_doc' --data-binary "@labs/elasticsearch/data/fulfilment-requested.json"
The output includes an ID you can use to retrieve the document.
📋 What is the name of the index and the document ID?
The API response looks like this:
{
  "_index" : "logs",
  "_type" : "_doc",
  "_id" : "ZODwunoBUFcX3q_Yl3rW",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
The index name is logs - you don't need to create indices in advance, Elasticsearch will create them when you try to add documents.
The document ID is generated by Elasticsearch because we didn't specify an ID in the request. In this example it's ZODwunoBUFcX3q_Yl3rW.
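If you want to control the ID yourself, use a PUT request with the ID in the path instead - a quick sketch with a made-up ID (note that running it adds a fourth document, so the counts later in the exercise will be one higher):

curl -H 'Content-Type: application/json' -XPUT 'localhost:9200/logs/_doc/log-0001' --data-binary "@labs/elasticsearch/data/fulfilment-requested.json"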
You can fetch the document back using an HTTP GET request - you'll need to set your own document ID in the URL:
curl localhost:9200/logs/_doc/<_id>?pretty
The ?pretty flag formats the response to make it easier to read.
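The response wraps the stored document in metadata, with the original JSON under _source - trimmed, and with illustrative field values, it looks something like this:

{
  "_index" : "logs",
  "_id" : "ZODwunoBUFcX3q_Yl3rW",
  "found" : true,
  "_source" : {
    "level" : "DEBUG",
    "timestamp" : "2021-01-13T10:15:23.456Z",
    "message" : "Fulfilment requested for document: 21304897"
  }
}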
This is structured data - the log level, timestamp and message are all stored in separate fields.
📋 Add more logs by indexing two more documents, from the files in labs/elasticsearch/data/fulfilment-completed.json and labs/elasticsearch/data/fulfilment-errored.json.
The POST requests are the same, only the path to the source file is different:
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/logs/_doc' --data-binary "@labs/elasticsearch/data/fulfilment-completed.json"
curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/logs/_doc' --data-binary "@labs/elasticsearch/data/fulfilment-errored.json"
You can use a CAT (compact and aligned text) API to check the index has all the documents:
curl localhost:9200/_cat/indices?v=true
It can take a few minutes for the status to update, but you should see the docs.count column with the value 3 for the logs index.
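The output is a plain-text table, something like this (your UUID and store sizes will differ):

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logs  4kzXLmFbQkKGtEfcv2cJGw   1   1          3            0     12.3kb         12.3kb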
Now we have some data we can search.
The simplest search in Elasticsearch is to call the _search endpoint on the index, passing a search term in the URL querystring.
Search for the word debug in any document in the logs index:
curl 'localhost:9200/logs/_search?q=debug'
Finds a single document, with the log level DEBUG - the search is not case-sensitive.
This basic search looks in all the fields in all the documents. The response includes a score for each document, which is a calculation of how good a match it is for the search term.
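A trimmed sketch of the search response shape - the score value here is illustrative:

{
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 0.9808291,
    "hits" : [
      {
        "_index" : "logs",
        "_id" : "ZODwunoBUFcX3q_Yl3rW",
        "_score" : 0.9808291,
        "_source" : { "level" : "DEBUG" }
      }
    ]
  }
}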
One more simple query: adding a minus (-) before the search term finds any documents which don't contain the term:
curl 'localhost:9200/logs/_search?q=-debug&pretty'
Finds the other two documents, as they don't have the word debug in any field.
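Querystring searches can also target a single field using Lucene syntax - for example, matching on the level field directly:

curl 'localhost:9200/logs/_search?q=level:DEBUG&pretty'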
You can write more complex query expressions in JSON, using the Elasticsearch Query DSL. Here are some examples using match queries:

- queries/match-all.json matches all documents in the index
- queries/match-id.json matches all documents containing the term 21304897 in the message field
- queries/match-id-level.json matches all documents containing the term 21304897 in the message field AND the word INFO in the level field
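As a sketch of the syntax, a query like match-id-level.json could be written as a bool query combining two match clauses - the actual file in the repo may differ:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "21304897" } },
        { "match": { "level": "INFO" } }
      ]
    }
  }
}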
Using the Query DSL with structured data lets you be more precise. If you're looking for debug logs you can search using the log level field, so you won't accidentally include documents which have the word "debug" in another field.
You send JSON queries as GET requests to the search API, using this format:
curl -H 'Content-Type: application/json' 'localhost:9200/<index_name>/_search?pretty=true' --data-binary "@<query_file_path>"
📋 Run some queries to find all logs about document request ID 21304897, and then just the info logs for that request ID. You can use the JSON files in the queries folder.
To find all logs for that ID:
curl -H 'Content-Type: application/json' localhost:9200/logs/_search?pretty=true --data-binary '@labs/elasticsearch/queries/match-id.json'
Returns two matches, one info log and one debug.
To find just the info logs for that ID:
curl -H 'Content-Type: application/json' localhost:9200/logs/_search?pretty=true --data-binary '@labs/elasticsearch/queries/match-id-level.json'
Returns a single match.
There are lots of search features in Elasticsearch, so the Query DSL is quite complex. We've just had an introduction here, but we'll return to it in a later set of exercises.
Time for some practice with the index and search APIs. You've loaded individual documents into an index, but it's much quicker to bulk load them.
Start by bulk indexing all the documents in the file data/logs.json (note the data directory is in the root of the repo folder) - you'll need to use a different Document API for that.
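The _bulk endpoint expects newline-delimited JSON, alternating an action line with a document line (and ending with a newline). Assuming data/logs.json is already in that format, the request would look something like this:

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/logs/_bulk' --data-binary '@data/logs.json'

Each pair of lines in the file follows this shape - the action line can be empty braces when the index is in the URL, and the document fields here are illustrative:

{"index":{}}
{"level":"INFO","message":"Fulfilment requested","requestId":21304897}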
Now write some match queries of your own to find documents in the new index.
Clean up by removing all containers:
docker rm -f $(docker ps -aq)