Esctl: managing Elasticsearch from the command line
During my month of unemployment, I spent time thinking about how I had tackled technical problems during my last year and a half of work. The best part of my day-to-day mission was ensuring our Elasticsearch clusters were healthy, secure and efficient. To that end, I was using a combination of tools:
- raw curl commands,
- bash scripts I wrote,
- graphical interfaces,
- Prometheus monitoring and alerting,
- some automation,
- commands embedded in SaltStack.
It's easy to see there was no common way to manage those clusters; I was relying on a bunch of disparate, loosely connected tools.
The problem
The issue does not come from Elasticsearch itself but is inherent to any software that exposes an HTTP API to manage itself: the curl-of-death.
Here are some commands I used to run on a regular basis (examples come from Elasticsearch's documentation).
List indices
$ curl -X GET "localhost:9200/_cat/indices"
yellow open foo VrIiXmIRRA6BNP5JWaXKqA 1 1 0 0 283b 283b
Change the number of replicas of a given index
$ curl -X PUT "localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "number_of_replicas" : 2
  }
}
'
Reset an index's refresh interval to its default value
$ curl -X PUT "localhost:9200/twitter/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index" : {
    "refresh_interval" : null
  }
}
'
Pretty-print cluster stats
$ curl -X GET "localhost:9200/_cluster/stats"
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  ...
  "nodes": {
    "count": {
      "total": 1,
      "data": 1,
      "coordinating_only": 0,
      "master": 1,
      "ingest": 1
    },
    "versions": [
      "7.2.1"
    ],
    "jvm": {
      "max_uptime_in_millis": 1565394,
      "versions": [
        {
          "version": "12.0.1",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "12.0.1+12",
          "vm_vendor": "Oracle Corporation",
          "bundled_jdk": true,
          "using_bundled_jdk": true,
          "count": 1
        }
      ],
      "mem": {
        "heap_used_in_bytes": 128029720,
        "heap_max_in_bytes": 1056309248
      },
      "threads": 31
    },
    ...
  }
}
Reroute shard 0 of index 'test' from node1 to node2
$ curl -X POST "localhost:9200/_cluster/reroute?pretty" -H 'Content-Type: application/json' -d'
{
  "commands" : [
    {
      "move" : {
        "index" : "test", "shard" : 0,
        "from_node" : "node1", "to_node" : "node2"
      }
    }
  ]
}
'
Change cluster's transient setting 'indices.recovery.max_bytes_per_sec' to 20mb
$ curl -X PUT "localhost:9200/_cluster/settings?flat_settings=true&pretty" -H 'Content-Type: application/json' -d'
{
  "transient" : {
    "indices.recovery.max_bytes_per_sec" : "20mb"
  }
}
'
The base commands are short, but as soon as I need to pass URL parameters, a content type, an HTTP verb and a full JSON body, they are not short anymore...
Of course, I could use a bash alias to hide the -H 'Content-Type: application/json', or maybe use the excellent HTTPie, but the biggest pain comes from the JSON body!
Requirements
I needed a tool that could abstract away all those long curl commands while staying flexible. After looking around on the Internet, it appeared that the few existing tools didn't support the commands I needed. So I decided to write my own, and I came up with a list of requirements:
Requirement | Solution
---|---
Easy and fast to write | Python: high-level, easy to read and write, and I know it well
Fast to use | Command line
Abstract/hide parameters | Command-line options and integrated help! I don't want to remember which verb to use, which URL parameters to pass, what to put in the JSON body, etc.
Don't spend time building the CLI itself | OpenStack's Cliff (see the sketch below)
Easy to switch clusters without remembering long server names | Config file
Each cluster may have a different setup (SSL, auth, etc.) | Config file
Nice and pretty output | Cliff comes with PrettyTable
Inspiration | OpenStack's CLI and kubectl
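Cliff handles argument parsing, help and output formatting for you: you write command classes and register them against a command manager. Here is a minimal, generic Cliff application as a sketch; this is plain Cliff usage, not esctl's code (esctl most likely registers its commands through setuptools entry points, as Cliff recommends):

import sys

from cliff.app import App
from cliff.command import Command
from cliff.commandmanager import CommandManager


class Hello(Command):
    """Print a greeting (demo command)."""

    def take_action(self, parsed_args):
        self.app.stdout.write("hello\n")


class DemoApp(App):
    def __init__(self):
        super().__init__(
            description="cliff demo",
            version="0.1",
            command_manager=CommandManager("demo.cli"),
            deferred_help=True,
        )


def main(argv=sys.argv[1:]):
    app = DemoApp()
    # Commands are normally registered via entry points;
    # add_command() is enough for a quick demo.
    app.command_manager.add_command("hello", Hello)
    return app.run(argv)


if __name__ == "__main__":
    sys.exit(main())

Saved as demo.py, running python demo.py hello prints "hello", and python demo.py --help comes for free.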
The solution
I ended up writing esctl. It ticks all those boxes, and adding new features is easy: each new command simply inherits from a class chosen according to its output type.
It relies on a config file (inspired by kubectl's) to declare settings (global and per-cluster), clusters, users and contexts (a context being the association of a user and a cluster):
settings:
  no_check_certificate: true
  max_retries: 0
  timeout: 20

clusters:
  localhost:
    servers:
      - http://localhost:9200

  foo01-prd-sfo:
    servers:
      - https://master01-foo01-prd-sfo1.example.com
      - https://master02-foo01-prd-sfo2.example.com
      - https://master03-foo01-prd-sfo3.example.com
    settings:
      timeout: 60

users:
  jerome:
    username: jerome
    password: P@ssw0rD

contexts:
  localhost:
    cluster: localhost

  production:
    user: jerome
    cluster: foo01-prd-sfo

default-context: localhost
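Resolving the active context from such a file is mostly a matter of merging dictionaries. Here is a minimal sketch of what a loader could look like; this is not esctl's actual code, and the ~/.esctlrc path is just an assumption:

import os

import yaml  # PyYAML


def load_context(path="~/.esctlrc", name=None):
    """Resolve a context from the config file into connection settings.

    Illustrative sketch only; esctl's real loader may differ.
    """
    with open(os.path.expanduser(path)) as f:
        config = yaml.safe_load(f)

    name = name or config["default-context"]
    context = config["contexts"][name]
    cluster = config["clusters"][context["cluster"]]
    user = config.get("users", {}).get(context.get("user"), {})

    # Per-cluster settings override the global ones
    settings = {**config.get("settings", {}), **cluster.get("settings", {})}

    return {"servers": cluster["servers"], "user": user, "settings": settings}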
Esctl provides a lot of commands and subcommands to manage Elasticsearch:
usage: esctl [--version] [-v | -q] [--log-file LOG_FILE] [-h] [--debug] [--context CONTEXT]
esctl
optional arguments:
--version show program's version number and exit
-v, --verbose Increase verbosity of output. Can be repeated.
-q, --quiet Suppress output except warnings and errors.
--log-file LOG_FILE Specify a file to log output. Disabled by default.
-h, --help Show help message and exit.
--debug Show tracebacks on errors.
--context CONTEXT Context to use
Commands:
cat allocation Show shard allocation.
cluster allocation explain Provide explanations for shard allocations in the cluster.
cluster health Retrieve the cluster health.
cluster routing allocation enable Change the routing allocation status.
cluster stats Retrieve the cluster status.
complete print bash completion command (cliff)
config context list List all contexts.
help print detailed help for another command (cliff)
index close Close an index.
index create Create an index.
index delete Delete an index.
index list List all indices.
index open Open an index.
logging get Get a logger value.
logging reset Reset a logger value.
logging set Set a logger value.
node hot-threads Print hot threads on each nodes.
node list List nodes.
It dramatically shortens the commands shown above:
List indices
$ esctl index list
+-------+--------+--------+------------------------+---------+---------+------------+--------------+------------+--------------------+
| Index | Health | Status | UUID                   | Primary | Replica | Docs Count | Docs Deleted | Store Size | Primary Store Size |
+-------+--------+--------+------------------------+---------+---------+------------+--------------+------------+--------------------+
| foo   | yellow | open   | VrIiXmIRRA6BNP5JWaXKqA | 1       | 1       | 0          | 0            | 283b       | 283b               |
+-------+--------+--------+------------------------+---------+---------+------------+--------------+------------+--------------------+
Change the number of replicas of a given index
Not implemented yet. It would look like:
$ esctl index settings set number_of_replicas 2 --index=twitter
Reset an index's refresh interval to its default value
Not implemented yet. It would look like:
$ esctl index settings reset refresh_interval --index=twitter
Pretty-print cluster stats
$ esctl cluster stats
+------------------------------------------------+------------------------------+
| Attribute                                      | Value                        |
+------------------------------------------------+------------------------------+
| _nodes.failed                                  | 0                            |
| _nodes.successful                              | 1                            |
| _nodes.total                                   | 1                            |
...
| nodes.count.coordinating_only                  | 0                            |
| nodes.count.data                               | 1                            |
| nodes.count.ingest                             | 1                            |
| nodes.count.master                             | 1                            |
| nodes.count.total                              | 1                            |
| nodes.discovery_types.single-node              | 1                            |
| nodes.fs.available_in_bytes                    | 48388599808                  |
| nodes.fs.free_in_bytes                         | 51605315584                  |
| nodes.fs.total_in_bytes                        | 62725623808                  |
| nodes.jvm.max_uptime_in_millis                 | 1565394                      |
| nodes.jvm.mem.heap_max_in_bytes                | 1056309248                   |
| nodes.jvm.mem.heap_used_in_bytes               | 128029720                    |
| nodes.jvm.threads                              | 31                           |
| nodes.jvm.versions[0].bundled_jdk              | True                         |
| nodes.jvm.versions[0].count                    | 1                            |
| nodes.jvm.versions[0].using_bundled_jdk        | True                         |
| nodes.jvm.versions[0].version                  | 12.0.1                       |
| nodes.jvm.versions[0].vm_name                  | OpenJDK 64-Bit Server VM     |
| nodes.jvm.versions[0].vm_vendor                | Oracle Corporation           |
| nodes.jvm.versions[0].vm_version               | 12.0.1+12                    |
...
| status                                         | yellow                       |
| timestamp                                      | 1565893455640                |
+------------------------------------------------+------------------------------+
Reroute shard 0 of index 'test' from node1 to node2
Not implemented yet
Change cluster's transient setting 'indices.recovery.max_bytes_per_sec' to 20mb
Not implemented yet. It would look like:
$ esctl cluster settings set --transient indices.recovery.max_bytes_per_sec 20mb
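Internally, such a subcommand would most likely boil down to a single elasticsearch-py call. A hedged sketch of that call, outside of any esctl plumbing:

from elasticsearch import Elasticsearch

# Illustrative sketch, not esctl code: apply a transient cluster setting
# through elasticsearch-py (7.x-style API, matching the cluster version above).
es = Elasticsearch(["http://localhost:9200"])
es.cluster.put_settings(
    body={"transient": {"indices.recovery.max_bytes_per_sec": "20mb"}}
)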
What a subcommand looks like
I created three output classes based on Cliff's (a sketch of how they could map onto Cliff's base classes follows the list):
- EsctlCommand: doesn't expect any output
- EsctlLister: expects a list of elements in order to build a multi-column table
- EsctlShowOne: expects a key-value list in order to build a two-column table
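The mapping onto Cliff's base classes is probably as simple as this sketch (a guess at the structure, not esctl's actual source):

from cliff.command import Command
from cliff.lister import Lister
from cliff.show import ShowOne


class EsctlCommand(Command):
    """Base class for commands that produce no tabular output."""


class EsctlLister(Lister):
    """Base class for commands returning (columns, rows) for a multi-column table."""


class EsctlShowOne(ShowOne):
    """Base class for commands returning (keys, values) for a two-column table."""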
To add a new subcommand, I only need to pick the right output class, inherit from it, and write the take_action method:
def take_action(self, parsed_args):
    """Generate or retrieve data to be displayed.

    Arguments:
        parsed_args {argparse.Namespace} -- Arguments from the command line.

    Returns:
        Any -- The data to be displayed, as specified by Cliff
    """
    return data
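For instance, an EsctlLister-based command has to return column headers plus a list of rows. Here is a purely hypothetical sketch, not esctl's actual node list implementation; it only assumes the Esctl._es elasticsearch-py client used in the real sample below:

class NodeList(EsctlLister):
    """List nodes (hypothetical sketch)."""

    def take_action(self, parsed_args):
        # _cat/nodes with format=json returns one dict per node
        nodes = Esctl._es.cat.nodes(format="json")

        columns = ("Name", "IP", "Heap %", "Role")
        rows = [
            (node["name"], node["ip"], node["heap.percent"], node["node.role"])
            for node in nodes
        ]
        return (columns, rows)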
Here is, as a sample, the class behind the esctl cluster health command:
class ClusterHealth(EsctlShowOne):
    """Retrieve the cluster health."""

    def take_action(self, parsed_args):
        # Retrieve the cluster health using the appropriate elasticsearch-py
        # method, then order and sort the output
        health = self._sort_and_order_dict(Esctl._es.cluster.health())

        # Colorize the "status" key (RED, YELLOW, GREEN) based on its value
        health["status"] = Color.colorize(
            health.get("status"), getattr(Color, health.get("status").upper())
        )

        # Return a tuple of tuples, which Cliff renders as a two-column table
        return (tuple(health.keys()), tuple(health.values()))
Which will display:
+----------------------------------+----------------+
| Field                            | Value          |
+----------------------------------+----------------+
| active_primary_shards            | 0              |
| active_shards                    | 0              |
| active_shards_percent_as_number  | 100.0          |
| cluster_name                     | docker-cluster |
| delayed_unassigned_shards        | 0              |
| initializing_shards              | 0              |
| number_of_data_nodes             | 1              |
| number_of_in_flight_fetch        | 0              |
| number_of_nodes                  | 1              |
| number_of_pending_tasks          | 0              |
| relocating_shards                | 0              |
| status                           | green          |
| task_max_waiting_in_queue_millis | 0              |
| timed_out                        | False          |
| unassigned_shards                | 0              |
+----------------------------------+----------------+
Instead of:
{
  "cluster_name" : "docker-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}