Shrinking indices in Elasticsearch

The Problem

Today, we started receiving the following error from our production Elasticsearch cluster when a new index was about to be created:

{
  "error": {
    "root_cause": [
      {
        "type": "validation_exception",
        "reason": "Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [991]/[1000] maximum shards open;"
      }
    ],
    "type": "validation_exception",
    "reason": "Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [991]/[1000] maximum shards open;"
  },
  "status": 400
}

The error message made it clear that creating the new index would breach the cluster's limit of 1,000 shards.

Confirming the number from the error message using the _cat/shards endpoint (which prints one line per shard), we saw that we had 991 shards on our only data node:

$ curl -s https://<aws_es_url>.es.amazonaws.com:443/_cat/shards | wc -l
991
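
You can also check the configured limit itself via the cluster settings endpoint; include_defaults surfaces settings that are still at their default value:

$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/_cluster/settings?include_defaults=true&flat_settings=true" | grep max_shards_per_node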

We had about 99 indices, and each index had 5 primary shards with 1 replica of each, which contributes another 5 shards, for a total of 10 shards per index. We can confirm this by checking the index settings:

$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/<index_name>?pretty"

which shows the following output (shortened for brevity):

{
  "settings": {
    "number_of_shards": "5",
    "number_of_replicas": "1"
  }
}

Looking around in the AWS help docs, we found that they suggest three solutions:

Suggested fixes

The 7.x versions of Elasticsearch have a default setting of no more than 1,000 shards per node. Elasticsearch throws an error if a request, such as creating a new index, would cause you to exceed this limit. If you encounter this error, you have several options:

  • Add more data nodes to the cluster.
  • Increase the _cluster/settings/cluster.max_shards_per_node setting (see the sketch after this list).
  • Use the _shrink API to reduce the number of shards on the node.

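For reference, the second option is a one-line change to a dynamic cluster setting. A minimal sketch, with 2000 as an arbitrary example value:

$ curl -XPUT -H 'Content-Type: application/json' https://<aws_es_url>.es.amazonaws.com:443/_cluster/settings -d'{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}'
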
We chose the shrink option because all our indices are small enough that they do not need 5 shards.

How to Shrink?

It is a three-step process:

Step 1: Block writes on the current index

This removes the replica shards, relocates every remaining shard onto a single node (replace shrink_node_name with the name of your target data node), and blocks writes, because the _shrink API requires a read-only index with all of its shards on the same node:

$ curl -XPUT -H 'Content-Type: application/json' https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>/_settings -d'{
  "settings": {
    "index.number_of_replicas": 0,                                
    "index.routing.allocation.require._name": "shrink_node_name", 
    "index.blocks.write": true                                    
  }
}'
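
To sanity-check before shrinking, you can read the settings back and confirm the write block is in place:

$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>/_settings?pretty"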

Step 2: Start shrinking with the new shard count

The target shard count must be a factor of the source shard count, so going from 5 shards down to 1 is valid. The request below also restores the replica and clears the temporary settings from step 1:

$ curl -XPOST -H 'Content-Type: application/json' https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>/_shrink/<new_index_name> -d'{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1, 
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}'

You can track the progress of the shrink via the _cat/recovery endpoint. Once it completes, you can verify the document count via the _cat/indices endpoint.

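For example, filtered to the new index (?v adds column headers):

$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/_cat/recovery/<new_index_name>?v"
$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/_cat/indices/<new_index_name>?v"
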
Once you are happy with the shrinking, go to the next step.

Step 3: Delete the old index

$ curl -XDELETE https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>
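
A final listing should then show the new single-shard index in place of the old one:

$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/_cat/indices?v"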

You can run the above commands for multiple indices through a shell script like the one below (place the index names in /tmp/indices.txt, one index name per line):

while read -r source; do
   <curl command, using "$source" as the index name>
done < /tmp/indices.txt
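
Fleshed out, steps 1 and 2 might look like the sketch below. It assumes the shrunken index is named <index>-shrunk (adjust to your own naming scheme), and it deliberately leaves out the step 3 delete so you can verify each shrink via _cat/recovery first:

#!/bin/bash
ES="https://<aws_es_url>.es.amazonaws.com:443"

while read -r source; do
  target="${source}-shrunk"

  # Step 1: drop replicas, co-locate all shards on one node, block writes
  curl -XPUT -H 'Content-Type: application/json' "$ES/$source/_settings" -d'{
    "settings": {
      "index.number_of_replicas": 0,
      "index.routing.allocation.require._name": "shrink_node_name",
      "index.blocks.write": true
    }
  }'

  # Crude pause so the relocation from step 1 can finish before the
  # shrink; for anything serious, poll _cat/shards instead of sleeping.
  sleep 60

  # Step 2: shrink into a new single-shard index
  curl -XPOST -H 'Content-Type: application/json' "$ES/$source/_shrink/$target" -d'{
    "settings": {
      "index.number_of_replicas": 1,
      "index.number_of_shards": 1,
      "index.routing.allocation.require._name": null,
      "index.blocks.write": null
    }
  }'
done < /tmp/indices.txt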

Permanent Fix

The three steps above only fix the existing indices. We'll need to make a code change to ensure that new indices are also created with the new setting of one shard.

Include settings.number_of_shards and settings.number_of_replicas in the request payload along with mappings when creating a new index. PHP code for reference (the final create call is a sketch assuming the official elasticsearch-php client):

$params = [
    'mappings' => [
        'properties' => [
            // .......
        ],
    ],
    'settings' => [
        'number_of_shards' => 1,
        'number_of_replicas' => 1,
    ],
];

// e.g. with elasticsearch-php:
$client->indices()->create(['index' => '<index_name>', 'body' => $params]);

You are now done! 👍

You have successfully fixed both existing indices and new indices.

Further Reading

  • Optimizing Elasticsearch: How Many Shards per Index? | Qbox HES