The Problem
Today, we started receiving the following error from our production Elasticsearch cluster when a new index was about to be created:
{
  "error": {
    "root_cause": [
      {
        "type": "validation_exception",
        "reason": "Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [991]/[1000] maximum shards open;"
      }
    ],
    "type": "validation_exception",
    "reason": "Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [991]/[1000] maximum shards open;"
  },
  "status": 400
}
The error message made it clear that creating a new index would breach the shard limit of 1,000.
Checking the _cat/shards endpoint confirmed the number from the error message: we had 991 shards on our only data node.
$ curl -s https://<aws_es_url>.es.amazonaws.com:443/_cat/shards | wc -l
991
We had about 99 indices. Each index had 5 primary shards and one replica, which adds another 5 shards, for a total of 10 shards per index. We can confirm that by checking the index endpoint:
$ curl -s "https://<aws_es_url>.es.amazonaws.com:443/<index_name>?pretty"
which shows the following output (shortened for brevity):
{ "settings": { "number_of_shards": "5", "number_of_replicas": "1" } }
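The shard arithmetic above can be sketched as a small shell helper. The function name `total_shards` is made up for illustration; it only multiplies the index count by the primaries and by (1 + replicas). With 99 indices it gives 990, in line with the 991 the cluster reported given that the index count was approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total shards = indices * primaries * (1 + replicas).
total_shards() {
  local indices=$1 primaries=$2 replicas=$3
  echo $(( indices * primaries * (1 + replicas) ))
}

# 99 indices with 5 primaries and 1 replica each
total_shards 99 5 1   # prints 990

# The same 99 indices with 1 primary and 1 replica each
total_shards 99 1 1   # prints 198
```

The second call shows why shrinking to one primary shard per index frees up so much headroom under the 1,000 shard limit.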
The AWS help docs suggest three solutions:
Suggested fixes
The 7.x versions of Elasticsearch have a default setting of no more than 1,000 shards per node. Elasticsearch throws an error if a request, such as creating a new index, would cause you to exceed this limit. If you encounter this error, you have several options:
- Add more data nodes to the cluster.
- Increase the _cluster/settings/cluster.max_shards_per_node setting.
- Use the _shrink API to reduce the number of shards on the node.
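For reference, the second option (raising the limit) could look roughly like the sketch below, which uses the cluster update settings API. `ES_URL`, the helper names, and the value 2000 are assumptions for illustration, and a managed service may restrict which cluster settings it accepts.

```shell
#!/usr/bin/env bash
# Assumption: ES_URL is a placeholder for your cluster endpoint.
ES_URL="${ES_URL:-https://localhost:9200}"

# Build the JSON body for the cluster settings API.
max_shards_payload() {
  printf '{"persistent":{"cluster.max_shards_per_node":%d}}' "$1"
}

# Send the new limit to the cluster.
# Defined only; calling it needs a reachable cluster.
raise_shard_limit() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/_cluster/settings" -d "$(max_shards_payload "$1")"
}

max_shards_payload 2000
# prints {"persistent":{"cluster.max_shards_per_node":2000}}
```

Calling `raise_shard_limit 2000` would apply the setting. This only moves the ceiling and leaves the shard count itself unchanged.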
We chose the shrink option because all our indices are small enough that they do not need 5 shards.
How to Shrink?
It is a three-step process:
Step 1: Block writes on the current index
$ curl -XPUT -H 'Content-Type: application/json' https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>/_settings -d'{ "settings": { "index.number_of_replicas": 0, "index.routing.allocation.require._name": "shrink_node_name", "index.blocks.write": true } }'
Step 2: Start shrinking with the new shard count
$ curl -XPOST -H 'Content-Type: application/json' https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>/_shrink/<new_index_name> -d'{ "settings": { "index.number_of_replicas": 1, "index.number_of_shards": 1, "index.routing.allocation.require._name": null, "index.blocks.write": null } }'
You can track the progress of the shrinking via the /_cat/recovery endpoint. Once the shrinking is complete, you can verify the document count via the _cat/indices endpoint.
Once you are happy with the shrinking, go to the next step.
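One way to decide whether you are "happy with the shrinking" is to compare the document counts of the old and new index before deleting anything. The sketch below uses the _cat/count endpoint instead of _cat/indices because it returns a single number. `ES_URL`, `doc_count`, and `counts_match` are illustrative names, not part of the original setup.

```shell
#!/usr/bin/env bash
# Assumption: ES_URL is a placeholder for your cluster endpoint.
ES_URL="${ES_URL:-https://localhost:9200}"

# Document count of one index via the _cat/count API
# (h=count keeps only the count column).
doc_count() {
  curl -s "$ES_URL/_cat/count/$1?h=count" | tr -d '[:space:]'
}

# Succeeds only when both counts are non-empty and equal.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Hypothetical usage (requires a reachable cluster):
#   if counts_match "$(doc_count old_index)" "$(doc_count new_index)"; then
#     echo "counts match, safe to delete old_index"
#   fi
```

Treating an empty count as a mismatch prevents a failed request from being read as "both indices have the same count".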
Step 3: Delete the old index
$ curl -XDELETE https://<aws_es_url>.es.amazonaws.com:443/<current_index_name>
You can run the above commands for multiple indices through a shell script like the one below (place the index names in /tmp/indices.txt, one index name per line):
while read -r source; do
  <curl command>
done </tmp/indices.txt
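A fuller version of that loop might look like the sketch below, which runs steps 1 and 2 for every index name in a file. The `-shrunk` suffix, `ES_URL`, and the `DRY_RUN` switch are assumptions added for illustration; `shrink_node_name` is the same placeholder used in step 1. In dry-run mode (the default here) the script only prints the curl commands. A real run may also need to wait for step 1 to finish relocating shards before step 2 is issued.

```shell
#!/usr/bin/env bash
# Assumptions: ES_URL, the "-shrunk" suffix and DRY_RUN are illustrative.
ES_URL="${ES_URL:-https://localhost:9200}"
DRY_RUN="${DRY_RUN:-1}"

# Name of the shrunken index for a given source index.
target_name() {
  echo "$1-shrunk"
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: drop replicas, pin shards to one node, block writes.
block_writes() {
  run curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/$1/_settings" \
    -d '{"settings":{"index.number_of_replicas":0,"index.routing.allocation.require._name":"shrink_node_name","index.blocks.write":true}}'
}

# Step 2: shrink into a new single-shard index.
shrink() {
  run curl -s -XPOST -H 'Content-Type: application/json' \
    "$ES_URL/$1/_shrink/$(target_name "$1")" \
    -d '{"settings":{"index.number_of_replicas":1,"index.number_of_shards":1,"index.routing.allocation.require._name":null,"index.blocks.write":null}}'
}

# Run steps 1 and 2 for each non-empty line of the given file.
shrink_all() {
  while IFS= read -r source; do
    [ -z "$source" ] && continue
    block_writes "$source"
    shrink "$source"
  done < "$1"
}

# Hypothetical usage:
#   shrink_all /tmp/indices.txt
```

Step 3 (deleting the old index) is deliberately left out of the loop so that each shrunken index can be verified first.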
Permanent Fix
The three steps above only fix existing indices. We’ll also need to make some code changes so that new indices are created with the new setting of one shard.
Include settings.number_of_shards and settings.number_of_replicas in the request payload along with mappings when creating a new index. PHP code for reference:
[
    'mappings' => [
        'properties' => [
            .......
        ],
    ],
    'settings' => [
        'number_of_shards' => 1,
        'number_of_replicas' => 1,
    ]
];
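The same payload can be sent with curl, which is handy for trying the settings out before changing application code. This is a minimal sketch: `ES_URL` and the helper names are assumptions, and the mappings are left empty here, where a real index would list its own properties.

```shell
#!/usr/bin/env bash
# Assumption: ES_URL is a placeholder for your cluster endpoint.
ES_URL="${ES_URL:-https://localhost:9200}"

# JSON body with explicit shard settings; mappings are empty for brevity.
create_index_payload() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d},"mappings":{"properties":{}}}' "$1" "$2"
}

# Create an index with 1 primary shard and 1 replica.
# Defined only; calling it needs a reachable cluster.
create_index() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "$ES_URL/$1" -d "$(create_index_payload 1 1)"
}

create_index_payload 1 1
# prints {"settings":{"number_of_shards":1,"number_of_replicas":1},"mappings":{"properties":{}}}
```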
You are now done! 👏
You have successfully fixed both existing indices and new indices.
Further Reading
- https://qbox.io/blog/optimizing-elasticsearch-how-many-shards-per-index
- https://discuss.elastic.co/t/how-to-fix-hitting-maximum-shards-open-error/200502
- https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster