Elasticsearch Performance Tuning: From 10s to 100ms Queries
Our Elasticsearch queries were slow. 10 seconds for simple searches. Users were frustrated.
Optimized indexing, shards, and queries. 10s → 100ms. Here’s what worked.
Table of Contents
The Problem
Before Optimization:
- Query time: 10s average
- Index size: 500GB
- Shards: 1000+
- Memory: Constant OOM
- CPU: 90% usage
Shard Optimization
# Check shard distribution
GET _cat/shards?v&s=index
# Problem: Too many small shards
# Index: logs-2020-01 - 100 shards, 500MB each
# Solution: Fewer, larger shards
# Reindex with proper shard count
PUT logs-2020-01-optimized
{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 1
}
}
POST _reindex
{
"source": {
"index": "logs-2020-01"
},
"dest": {
"index": "logs-2020-01-optimized"
}
}
Shard Sizing Rule:
- Target: 20-50GB per shard
- Max: 50GB per shard
- Formula:
shards = index_size_gb / 30
Index Mapping Optimization
PUT /products
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"description": {
"type": "text",
"index_options": "offsets"
},
"price": {
"type": "scaled_float",
"scaling_factor": 100
},
"created_at": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"tags": {
"type": "keyword"
},
"metadata": {
"type": "object",
"enabled": false
}
}
},
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"refresh_interval": "30s",
"index.codec": "best_compression"
}
}
Query Optimization
Before (Slow):
GET /products/_search
{
"query": {
"query_string": {
"query": "laptop*",
"fields": ["*"]
}
}
}
After (Fast):
GET /products/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "laptop",
"boost": 2
}
}
},
{
"match": {
"description": "laptop"
}
}
]
}
},
"_source": ["name", "price", "created_at"],
"size": 20
}
Results:
- Query time: 10s → 100ms (-99%)
Aggregation Optimization
Before (Slow):
GET /logs/_search
{
"size": 0,
"aggs": {
"by_user": {
"terms": {
"field": "user_id",
"size": 10000
}
}
}
}
After (Fast):
GET /logs/_search
{
"size": 0,
"aggs": {
"by_user": {
"terms": {
"field": "user_id",
"size": 100,
"shard_size": 500
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 1,
"_source": ["timestamp", "action"]
}
}
}
}
}
}
Bulk Indexing
from elasticsearch import Elasticsearch, helpers
es = Elasticsearch(['localhost:9200'])
def bulk_index(documents, index_name):
"""Bulk index documents efficiently."""
actions = [
{
"_index": index_name,
"_id": doc["id"],
"_source": doc
}
for doc in documents
]
# Bulk index with helpers
success, failed = helpers.bulk(
es,
actions,
chunk_size=1000,
request_timeout=60
)
return success, failed
# Usage
documents = [
{"id": 1, "name": "Product 1", "price": 99.99},
{"id": 2, "name": "Product 2", "price": 149.99},
# ... 10000 more
]
success, failed = bulk_index(documents, "products")
print(f"Indexed: {success}, Failed: {failed}")
Performance:
- Single index: 100 docs/s
- Bulk index: 10,000 docs/s (100x faster)
Index Lifecycle Management
PUT _ilm/policy/logs_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_size": "50GB",
"max_age": "7d"
}
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": {
"number_of_shards": 1
},
"forcemerge": {
"max_num_segments": 1
}
}
},
"cold": {
"min_age": "30d",
"actions": {
"freeze": {}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
Monitoring
def check_cluster_health():
"""Monitor cluster health."""
health = es.cluster.health()
print(f"Status: {health['status']}")
print(f"Nodes: {health['number_of_nodes']}")
print(f"Shards: {health['active_shards']}")
print(f"Relocating: {health['relocating_shards']}")
print(f"Initializing: {health['initializing_shards']}")
print(f"Unassigned: {health['unassigned_shards']}")
def check_slow_queries():
"""Find slow queries."""
stats = es.indices.stats(metric='search')
for index, data in stats['indices'].items():
query_time = data['total']['search']['query_time_in_millis']
query_count = data['total']['search']['query_total']
if query_count > 0:
avg_time = query_time / query_count
if avg_time > 1000: # > 1s
print(f"Slow index: {index}, avg: {avg_time}ms")
Results
Query Performance:
| Query Type | Before | After | Improvement |
|---|---|---|---|
| Simple search | 10s | 100ms | 99% |
| Aggregation | 30s | 500ms | 98% |
| Bulk index | 100/s | 10K/s | 100x |
Resource Usage:
- Memory: 32GB → 16GB (-50%)
- CPU: 90% → 30% (-67%)
- Disk I/O: -60%
Cost Savings:
- Nodes: 10 → 5 (-50%)
- Monthly cost: $5K → $2.5K (-50%)
Lessons Learned
- Shard size matters: 20-50GB optimal
- Mapping optimization critical: Right field types
- Query structure important: Use bool queries
- Bulk indexing essential: 100x faster
- ILM saves money: Auto-cleanup old data
Conclusion
Elasticsearch optimization delivered massive gains. Query time 10s → 100ms, 99% improvement, 50% cost reduction.
Key takeaways:
- Query time: 10s → 100ms (-99%)
- Bulk indexing: 100x faster
- Memory: 32GB → 16GB (-50%)
- Cost: $5K → $2.5K/month (-50%)
- Shard optimization critical
Optimize your Elasticsearch. Performance matters.