Our Elasticsearch queries were slow: simple searches took 10 seconds on average, and users were frustrated.

We optimized indexing, shard layout, and queries, cutting average query time from 10s to 100ms. Here’s what worked.

The Problem

Before Optimization:

  • Query time: 10s average
  • Index size: 500GB
  • Shards: 1000+
  • Memory: Constant OOM
  • CPU: 90% usage

Shard Optimization

# Check shard distribution
GET _cat/shards?v&s=index

# Problem: Too many small shards
# Index: logs-2020-01 - 100 shards, 500MB each
# Solution: Fewer, larger shards

# Reindex with proper shard count
PUT logs-2020-01-optimized
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}

POST _reindex
{
  "source": {
    "index": "logs-2020-01"
  },
  "dest": {
    "index": "logs-2020-01-optimized"
  }
}

Shard Sizing Rule:

  • Target: 20-50GB per shard
  • Max: 50GB per shard
  • Formula: shards = ceil(index_size_gb / 30), with a minimum of 1
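
The sizing rule above is simple enough to express as a helper. This is a minimal sketch: the function name `recommended_shards` and the `target_shard_gb` parameter are made up for illustration, and 30GB is just the midpoint used in the formula.

```python
import math


def recommended_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Return a primary shard count so each shard holds at most target_shard_gb."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    # Round up so no shard exceeds the target; always keep at least one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


print(recommended_shards(500))  # 17 shards of roughly 29GB each
print(recommended_shards(10))   # 1
```

Treat the result as a starting point; replica count, node count, and growth rate will also shape the final number.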

Index Mapping Optimization

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text",
        "index_options": "offsets"
      },
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100
      },
      "created_at": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      },
      "tags": {
        "type": "keyword"
      },
      "metadata": {
        "type": "object",
        "enabled": false
      }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "index.codec": "best_compression"
  }
}
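
One mapping choice worth a closer look is `scaled_float` for `price`. Elasticsearch stores the value as a long after multiplying by `scaling_factor`, which compresses better than a raw float. The sketch below only approximates that behavior in plain Python; the helper names are hypothetical, and Python's `round` may treat exact ties differently from Elasticsearch.

```python
def to_scaled_long(value: float, scaling_factor: int = 100) -> int:
    """Approximate how a scaled_float is stored: multiply, then round to an integer."""
    return round(value * scaling_factor)


def from_scaled_long(stored: int, scaling_factor: int = 100) -> float:
    """Recover the value that searches and aggregations would see."""
    return stored / scaling_factor


stored = to_scaled_long(99.99)
print(stored)                    # 9999
print(from_scaled_long(stored))  # 99.99

# Precision beyond the scaling factor is lost: 19.999 is stored as 2000.
print(from_scaled_long(to_scaled_long(19.999)))  # 20.0
```

With `scaling_factor: 100`, prices keep two decimal places, which is enough for most currencies.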

Query Optimization

Before (Slow):

GET /products/_search
{
  "query": {
    "query_string": {
      "query": "laptop*",
      "fields": ["*"]
    }
  }
}

After (Fast):

GET /products/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": {
              "query": "laptop",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "description": "laptop"
          }
        }
      ]
    }
  },
  "_source": ["name", "price", "created_at"],
  "size": 20
}

Results:

  • Query time: 10s → 100ms (-99%)
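
If the search term comes from user input, the faster query can be built by a small helper instead of being written by hand. This is a sketch only: `build_product_search` and its parameters are hypothetical names, and the field list mirrors the query shown above.

```python
import json


def build_product_search(term: str, size: int = 20, name_boost: float = 2.0) -> dict:
    """Build the bool/should search body, boosting matches on the name field."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": {"query": term, "boost": name_boost}}},
                    {"match": {"description": term}},
                ]
            }
        },
        # Return only the fields the result page needs.
        "_source": ["name", "price", "created_at"],
        "size": size,
    }


body = build_product_search("laptop")
print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the request body of `GET /products/_search`.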

Aggregation Optimization

Before (Slow):

GET /logs/_search
{
  "size": 0,
  "aggs": {
    "by_user": {
      "terms": {
        "field": "user_id",
        "size": 10000
      }
    }
  }
}

After (Fast):

GET /logs/_search
{
  "size": 0,
  "aggs": {
    "by_user": {
      "terms": {
        "field": "user_id",
        "size": 100,
        "shard_size": 500
      },
      "aggs": {
        "top_hits": {
          "top_hits": {
            "size": 1,
            "_source": ["timestamp", "action"]
          }
        }
      }
    }
  }
}
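
Why set `shard_size` above `size`? Each shard returns only its own top `shard_size` terms, and the coordinating node merges those partial lists. If `shard_size` is too small, a term that ranks second on every shard can be missed entirely. The toy simulation below illustrates the idea; it is not Elasticsearch code, and the shard contents are invented.

```python
from collections import Counter


def terms_agg(shards, size, shard_size):
    """Simulate a terms aggregation: per-shard top lists merged on a coordinator."""
    merged = Counter()
    for shard in shards:
        # Each shard reports only its shard_size most frequent terms.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Hypothetical per-shard term counts. True totals: a=5, b=8, c=7.
shards = [
    {"a": 5, "b": 4, "c": 1},
    {"b": 4, "c": 6},
]

print(terms_agg(shards, size=1, shard_size=1))  # [('c', 6)]  wrong winner, wrong count
print(terms_agg(shards, size=1, shard_size=2))  # [('b', 8)]  correct
```

A larger `shard_size` costs more per-shard work, so 500 for a top-100 list trades a little speed for accuracy.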

Bulk Indexing

from elasticsearch import Elasticsearch, helpers

# Recent client versions (8.x) require a full URL including the scheme
es = Elasticsearch("http://localhost:9200")

def bulk_index(documents, index_name):
    """Bulk index documents efficiently."""
    actions = [
        {
            "_index": index_name,
            "_id": doc["id"],
            "_source": doc
        }
        for doc in documents
    ]
    
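    # stats_only=True returns (success_count, failed_count);
    # raise_on_error=False reports failures instead of raising BulkIndexError.
    success, failed = helpers.bulk(
        es,
        actions,
        chunk_size=1000,
        request_timeout=60,
        stats_only=True,
        raise_on_error=False
    )
    
    return success, failed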

# Usage
documents = [
    {"id": 1, "name": "Product 1", "price": 99.99},
    {"id": 2, "name": "Product 2", "price": 149.99},
    # ... 10000 more
]

success, failed = bulk_index(documents, "products")
print(f"Indexed: {success}, Failed: {failed}")

Performance:

  • Single index: 100 docs/s
  • Bulk index: 10,000 docs/s (100x faster)
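
For large datasets, building the whole action list in memory can itself become a bottleneck. Since `helpers.bulk` accepts any iterable, the actions can be produced lazily. A minimal sketch, where `generate_actions` is a hypothetical helper name:

```python
from typing import Iterable, Iterator


def generate_actions(documents: Iterable[dict], index_name: str) -> Iterator[dict]:
    """Yield one bulk action per document without materializing a full list."""
    for doc in documents:
        yield {
            "_index": index_name,
            "_id": doc["id"],
            "_source": doc,
        }


docs = ({"id": i, "name": f"Product {i}"} for i in range(1, 4))
for action in generate_actions(docs, "products"):
    print(action)

# With a live cluster this would be passed straight to the bulk helper, e.g.:
# helpers.bulk(es, generate_actions(docs, "products"), chunk_size=1000)
```

The helper then pulls documents from the generator one chunk at a time.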

Index Lifecycle Management

PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
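
To sanity-check the policy, it helps to work out which phase an index should be in at a given age. The sketch below hard-codes the `min_age` values from the policy above and assumes age is counted from rollover, which is how ILM measures `min_age` for rolled-over indices. The function name is hypothetical, and real transitions lag slightly because ILM checks on a polling interval.

```python
# min_age thresholds from logs_policy, checked from the latest phase down.
PHASE_MIN_AGE_DAYS = [("delete", 90), ("cold", 30), ("warm", 7)]


def expected_phase(days_since_rollover: float) -> str:
    """Return the ILM phase an index should have reached at the given age."""
    for phase, min_age in PHASE_MIN_AGE_DAYS:
        if days_since_rollover >= min_age:
            return phase
    # Younger than every min_age: still in the hot phase.
    return "hot"


for days in (0, 7, 45, 90):
    print(days, expected_phase(days))
# 0 hot
# 7 warm
# 45 cold
# 90 delete
```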

Monitoring

# Reuses the `es` client created in the Bulk Indexing section
def check_cluster_health():
    """Print a summary of cluster health."""
    health = es.cluster.health()
    
    print(f"Status: {health['status']}")
    print(f"Nodes: {health['number_of_nodes']}")
    print(f"Shards: {health['active_shards']}")
    print(f"Relocating: {health['relocating_shards']}")
    print(f"Initializing: {health['initializing_shards']}")
    print(f"Unassigned: {health['unassigned_shards']}")

def check_slow_queries():
    """Flag indices whose average query time exceeds 1s."""
    stats = es.indices.stats(metric='search')
    
    for index, data in stats['indices'].items():
        query_time = data['total']['search']['query_time_in_millis']
        query_count = data['total']['search']['query_total']
        
        if query_count > 0:
            avg_time = query_time / query_count
            if avg_time > 1000:  # > 1s
                print(f"Slow index: {index}, avg: {avg_time:.0f}ms")
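
The averaging logic is easier to test when it is separated from the cluster call. Here is a sketch that takes a stats response as a plain dict; `slow_indices` is a hypothetical name and the sample numbers are invented. Note that these counters are cumulative, so the average smooths over recent regressions.

```python
def slow_indices(stats: dict, threshold_ms: float = 1000.0) -> dict:
    """Map index name to average query time (ms) for indices above the threshold."""
    slow = {}
    for index, data in stats["indices"].items():
        search = data["total"]["search"]
        if search["query_total"] > 0:
            avg_ms = search["query_time_in_millis"] / search["query_total"]
            if avg_ms > threshold_ms:
                slow[index] = avg_ms
    return slow


# Invented sample shaped like an indices stats response with metric=search.
sample_stats = {
    "indices": {
        "logs-2020-01": {"total": {"search": {"query_time_in_millis": 50000, "query_total": 10}}},
        "products": {"total": {"search": {"query_time_in_millis": 2000, "query_total": 100}}},
        "empty": {"total": {"search": {"query_time_in_millis": 0, "query_total": 0}}},
    }
}

print(slow_indices(sample_stats))  # {'logs-2020-01': 5000.0}
```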

Results

Query Performance:

Query Type      Before       After        Improvement
Simple search   10s          100ms        99%
Aggregation     30s          500ms        98%
Bulk index      100 docs/s   10K docs/s   100x
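
For reference, the improvement figures follow directly from the before and after numbers. A quick sketch of the arithmetic (the function name is just for illustration):

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a latency dropped from before to after."""
    return (before - after) / before * 100


# Latencies in milliseconds.
print(round(percent_reduction(10_000, 100)))  # 99  (simple search)
print(round(percent_reduction(30_000, 500)))  # 98  (aggregation)

# Bulk indexing is a throughput gain, so it is reported as a ratio.
print(10_000 / 100)  # 100.0
```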

Resource Usage:

  • Memory: 32GB → 16GB (-50%)
  • CPU: 90% → 30% (-67%)
  • Disk I/O: -60%

Cost Savings:

  • Nodes: 10 → 5 (-50%)
  • Monthly cost: $5K → $2.5K (-50%)

Lessons Learned

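  1. Shard size matters: 20-50GB per shard worked best
  2. Mapping optimization is critical: choose the right field types
  3. Query structure is important: targeted bool/match queries beat a wildcard query_string over all fields
  4. Bulk indexing is essential: 100x faster than indexing one document at a time
  5. ILM saves money: old data is cleaned up automatically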

Conclusion

Elasticsearch optimization delivered large gains: average query time dropped from 10s to 100ms (a 99% improvement), and monthly cost fell by 50%.

Key takeaways:

  1. Query time: 10s → 100ms (-99%)
  2. Bulk indexing: 100x faster
  3. Memory: 32GB → 16GB (-50%)
  4. Cost: $5K → $2.5K/month (-50%)
  5. Shard optimization critical

Optimize your Elasticsearch. Performance matters.