PostgreSQL Index Optimization: From 12s to 80ms
Our user search was painfully slow. Searching for users by email took 12 seconds. With 500K users in the database, every query was a full table scan.
I spent a day learning PostgreSQL indexing. Now the same query takes 80ms. Here’s what I learned.
Table of Contents
The Problem
User table:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
username VARCHAR(100) NOT NULL,
first_name VARCHAR(100),
last_name VARCHAR(100),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
is_active BOOLEAN DEFAULT TRUE,
last_login TIMESTAMP
);
500K rows. Simple query:
SELECT * FROM users WHERE email = 'john@example.com';
Execution time: 12.3 seconds
Understanding EXPLAIN
First step: understand what’s happening.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
Output:
Seq Scan on users (cost=0.00..15234.00 rows=1 width=200) (actual time=8234.123..12345.678 rows=1 loops=1)
Filter: (email = 'john@example.com'::text)
Rows Removed by Filter: 499999
Planning time: 0.234 ms
Execution time: 12345.890 ms
Seq Scan = Sequential scan = Full table scan = BAD
PostgreSQL scanned all 500K rows to find one user.
Creating First Index
Simple B-tree index:
CREATE INDEX idx_users_email ON users(email);
Query again:
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
Output:
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=200) (actual time=0.045..0.047 rows=1 loops=1)
Index Cond: (email = 'john@example.com'::text)
Planning time: 0.123 ms
Execution time: 0.080 ms
80ms! That’s 150x faster!
Index Types
PostgreSQL supports several index types:
B-tree (default)
Best for equality and range queries:
CREATE INDEX idx_users_created_at ON users(created_at);
-- Works well for:
SELECT * FROM users WHERE created_at > '2016-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2016-01-01' AND '2016-12-31';
Hash
Only for equality (rarely used):
CREATE INDEX idx_users_username_hash ON users USING HASH (username);
-- Only works for:
SELECT * FROM users WHERE username = 'john';
Not recommended in PostgreSQL 9.5 (not WAL-logged).
GIN (Generalized Inverted Index)
For full-text search and arrays:
CREATE INDEX idx_users_search ON users USING GIN (to_tsvector('english', first_name || ' ' || last_name));
-- Full-text search:
SELECT * FROM users WHERE to_tsvector('english', first_name || ' ' || last_name) @@ to_tsquery('john');
GiST (Generalized Search Tree)
For geometric data and full-text:
CREATE INDEX idx_locations_point ON locations USING GIST (coordinates);
Composite Indexes
Index on multiple columns:
CREATE INDEX idx_users_active_created ON users(is_active, created_at);
Works for:
-- Uses index
SELECT * FROM users WHERE is_active = TRUE AND created_at > '2016-01-01';
-- Uses index (leftmost column)
SELECT * FROM users WHERE is_active = TRUE;
-- DOESN'T use index (missing leftmost column)
SELECT * FROM users WHERE created_at > '2016-01-01';
Rule: Index columns in order of selectivity (most selective first).
Partial Indexes
Index only subset of rows:
CREATE INDEX idx_users_active_email ON users(email) WHERE is_active = TRUE;
Smaller index, faster queries:
-- Uses partial index
SELECT * FROM users WHERE email = 'john@example.com' AND is_active = TRUE;
-- Doesn't use partial index
SELECT * FROM users WHERE email = 'john@example.com' AND is_active = FALSE;
Great for:
- Active users (90% of queries)
- Recent records
- Non-deleted items
Covering Indexes
Include extra columns to avoid table lookup:
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (first_name, last_name);
Query:
SELECT first_name, last_name FROM users WHERE email = 'john@example.com';
PostgreSQL can answer from index alone (no table access).
Note: INCLUDE clause added in PostgreSQL 11. In 9.5, use composite index:
CREATE INDEX idx_users_email_names ON users(email, first_name, last_name);
Real-World Example
Our search feature:
-- Search active users by email or username
SELECT id, email, username, first_name, last_name
FROM users
WHERE is_active = TRUE
AND (email ILIKE '%john%' OR username ILIKE '%john%');
Problem: ILIKE with leading wildcard doesn’t use index.
Solution: Use trigram index (pg_trgm extension):
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING GIN (email gin_trgm_ops);
CREATE INDEX idx_users_username_trgm ON users USING GIN (username gin_trgm_ops);
Now ILIKE queries use index:
EXPLAIN ANALYZE SELECT * FROM users WHERE email ILIKE '%john%';
Output:
Bitmap Heap Scan on users (cost=12.34..234.56 rows=50 width=200) (actual time=2.345..5.678 rows=45 loops=1)
Recheck Cond: (email ~~* '%john%'::text)
Heap Blocks: exact=42
-> Bitmap Index Scan on idx_users_email_trgm (cost=0.00..12.33 rows=50 width=0) (actual time=2.123..2.123 rows=45 loops=1)
Index Cond: (email ~~* '%john%'::text)
Planning time: 0.456 ms
Execution time: 5.890 ms
From 12s to 5.8ms!
Index Maintenance
Check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
Indexes with idx_scan = 0 are unused.
Check index size
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
Rebuild bloated indexes
REINDEX INDEX idx_users_email;
Or rebuild all indexes on table:
REINDEX TABLE users;
When NOT to Index
- Small tables (< 1000 rows) - Sequential scan is faster
- High write, low read - Indexes slow down writes
- Low selectivity columns - e.g., boolean with 50/50 distribution
- Columns never in WHERE clause
Index Best Practices
- Index foreign keys:
CREATE INDEX idx_orders_user_id ON orders(user_id);
- Index columns in JOIN:
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
- Index columns in ORDER BY:
CREATE INDEX idx_users_created_at_desc ON users(created_at DESC);
- Don’t over-index:
- Each index slows down INSERT/UPDATE/DELETE
- Indexes take disk space
- More indexes = more maintenance
Monitoring Query Performance
Created view for slow queries:
CREATE VIEW slow_queries AS
SELECT
query,
calls,
total_time,
mean_time,
max_time
FROM pg_stat_statements
WHERE mean_time > 100 -- queries slower than 100ms
ORDER BY mean_time DESC;
Requires pg_stat_statements extension:
CREATE EXTENSION pg_stat_statements;
Add to postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
Our Final Index Strategy
-- Primary lookups
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
-- Active users (90% of queries)
CREATE INDEX idx_users_active_created ON users(created_at) WHERE is_active = TRUE;
-- Search functionality
CREATE INDEX idx_users_email_trgm ON users USING GIN (email gin_trgm_ops);
CREATE INDEX idx_users_username_trgm ON users USING GIN (username gin_trgm_ops);
-- Foreign keys
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_sessions_user_id ON sessions(user_id);
Results
| Query Type | Before | After | Improvement |
|---|---|---|---|
| Email lookup | 12.3s | 80ms | 99.3% |
| Username lookup | 11.8s | 75ms | 99.4% |
| Active users list | 8.5s | 120ms | 98.6% |
| Search by email | 15.2s | 5.8ms | 99.96% |
Lessons Learned
- Always use EXPLAIN ANALYZE - Understand before optimizing
- Index selectively - Not every column needs an index
- Monitor index usage - Remove unused indexes
- Consider partial indexes - For common query patterns
- Test with production data - Small datasets hide problems
Conclusion
Proper indexing transformed our database performance. Queries that took 10+ seconds now complete in milliseconds.
Key takeaways:
- Use EXPLAIN ANALYZE to find slow queries
- Create indexes on columns in WHERE, JOIN, ORDER BY
- Use partial indexes for common filters
- Monitor and remove unused indexes
- Indexes have a cost - don’t over-index
Database performance is often about indexing. Learn it well, and your applications will fly.