# AI-Driven Database Query Optimization: 10x Performance Improvement
Our database queries were slow. Some took 45 seconds. Users complained. I used GPT-4 to analyze and optimize queries.
The results: roughly 10x faster queries, 23 new indexes, and an improved schema. Here's how.
## The Problem

Slow queries:

- User search: 45 seconds
- Dashboard load: 30 seconds
- Report generation: 2 minutes

Database: PostgreSQL, ~50GB of data.
## AI-Powered Query Analysis

````python
from openai import OpenAI
import sqlparse

client = OpenAI()


def analyze_query(query, explain_output=None):
    """Analyze a SQL query for optimization opportunities."""
    explain_section = f"EXPLAIN output:\n{explain_output}" if explain_output else ""
    prompt = f"""
Analyze this SQL query for performance issues:

```sql
{query}
```

{explain_section}

Identify:
- Missing indexes
- Inefficient JOINs
- N+1 query problems
- Full table scans
- Subquery optimization opportunities
- Query rewrite suggestions

For each issue:
- Severity (Critical/High/Medium/Low)
- Impact on performance
- Suggested fix with SQL
- Expected improvement

Format: JSON
"""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return response.choices[0].message.content
````
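A minimal usage sketch: pass the raw SQL (and, optionally, its `EXPLAIN` output) and get the model's JSON analysis back as text.

```python
sample_query = """
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
"""

analysis = analyze_query(sample_query)
print(analysis)  # JSON text listing issues, fixes, and expected improvements
```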
## Example 1: Missing Index
**Original Query** (45 seconds):
```sql
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.email LIKE '%@gmail.com'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
```
**AI Analysis:**

```json
{
  "issues": [
    {
      "severity": "Critical",
      "type": "Full Table Scan",
      "description": "LIKE with leading wildcard prevents index usage",
      "impact": "Scanning all 1M users",
      "fix": "Use full-text search or trigram index",
      "sql": "CREATE INDEX idx_users_email_trgm ON users USING gin(email gin_trgm_ops);"
    },
    {
      "severity": "High",
      "type": "Missing Index",
      "description": "No index on orders.user_id for JOIN",
      "impact": "Slow JOIN operation",
      "fix": "Add index on foreign key",
      "sql": "CREATE INDEX idx_orders_user_id ON orders(user_id);"
    }
  ],
  "optimized_query": "-- Use trigram search\nSELECT u.*, COUNT(o.id) as order_count\nFROM users u\nLEFT JOIN orders o ON u.id = o.user_id\nWHERE u.email % '@gmail.com' -- Trigram similarity\nGROUP BY u.id\nORDER BY order_count DESC\nLIMIT 100;",
  "expected_improvement": "90% faster (45s → 4s)"
}
```
**After Optimization** (4 seconds):

```sql
-- Add indexes
CREATE INDEX idx_users_email_trgm ON users USING gin(email gin_trgm_ops);
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Optimized query
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.email % '@gmail.com'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
```
Result: 45s → 4s (91% faster)
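One prerequisite the analysis doesn't spell out: both `gin_trgm_ops` and the `%` operator come from the `pg_trgm` extension, so it has to be installed before the indexes above will build.

```sql
-- Required for gin_trgm_ops and the % similarity operator used above
CREATE EXTENSION IF NOT EXISTS pg_trgm;
```

Also note that `%` performs trigram similarity matching rather than an exact suffix match, so the rewritten query is not strictly equivalent to the original `LIKE '%@gmail.com'` filter; the result set is worth spot-checking.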
## Example 2: N+1 Query Problem

**Original Code** (30 seconds for 100 users):

```python
# N+1 problem: 1 query for the users, then 1 query per user for their orders
users = User.query.all()
for user in users:
    user.orders = Order.query.filter_by(user_id=user.id).all()
    user.total_spent = sum(o.amount for o in user.orders)
```
**AI Analysis:**

```json
{
  "issues": [
    {
      "severity": "Critical",
      "type": "N+1 Query",
      "description": "Executing 1 + N queries (1 for users, N for orders)",
      "impact": "101 queries for 100 users",
      "fix": "Use JOIN or eager loading",
      "optimized_code": "users = User.query.options(joinedload(User.orders)).all()\nfor user in users:\n    user.total_spent = sum(o.amount for o in user.orders)"
    }
  ],
  "expected_improvement": "95% faster (30s → 1.5s)"
}
```
**After Optimization** (1.5 seconds):

```python
from sqlalchemy.orm import joinedload

# Single query with a JOIN (eager loading) instead of 1 + N queries
users = User.query.options(joinedload(User.orders)).all()
for user in users:
    user.total_spent = sum(o.amount for o in user.orders)
```
Result: 30s → 1.5s (95% faster)
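If only the totals are needed, a further step (my sketch, not part of the AI's output) is to push the sum into SQL so the order rows are never materialized in Python at all. This assumes a Flask-SQLAlchemy-style setup where `db.session` is available and `User`/`Order` match the models implied above.

```python
from sqlalchemy import func

# Let the database aggregate instead of loading every Order object
rows = (
    db.session.query(User, func.coalesce(func.sum(Order.amount), 0).label("total_spent"))
    .outerjoin(Order, Order.user_id == User.id)
    .group_by(User.id)
    .all()
)
for user, total_spent in rows:
    user.total_spent = total_spent
```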
## Example 3: Inefficient Subquery

**Original Query** (2 minutes):

```sql
SELECT p.*,
       (SELECT COUNT(*) FROM reviews WHERE product_id = p.id) as review_count,
       (SELECT AVG(rating) FROM reviews WHERE product_id = p.id) as avg_rating,
       (SELECT COUNT(*) FROM orders WHERE product_id = p.id) as order_count
FROM products p
WHERE p.category = 'Electronics'
ORDER BY order_count DESC;
```
**AI-Optimized Query** (8 seconds):

```sql
SELECT p.*,
       COALESCE(r.review_count, 0) as review_count,
       COALESCE(r.avg_rating, 0) as avg_rating,
       COALESCE(o.order_count, 0) as order_count
FROM products p
LEFT JOIN (
    SELECT product_id,
           COUNT(*) as review_count,
           AVG(rating) as avg_rating
    FROM reviews
    GROUP BY product_id
) r ON p.id = r.product_id
LEFT JOIN (
    SELECT product_id,
           COUNT(*) as order_count
    FROM orders
    GROUP BY product_id
) o ON p.id = o.product_id
WHERE p.category = 'Electronics'
ORDER BY order_count DESC;
```
Result: 120s → 8s (93% faster)
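A variant worth benchmarking (my sketch, not from the model's output): when the category filter is selective, `LEFT JOIN LATERAL` lets PostgreSQL aggregate reviews and orders only for the matching products instead of pre-aggregating the entire tables in the derived subqueries. It relies on indexes on `reviews(product_id)` and `orders(product_id)`.

```sql
SELECT p.*,
       COALESCE(r.review_count, 0) AS review_count,
       COALESCE(r.avg_rating, 0) AS avg_rating,
       COALESCE(o.order_count, 0) AS order_count
FROM products p
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS review_count, AVG(rating) AS avg_rating
    FROM reviews WHERE reviews.product_id = p.id
) r ON true
LEFT JOIN LATERAL (
    SELECT COUNT(*) AS order_count
    FROM orders WHERE orders.product_id = p.id
) o ON true
WHERE p.category = 'Electronics'
ORDER BY order_count DESC;
```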
## Automated Optimization Pipeline

````python
import json
from typing import List, Dict

import psycopg2
from openai import OpenAI


class QueryOptimizer:
    def __init__(self, db_connection):
        self.conn = db_connection
        self.client = OpenAI()

    def find_slow_queries(self, min_duration_ms=1000):
        """Find slow queries from pg_stat_statements."""
        query = """
            SELECT query,
                   calls,
                   total_exec_time,
                   mean_exec_time,
                   max_exec_time
            FROM pg_stat_statements
            WHERE mean_exec_time > %s
            ORDER BY mean_exec_time DESC
            LIMIT 20;
        """
        cursor = self.conn.cursor()
        cursor.execute(query, (min_duration_ms,))
        return cursor.fetchall()

    def get_explain_plan(self, query):
        """Get EXPLAIN ANALYZE output."""
        cursor = self.conn.cursor()
        cursor.execute(f"EXPLAIN ANALYZE {query}")
        return '\n'.join(row[0] for row in cursor.fetchall())

    def optimize_query(self, query):
        """Optimize a single query."""
        # Get the execution plan, then hand query + plan to the AI analysis
        explain = self.get_explain_plan(query)
        analysis = analyze_query(query, explain)
        return json.loads(analysis)

    def generate_optimization_report(self):
        """Generate a comprehensive optimization report."""
        slow_queries = self.find_slow_queries()
        report = "# Database Optimization Report\n\n"
        for query, calls, total_time, mean_time, max_time in slow_queries:
            report += f"## Query (Mean: {mean_time:.2f}ms)\n\n"
            report += f"```sql\n{query}\n```\n\n"

            optimization = self.optimize_query(query)

            report += "### Issues Found\n\n"
            for issue in optimization.get('issues', []):
                report += f"- **{issue['severity']}**: {issue['description']}\n"
                report += f"  - Fix: {issue['fix']}\n"
                report += f"  - Expected: {issue.get('expected_improvement', 'N/A')}\n\n"

            if 'optimized_query' in optimization:
                report += "### Optimized Query\n\n"
                report += f"```sql\n{optimization['optimized_query']}\n```\n\n"
        return report


# Usage
conn = psycopg2.connect("dbname=mydb user=postgres")
optimizer = QueryOptimizer(conn)
report = optimizer.generate_optimization_report()
with open('optimization_report.md', 'w') as f:
    f.write(report)
````
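Two assumptions baked into this pipeline are worth calling out. First, `pg_stat_statements` isn't enabled by default: it needs `shared_preload_libraries = 'pg_stat_statements'` in `postgresql.conf` (and a restart), plus the extension created in the database. Second, statements in that view are stored in normalized form with `$1`-style placeholders, so `EXPLAIN ANALYZE` on them will fail until real parameter values are substituted back in.

```sql
-- Enable the statistics view the pipeline reads from
-- (postgresql.conf must already contain: shared_preload_libraries = 'pg_stat_statements')
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Sanity check: slowest statements by mean execution time
-- (column names are the PostgreSQL 13+ ones used above)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;
```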
## Schema Optimization

**AI-Suggested Improvements:**

```sql
-- Original schema
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    price DECIMAL,
    created_at TIMESTAMP
);

-- AI-optimized schema
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    -- Foreign key constraints (these do not create indexes on their own)
    CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id),
    CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id)
);

-- Indexes for common queries
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Composite index for a common filter pattern
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
```
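In practice the existing `orders` table can't simply be recreated, so the same changes would land as a migration. A rough sketch, assuming the column names above and that existing rows already satisfy the new constraints:

```sql
ALTER TABLE orders
    ALTER COLUMN user_id SET NOT NULL,
    ALTER COLUMN product_id SET NOT NULL,
    ALTER COLUMN quantity SET NOT NULL,
    ALTER COLUMN price TYPE DECIMAL(10,2),
    ALTER COLUMN price SET NOT NULL,
    ALTER COLUMN created_at SET NOT NULL,
    ALTER COLUMN created_at SET DEFAULT NOW(),
    ADD CONSTRAINT chk_quantity CHECK (quantity > 0),
    ADD CONSTRAINT chk_price CHECK (price >= 0),
    ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id),
    ADD CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id);

-- CONCURRENTLY avoids blocking writes while each index builds
-- (it cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders(user_id, created_at DESC);
```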
## Real Results

**Queries optimized:** 47

**Performance improvements:**

- User search: 45s → 4s (91% faster)
- Dashboard: 30s → 1.5s (95% faster)
- Reports: 120s → 8s (93% faster)
- Average: 10x faster

**Indexes added:** 23

**Schema changes:** 8 tables improved

**Cost savings:**

- Database CPU: 70% reduction
- Query costs: $500/month → $150/month
- User satisfaction: 📈
## Monitoring Integration

```python
# Alert on slow queries
# (reuses conn, json, and QueryOptimizer from the pipeline above)
def monitor_query_performance():
    optimizer = QueryOptimizer(conn)
    slow_queries = optimizer.find_slow_queries(min_duration_ms=5000)
    if slow_queries:
        # Analyze each slow query and send an alert
        for query, *stats in slow_queries:
            analysis = optimizer.optimize_query(query)
            send_alert(f"""
Slow query detected: mean {stats[2]:.2f}ms
Query: {query[:100]}...
Suggested optimizations:
{json.dumps(analysis['issues'], indent=2)}
""")
```
## Lessons Learned

- **AI finds hidden issues** - problems a manual review had missed
- **10x performance gains** - real, measurable impact
- **Comprehensive analysis** - indexes, JOINs, and schema in one pass
- **Still needs validation** - test every suggestion before production
- **Continuous monitoring** - catch regressions early
## Conclusion

AI-driven database optimization delivers real results: roughly 10x faster queries, missing indexes found, and an improved schema.

Key takeaways:

- 10x average performance improvement
- 23 missing indexes found and added
- N+1 query problems fixed
- 70% reduction in database CPU
- $350/month in cost savings

Use AI to analyze your slow queries, but validate every suggestion before it reaches production.