Our database queries were slow. Some took 45 seconds. Users complained. I used GPT-4 to analyze and optimize queries.

The results: queries 10x faster on average, 23 missing indexes found and added, and an improved schema. Here’s how.

## The Problem

**Slow Queries:**

- User search: 45 seconds
- Dashboard load: 30 seconds
- Report generation: 2 minutes
- Database: PostgreSQL, 50GB data

## AI-Powered Query Analysis

````python
from openai import OpenAI
import sqlparse

client = OpenAI()

def analyze_query(query, explain_output=None):
    """Analyze a SQL query for optimization opportunities."""
    # Include the EXPLAIN output in the prompt when it's available.
    explain_section = f"EXPLAIN output:\n{explain_output}" if explain_output else ""

    prompt = f"""
Analyze this SQL query for performance issues:

```sql
{query}
```

{explain_section}

Identify:

1. Missing indexes
2. Inefficient JOINs
3. N+1 query problems
4. Full table scans
5. Subquery optimization opportunities
6. Query rewrite suggestions

For each issue:

- Severity (Critical/High/Medium/Low)
- Impact on performance
- Suggested fix with SQL
- Expected improvement

Format: JSON
"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2
    )

    return response.choices[0].message.content
````
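
A quick way to exercise `analyze_query` on a single statement (the query string here is just the Example 1 query below, used purely for illustration):

```python
# Illustrative usage: pass any slow query, optionally with its EXPLAIN output.
slow_sql = """
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.email LIKE '%@gmail.com'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
"""

print(analyze_query(slow_sql))
```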

## Example 1: Missing Index

**Original Query** (45 seconds):
```sql
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.email LIKE '%@gmail.com'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
```

**AI Analysis:**

```json
{
  "issues": [
    {
      "severity": "Critical",
      "type": "Full Table Scan",
      "description": "LIKE with leading wildcard prevents index usage",
      "impact": "Scanning all 1M users",
      "fix": "Use full-text search or trigram index",
      "sql": "CREATE INDEX idx_users_email_trgm ON users USING gin(email gin_trgm_ops);"
    },
    {
      "severity": "High",
      "type": "Missing Index",
      "description": "No index on orders.user_id for JOIN",
      "impact": "Slow JOIN operation",
      "fix": "Add index on foreign key",
      "sql": "CREATE INDEX idx_orders_user_id ON orders(user_id);"
    }
  ],
  "optimized_query": "-- Use trigram search\nSELECT u.*, COUNT(o.id) as order_count\nFROM users u\nLEFT JOIN orders o ON u.id = o.user_id\nWHERE u.email % '@gmail.com'  -- Trigram similarity\nGROUP BY u.id\nORDER BY order_count DESC\nLIMIT 100;",
  "expected_improvement": "90% faster (45s → 4s)"
}
```

**After Optimization** (4 seconds):

```sql
-- Enable trigram matching (gin_trgm_ops requires the pg_trgm extension)
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Add indexes
CREATE INDEX idx_users_email_trgm ON users USING gin(email gin_trgm_ops);
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Optimized query
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.email % '@gmail.com'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
```

Result: 45s → 4s (91% faster)
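
One caveat on the rewrite: `%` is pg_trgm's similarity operator, so it does fuzzy matching rather than the exact substring match of the original `LIKE`. Since a trigram GIN index also accelerates `LIKE` with a leading wildcard, a semantically identical variant is possible if exact matching matters:

```sql
-- Same result set as the original query; the planner can still use idx_users_email_trgm.
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.email LIKE '%@gmail.com'
GROUP BY u.id
ORDER BY order_count DESC
LIMIT 100;
```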

## Example 2: N+1 Query Problem

**Original Code** (30 seconds for 100 users):

```python
# N+1 problem
users = User.query.all()
for user in users:
    user.orders = Order.query.filter_by(user_id=user.id).all()
    user.total_spent = sum(o.amount for o in user.orders)
```

**AI Analysis:**

```json
{
  "issues": [
    {
      "severity": "Critical",
      "type": "N+1 Query",
      "description": "Executing 1 + N queries (1 for users, N for orders)",
      "impact": "101 queries for 100 users",
      "fix": "Use JOIN or eager loading",
      "optimized_code": "users = User.query.options(joinedload(User.orders)).all()\nfor user in users:\n    user.total_spent = sum(o.amount for o in user.orders)"
    }
  ],
  "expected_improvement": "95% faster (30s → 1.5s)"
}
```

**After Optimization** (1.5 seconds):

```python
# Single query with JOIN
users = User.query.options(joinedload(User.orders)).all()
for user in users:
    user.total_spent = sum(o.amount for o in user.orders)
```

Result: 30s → 1.5s (95% faster)
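
For completeness: `joinedload` comes from `sqlalchemy.orm`, so the optimized version needs the import (assuming the same Flask-SQLAlchemy-style models as above):

```python
from sqlalchemy.orm import joinedload

# Eager-load each user's orders in the same round trip.
users = User.query.options(joinedload(User.orders)).all()
```

For large one-to-many collections, SQLAlchemy's `selectinload` is often a better fit than `joinedload`, since it avoids duplicating user rows in the joined result.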

## Example 3: Inefficient Subquery

**Original Query** (2 minutes):

```sql
SELECT p.*,
       (SELECT COUNT(*) FROM reviews WHERE product_id = p.id) as review_count,
       (SELECT AVG(rating) FROM reviews WHERE product_id = p.id) as avg_rating,
       (SELECT COUNT(*) FROM orders WHERE product_id = p.id) as order_count
FROM products p
WHERE p.category = 'Electronics'
ORDER BY order_count DESC;
```

**AI-Optimized Query** (8 seconds):

```sql
SELECT p.*,
       COALESCE(r.review_count, 0) as review_count,
       COALESCE(r.avg_rating, 0) as avg_rating,
       COALESCE(o.order_count, 0) as order_count
FROM products p
LEFT JOIN (
    SELECT product_id,
           COUNT(*) as review_count,
           AVG(rating) as avg_rating
    FROM reviews
    GROUP BY product_id
) r ON p.id = r.product_id
LEFT JOIN (
    SELECT product_id,
           COUNT(*) as order_count
    FROM orders
    GROUP BY product_id
) o ON p.id = o.product_id
WHERE p.category = 'Electronics'
ORDER BY order_count DESC;
```

Result: 120s → 8s (93% faster)
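
The gain comes from replacing three correlated subqueries, each re-executed per product row, with two single-pass aggregations. A cut-down EXPLAIN makes the pattern visible; note that `EXPLAIN ANALYZE` actually executes the query, so run it against a replica or a staging copy:

```sql
-- The correlated form typically shows SubPlan nodes executed once per product row;
-- the rewritten query aggregates reviews and orders once each instead.
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.id,
       (SELECT COUNT(*) FROM reviews WHERE product_id = p.id) AS review_count
FROM products p
WHERE p.category = 'Electronics';
```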

## Automated Optimization Pipeline

```python
import json

import psycopg2
from typing import List, Dict

# analyze_query() and the OpenAI client are defined in the earlier snippet.

class QueryOptimizer:
    def __init__(self, db_connection):
        self.conn = db_connection
        self.client = OpenAI()
    
    def find_slow_queries(self, min_duration_ms=1000):
        """Find slow queries from pg_stat_statements."""
        query = """
        SELECT query,
               calls,
               total_exec_time,
               mean_exec_time,
               max_exec_time
        FROM pg_stat_statements
        WHERE mean_exec_time > %s
        ORDER BY mean_exec_time DESC
        LIMIT 20;
        """
        
        cursor = self.conn.cursor()
        cursor.execute(query, (min_duration_ms,))
        return cursor.fetchall()
    
    def get_explain_plan(self, query):
        """Get EXPLAIN ANALYZE output."""
        # Note: pg_stat_statements stores normalized query text ($1, $2, ...),
        # so parameterized queries need real values substituted before EXPLAIN runs.
        cursor = self.conn.cursor()
        cursor.execute(f"EXPLAIN ANALYZE {query}")
        return '\n'.join([row[0] for row in cursor.fetchall()])
    
    def optimize_query(self, query):
        """Optimize single query."""
        # Get execution plan
        explain = self.get_explain_plan(query)
        
        # Analyze with AI
        analysis = analyze_query(query, explain)
        
        return json.loads(analysis)
    
    def generate_optimization_report(self):
        """Generate comprehensive optimization report."""
        slow_queries = self.find_slow_queries()
        
        report = "# Database Optimization Report\n\n"
        
        for query, calls, total_time, mean_time, max_time in slow_queries:
            report += f"## Query (Mean: {mean_time:.2f}ms)\n\n"
            report += f"```sql\n{query}\n```\n\n"
            
            # Optimize
            optimization = self.optimize_query(query)
            
            report += f"### Issues Found\n\n"
            for issue in optimization.get('issues', []):
                report += f"- **{issue['severity']}**: {issue['description']}\n"
                report += f"  - Fix: {issue['fix']}\n"
                report += f"  - Expected: {issue.get('expected_improvement', 'N/A')}\n\n"
            
            if 'optimized_query' in optimization:
                report += f"### Optimized Query\n\n"
                report += f"```sql\n{optimization['optimized_query']}\n```\n\n"
        
        return report

# Usage
conn = psycopg2.connect("dbname=mydb user=postgres")
optimizer = QueryOptimizer(conn)
report = optimizer.generate_optimization_report()

with open('optimization_report.md', 'w') as f:
    f.write(report)
```
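
One prerequisite the pipeline assumes: `pg_stat_statements` is not enabled by default, so it has to be preloaded and installed first:

```sql
-- postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```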

## Schema Optimization

**AI-Suggested Improvements:**

```sql
-- Original schema
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    price DECIMAL,
    created_at TIMESTAMP
);

-- AI-optimized schema
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    price DECIMAL(10,2) NOT NULL CHECK (price >= 0),
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    
    -- Foreign keys for referential integrity (supporting indexes are created below)
    CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id),
    CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id)
);

-- Indexes for common queries
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

-- Composite index for common filter
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);
```
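
On an existing table these changes would be applied as migrations rather than a fresh CREATE TABLE. A partial sketch of the equivalent ALTERs, assuming current rows already satisfy the new constraints:

```sql
ALTER TABLE orders
    ALTER COLUMN user_id SET NOT NULL,
    ALTER COLUMN product_id SET NOT NULL,
    ALTER COLUMN created_at SET DEFAULT NOW(),
    ADD CONSTRAINT orders_quantity_positive CHECK (quantity > 0),
    ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id);

-- CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX CONCURRENTLY idx_orders_user_created ON orders(user_id, created_at DESC);
```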

## Real Results

**Queries Optimized:** 47

**Performance Improvements:**

- User search: 45s → 4s (91% faster)
- Dashboard: 30s → 1.5s (95% faster)
- Reports: 120s → 8s (93% faster)
- Average: 10x faster

**Indexes Added:** 23
**Schema Changes:** 8 tables improved

**Cost Savings:**

- Database CPU: 70% reduction
- Query costs: $500/month → $150/month
- User satisfaction: 📈

## Monitoring Integration

```python
# Alert on slow queries
def monitor_query_performance():
    optimizer = QueryOptimizer(conn)
    slow_queries = optimizer.find_slow_queries(min_duration_ms=5000)
    
    if slow_queries:
        # Analyze and send alert
        for query, *stats in slow_queries:
            analysis = optimizer.optimize_query(query)
            
            # send_alert() is a placeholder for whatever alerting hook is in use
            send_alert(f"""
Slow query detected: {stats[2]:.2f}ms

Query: {query[:100]}...

Suggested optimizations:
{json.dumps(analysis['issues'], indent=2)}
""")

## Lessons Learned

1. AI finds hidden issues - several problems had survived manual review.
2. 10x performance gains are real - this was not a marginal improvement.
3. The analysis is comprehensive - it covered indexes, JOINs, N+1 patterns, and schema design.
4. Suggestions still need validation - test every change before it reaches production.
5. Keep monitoring continuously - that is how regressions get caught.

## Conclusion

AI-driven database optimization delivered real results for us: queries 10x faster on average, missing indexes found and added, and a healthier schema.

Key takeaways:

1. 10x average performance improvement
2. 23 missing indexes found and added
3. N+1 query problems fixed
4. 70% reduction in database CPU
5. $350/month in cost savings

Use AI to optimize your database queries.