MongoDB Aggregation Pipeline: Complex Queries Made Simple
Our MongoDB queries were slow and inefficient. Multiple round trips, client-side processing, 30s for analytics.
Used aggregation pipeline. Query time 30s → 2s, server-side processing, complex analytics easy.
Table of Contents
Before: Multiple Queries
// Slow: Multiple queries + client-side processing
const users = await db.collection('users').find({}).toArray();
const orders = await db.collection('orders').find({}).toArray();
// Client-side join and aggregation
const result = users.map(user => {
const userOrders = orders.filter(o => o.userId === user._id);
return {
user: user.name,
totalOrders: userOrders.length,
totalSpent: userOrders.reduce((sum, o) => sum + o.total, 0)
};
});
// Time: 30s for 100K users
After: Aggregation Pipeline
const result = await db.collection('users').aggregate([
// Lookup orders
{
$lookup: {
from: 'orders',
localField: '_id',
foreignField: 'userId',
as: 'orders'
}
},
// Calculate totals
{
$project: {
name: 1,
totalOrders: { $size: '$orders' },
totalSpent: { $sum: '$orders.total' }
}
},
// Sort by total spent
{
$sort: { totalSpent: -1 }
},
// Limit results
{
$limit: 100
}
]).toArray();
// Time: 2s for 100K users
Complex Analytics
// Sales analytics by category and month
const salesAnalytics = await db.collection('orders').aggregate([
// Match date range
{
$match: {
createdAt: {
$gte: new Date('2020-01-01'),
$lt: new Date('2021-01-01')
}
}
},
// Unwind items
{
$unwind: '$items'
},
// Lookup product details
{
$lookup: {
from: 'products',
localField: 'items.productId',
foreignField: '_id',
as: 'product'
}
},
{
$unwind: '$product'
},
// Group by category and month
{
$group: {
_id: {
category: '$product.category',
month: { $month: '$createdAt' }
},
totalSales: { $sum: { $multiply: ['$items.quantity', '$items.price'] } },
totalOrders: { $sum: 1 },
avgOrderValue: { $avg: { $multiply: ['$items.quantity', '$items.price'] } }
}
},
// Sort
{
$sort: { '_id.month': 1, totalSales: -1 }
}
]).toArray();
Faceted Search
const facetedSearch = await db.collection('products').aggregate([
// Match search query
{
$match: {
$text: { $search: 'laptop' }
}
},
// Facet by multiple dimensions
{
$facet: {
// Products
products: [
{ $limit: 20 },
{ $project: { name: 1, price: 1, category: 1 } }
],
// Price ranges
priceRanges: [
{
$bucket: {
groupBy: '$price',
boundaries: [0, 500, 1000, 2000, 5000],
default: '5000+',
output: {
count: { $sum: 1 },
avgPrice: { $avg: '$price' }
}
}
}
],
// Categories
categories: [
{
$group: {
_id: '$category',
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } }
],
// Total count
totalCount: [
{ $count: 'count' }
]
}
}
]).toArray();
Time Series Analysis
const timeSeriesData = await db.collection('metrics').aggregate([
// Match time range
{
$match: {
timestamp: {
$gte: new Date('2020-01-01'),
$lt: new Date('2020-02-01')
}
}
},
// Group by hour
{
$group: {
_id: {
$dateToString: {
format: '%Y-%m-%d %H:00',
date: '$timestamp'
}
},
avgValue: { $avg: '$value' },
minValue: { $min: '$value' },
maxValue: { $max: '$value' },
count: { $sum: 1 }
}
},
// Sort by time
{
$sort: { _id: 1 }
}
]).toArray();
Geospatial Queries
// Find nearby stores
const nearbyStores = await db.collection('stores').aggregate([
{
$geoNear: {
near: {
type: 'Point',
coordinates: [-73.9857, 40.7484] // NYC
},
distanceField: 'distance',
maxDistance: 5000, // 5km
spherical: true
}
},
// Lookup store details
{
$lookup: {
from: 'storeDetails',
localField: '_id',
foreignField: 'storeId',
as: 'details'
}
},
{
$unwind: '$details'
},
// Project fields
{
$project: {
name: '$details.name',
address: '$details.address',
distance: 1,
distanceKm: { $divide: ['$distance', 1000] }
}
},
{
$limit: 10
}
]).toArray();
Performance Optimization
// Create indexes
await db.collection('orders').createIndex({ userId: 1, createdAt: -1 });
await db.collection('products').createIndex({ category: 1, price: 1 });
await db.collection('stores').createIndex({ location: '2dsphere' });
// Use explain to analyze
const explain = await db.collection('users').aggregate([
// ... pipeline
]).explain('executionStats');
console.log('Execution time:', explain.executionStats.executionTimeMillis);
console.log('Documents examined:', explain.executionStats.totalDocsExamined);
Results
Performance:
| Query Type | Before | After | Improvement |
|---|---|---|---|
| User analytics | 30s | 2s | 93% |
| Sales report | 45s | 3s | 93% |
| Faceted search | 10s | 500ms | 95% |
| Time series | 20s | 1s | 95% |
Capabilities:
- Complex joins: ✅
- Server-side processing: ✅
- Geospatial queries: ✅
- Faceted search: ✅
Lessons Learned
- Aggregation pipeline powerful: 93% faster
- Server-side processing better: Less network
- Indexes critical: 10x improvement
- $lookup for joins: No client-side joins
- Explain helps: Optimize queries
Conclusion
MongoDB aggregation pipeline transformed our analytics. Query time 30s → 2s, 93% faster, complex queries easy.
Key takeaways:
- Query time: 30s → 2s (-93%)
- Server-side processing
- Complex analytics possible
- Indexes critical
- No client-side joins needed
Use aggregation pipeline. MongoDB is powerful.