Integrating AI into existing systems is challenging. I implemented five proven patterns that together serve 1M requests/day.

Here are the patterns that work in production.

Pattern 1: API Wrapper

Use Case: Simple AI features

from fastapi import FastAPI
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # async client so API calls don't block the event loop

@app.post("/api/summarize")
async def summarize(text: str):
    """Simple API wrapper for AI."""
    response = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarize concisely."},
            {"role": "user", "content": text}
        ]
    )

    return {"summary": response.choices[0].message.content}

Pros:

  • Simple
  • Fast to implement
  • Easy to maintain

Cons:

  • No streaming
  • Limited control
  • Higher latency

Best for: Simple features, low volume

Pattern 2: Streaming

Use Case: Real-time responses

from fastapi.responses import StreamingResponse

@app.post("/api/chat/stream")
async def chat_stream(message: str):
    """Stream AI responses as server-sent events."""
    async def generate():
        stream = await client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": message}],
            stream=True
        )

        async for chunk in stream:
            # Skip chunks that carry no content (e.g., the final stop chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                yield f"data: {chunk.choices[0].delta.content}\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")

Pros:

  • Better UX
  • Lower perceived latency
  • Progressive rendering

Cons:

  • More complex
  • Error handling harder
  • Client complexity (see the consumption sketch below)

Best for: Chat interfaces, long responses
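
To make the client-side cost concrete, here is a minimal consumer for the endpoint above. This is an illustrative sketch, assuming the httpx library and a dev server at localhost:8000; it is not the production client.

import httpx

async def consume_stream(message: str) -> str:
    """Collect a streamed answer from the /api/chat/stream endpoint."""
    chunks = []
    async with httpx.AsyncClient(timeout=None) as http:
        async with http.stream(
            "POST",
            "http://localhost:8000/api/chat/stream",  # assumed local dev server
            params={"message": message},
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    chunks.append(line[len("data: "):])
    return "".join(chunks)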

Pattern 3: Batch Processing

Use Case: High volume, non-real-time

from celery import Celery
from openai import OpenAI

# A result backend is required for AsyncResult to fetch task results
celery = Celery('tasks', broker='redis://localhost:6379/0', backend='redis://localhost:6379/0')
client = OpenAI()  # Celery workers run synchronously, so the sync client is fine here

@celery.task
def process_batch(items):
    """Process items sequentially within a single worker task."""
    results = []

    for item in items:
        result = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": item}]
        )
        results.append(result.choices[0].message.content)

    return results

# Usage
@app.post("/api/batch")
async def submit_batch(items: list):
    """Submit batch job."""
    task = process_batch.delay(items)
    return {"task_id": task.id}

@app.get("/api/batch/{task_id}")
async def get_batch_result(task_id: str):
    """Get batch result."""
    task = process_batch.AsyncResult(task_id)
    
    if task.ready():
        return {"status": "complete", "result": task.result}
    else:
        return {"status": "processing"}

Pros:

  • High throughput
  • Cost-effective
  • Scalable

Cons:

  • Not real-time
  • Complex infrastructure
  • Monitoring needed

Best for: Data processing, analytics

Pattern 4: RAG (Retrieval Augmented Generation)

Use Case: Knowledge-based AI

from openai import OpenAI
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

class RAGSystem:
    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(embedding_function=self.embeddings)
        self.client = OpenAI()

    def add_documents(self, documents):
        """Add documents to the knowledge base."""
        self.vectorstore.add_texts(documents)

    def query(self, question):
        """Query with RAG."""
        # Retrieve the most relevant chunks
        docs = self.vectorstore.similarity_search(question, k=3)
        context = "\n\n".join(doc.page_content for doc in docs)

        # Generate an answer grounded in the retrieved context
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer based on the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
            ]
        )

        return response.choices[0].message.content

# Usage
rag = RAGSystem()
rag.add_documents(knowledge_base)  # knowledge_base: a list of strings
answer = rag.query("What is our refund policy?")
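
In practice, raw documents need chunking before indexing, or retrieval quality suffers. A minimal sketch, assuming LangChain's RecursiveCharacterTextSplitter; the chunk sizes are placeholders to tune, and long_document stands in for one of your source texts:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(long_document)  # long_document: one large string
rag.add_documents(chunks)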

Pros:

  • Accurate answers
  • Grounded in facts
  • Reduces hallucinations

Cons:

  • Complex setup
  • Vector DB needed
  • Higher latency

Best for: Customer support, documentation

Pattern 5: AI Agent

Use Case: Complex workflows

from langchain.agents import initialize_agent, Tool, AgentType
from langchain.llms import OpenAI as LangChainOpenAI  # avoid clashing with openai.OpenAI

class AIAgent:
    def __init__(self):
        self.llm = LangChainOpenAI(temperature=0)
        self.tools = self._init_tools()
        self.agent = initialize_agent(
            self.tools,
            self.llm,
            agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
        )

    def _init_tools(self):
        """Initialize agent tools."""
        return [
            Tool(
                name="Search",
                func=self.search,
                description="Search for information"
            ),
            Tool(
                name="Calculate",
                func=self.calculate,
                description="Perform calculations"
            ),
            Tool(
                name="Database",
                func=self.query_db,
                description="Query database"
            )
        ]

    # Tool implementations are application-specific; stubs shown for brevity.
    def search(self, query):
        raise NotImplementedError("wire up your search backend")

    def calculate(self, expression):
        raise NotImplementedError("wire up a safe calculator")

    def query_db(self, sql):
        raise NotImplementedError("wire up a read-only database query")

    async def execute(self, task):
        """Execute a task with the agent."""
        return await self.agent.arun(task)

# Usage (inside an async context)
agent = AIAgent()
result = await agent.execute("Find the total sales for last month and calculate growth")

Pros:

  • Handles complex tasks
  • Multi-step reasoning
  • Tool integration

Cons:

  • Most complex
  • Expensive
  • Unpredictable (see the iteration cap below)

Best for: Complex automation, research
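
To contain the unpredictability, cap the reasoning loop. A hedged sketch of the constructor call with safety options, assuming the same legacy initialize_agent API used above; the specific values are placeholders to tune:

self.agent = initialize_agent(
    self.tools,
    self.llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=5,                  # stop runaway tool loops
    early_stopping_method="generate",  # force a final answer at the cap
    handle_parsing_errors=True         # recover from malformed tool calls
)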

Pattern Comparison

Pattern     | Complexity | Cost   | Latency | Use Case
API Wrapper | Low        | Low    | Medium  | Simple features
Streaming   | Medium     | Medium | Low     | Chat interfaces
Batch       | High       | Low    | High    | Data processing
RAG         | High       | Medium | Medium  | Knowledge base
Agent       | Very High  | High   | High    | Complex tasks

Production Stats

1M requests/day across all patterns:

Pattern     | Requests/Day | Avg Latency | Cost/Day
API Wrapper | 500K         | 1.5s        | $500
Streaming   | 300K         | 2.0s        | $600
Batch       | 150K         | N/A         | $150
RAG         | 40K          | 2.5s        | $200
Agent       | 10K          | 5.0s        | $300

Total: $1,750/day = $52,500/month (30 days)

Choosing the Right Pattern

Decision Tree:

  1. Real-time needed? → Streaming or API Wrapper
  2. Complex reasoning? → Agent
  3. Knowledge base? → RAG
  4. High volume? → Batch
  5. Simple feature? → API Wrapper
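
Read as code, with the checks applied top to bottom, the tree looks roughly like this (an illustrative toy, not production logic):

def choose_pattern(real_time, complex_reasoning, knowledge_base, high_volume):
    """Toy encoding of the decision tree; first match wins."""
    if real_time:
        return "Streaming"  # or "API Wrapper" for very simple features
    if complex_reasoning:
        return "Agent"
    if knowledge_base:
        return "RAG"
    if high_volume:
        return "Batch"
    return "API Wrapper"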

Lessons Learned

  1. Start simple: ship the API Wrapper first
  2. Add complexity as needed: Don’t over-engineer
  3. Monitor costs: They can escalate quickly
  4. Cache aggressively: 40% cost reduction (see the sketch below)
  5. Use the right pattern: Don’t use an agent for simple tasks
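
On lesson 4: the 40% reduction came from caching identical prompts. A minimal sketch of the idea, assuming Redis; the key scheme and TTL are placeholders, not the exact production code:

import hashlib
import json
import redis
from openai import OpenAI

cache = redis.Redis(host="localhost", port=6379, db=1)
client = OpenAI()

def cached_completion(messages, model="gpt-3.5-turbo"):
    """Return a cached answer for identical (model, messages) pairs."""
    key = "ai:" + hashlib.sha256(
        json.dumps({"model": model, "messages": messages}, sort_keys=True).encode()
    ).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit.decode()
    response = client.chat.completions.create(model=model, messages=messages)
    answer = response.choices[0].message.content
    cache.setex(key, 86400, answer)  # 24-hour TTL; tune per use case
    return answer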

Conclusion

These five proven integration patterns together serve 1M requests/day in production. Choose based on your use case.

Key takeaways:

  1. 5 patterns for different use cases
  2. 1M requests/day handled
  3. $52K/month total cost
  4. Start simple, add complexity
  5. Right pattern = 50% cost savings

Use the right pattern for the job. Don’t over-engineer.