Building Production AI Agents: From Concept to Deployment
AI agents are powerful but complex. I built 5 production agents that together serve 50K users/day, and learned what works and what doesn't.
Here's the complete guide, from concept to deployment.
What is an AI Agent?
Definition: Autonomous system that:
- Perceives environment
- Makes decisions
- Takes actions
- Learns from results
Example:
User: "Book me a flight to Tokyo next week"
Agent:
1. Searches flights (action)
2. Compares prices (reasoning)
3. Checks calendar (tool use)
4. Books best option (decision)
5. Sends confirmation (communication)
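The loop behind that behavior can be sketched in a few lines (illustrative names, not any specific framework's API):

```python
# Minimal perceive-decide-act loop. `llm_decide` stands in for the model's
# reasoning step; `tools` maps action names to callables.
def run_agent(task, tools, llm_decide, max_steps=5):
    """Observe state, pick a tool, act, repeat until done or step limit."""
    observations = [task]
    for _ in range(max_steps):
        decision = llm_decide(observations)           # reasoning
        if decision["action"] == "finish":
            return decision["answer"]                 # final decision
        tool = tools[decision["action"]]              # tool use
        observations.append(tool(decision["input"]))  # action -> new percept
    return "Step limit reached"
```

The step limit matters: without it, a confused model can loop forever (and burn tokens), which is why the production agent below sets `max_iterations`.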
Agent Architecture
from langchain.agents import initialize_agent, Tool
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI

class ProductionAgent:
    def __init__(self, name, tools, memory_type="buffer"):
        self.name = name
        self.llm = OpenAI(temperature=0.7)
        self.tools = tools
        self.memory = self._init_memory(memory_type)
        self.agent = self._create_agent()

    def _init_memory(self, memory_type):
        """Initialize agent memory."""
        if memory_type == "buffer":
            return ConversationBufferMemory(
                memory_key="chat_history",
                return_messages=True
            )
        # Add other memory types ("combined", "vector", ...) here

    def _create_agent(self):
        """Create the agent."""
        return initialize_agent(
            tools=self.tools,
            llm=self.llm,
            agent="chat-conversational-react-description",
            memory=self.memory,
            verbose=True,
            max_iterations=5,
            early_stopping_method="generate"
        )

    def run(self, task):
        """Execute a task with error handling."""
        try:
            return self.agent.run(task)
        except Exception as e:
            return self._handle_error(e)

    def _handle_error(self, error):
        """Handle agent errors gracefully."""
        error_msg = str(error)
        if "rate limit" in error_msg.lower():
            return "I'm experiencing high demand. Please try again in a moment."
        elif "timeout" in error_msg.lower():
            return "This is taking longer than expected. Let me try a simpler approach."
        else:
            return f"I encountered an issue: {error_msg}. Let me try differently."
Essential Tools
1. Web Search
from langchain.tools import DuckDuckGoSearchRun

search_tool = Tool(
    name="WebSearch",
    func=DuckDuckGoSearchRun().run,
    description="Search the web for current information. Use for facts, news, or recent events."
)
2. Calculator
from langchain.tools import Tool
import numexpr

def calculate(expression):
    """Safely evaluate mathematical expressions."""
    try:
        return numexpr.evaluate(expression).item()
    except Exception:  # avoid a bare except; it would swallow KeyboardInterrupt too
        return "Invalid expression"

calculator_tool = Tool(
    name="Calculator",
    func=calculate,
    description="Perform mathematical calculations. Input should be a valid mathematical expression."
)
3. Database Query
def query_database(query):
    """Query the database with natural language."""
    # Convert natural language to SQL
    sql = llm.predict(f"Convert to SQL: {query}")
    # Execute (in production, validate the generated SQL before running it)
    result = db.execute(sql)
    return result

db_tool = Tool(
    name="DatabaseQuery",
    func=query_database,
    description="Query the database. Use for retrieving stored information."
)
4. File Operations
def file_operations(operation):
    """Handle file read/write operations."""
    import json
    op = json.loads(operation)
    if op['action'] == 'read':
        with open(op['file'], 'r') as f:
            return f.read()
    elif op['action'] == 'write':
        with open(op['file'], 'w') as f:
            f.write(op['content'])
        return "File written successfully"

file_tool = Tool(
    name="FileOperations",
    func=file_operations,
    description='Read or write files. Input: {"action": "read|write", "file": "path", "content": "..."}'
)
Memory Systems
Short-term Memory
from langchain.memory import ConversationTokenBufferMemory

# Stores recent conversation, trimmed to a token budget
# (ConversationBufferMemory keeps everything and has no token limit)
short_term = ConversationTokenBufferMemory(
    llm=llm,
    memory_key="chat_history",
    return_messages=True,
    max_token_limit=2000  # Limit memory size
)
Long-term Memory
from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores import FAISS

# Stores all conversations, retrieves relevant ones
vectorstore = FAISS.from_texts([""], embedding=embeddings)
long_term = VectorStoreRetrieverMemory(
    retriever=vectorstore.as_retriever(search_kwargs=dict(k=5)),
    memory_key="long_term_memory"
)
Combined Memory
from langchain.memory import CombinedMemory
memory = CombinedMemory(memories=[short_term, long_term])
Real Agent Examples
Example 1: Customer Support Agent
# Tools
tools = [
    Tool(
        name="SearchFAQ",
        func=search_faq,
        description="Search FAQ database for answers"
    ),
    Tool(
        name="CheckOrderStatus",
        func=check_order,
        description="Check order status. Input: order_id"
    ),
    Tool(
        name="CreateTicket",
        func=create_ticket,
        description="Create support ticket. Input: {issue, priority}"
    )
]
# Create agent
support_agent = ProductionAgent(
    name="SupportBot",
    tools=tools,
    memory_type="combined"
)

# Usage
response = support_agent.run("""
Customer: My order #12345 hasn't arrived yet.
It's been 2 weeks.
""")
# Agent will:
# 1. Check order status
# 2. Search FAQ for shipping info
# 3. Create ticket if needed
# 4. Respond to customer
Example 2: Data Analysis Agent
tools = [
    Tool(
        name="QueryDatabase",
        func=query_db,
        description="Query sales database"
    ),
    Tool(
        name="GenerateChart",
        func=create_chart,
        description="Create visualization. Input: {data, chart_type}"
    ),
    Tool(
        name="CalculateMetrics",
        func=calculate_metrics,
        description="Calculate business metrics"
    )
]

analyst_agent = ProductionAgent(
    name="DataAnalyst",
    tools=tools
)

# Usage
report = analyst_agent.run("""
Analyze last quarter's sales performance.
Compare to previous quarter.
Generate charts and key insights.
""")
Example 3: Code Review Agent
tools = [
    Tool(
        name="AnalyzeCode",
        func=analyze_code_quality,
        description="Analyze code for issues"
    ),
    Tool(
        name="RunTests",
        func=run_tests,
        description="Execute test suite"
    ),
    Tool(
        name="CheckSecurity",
        func=security_scan,
        description="Scan for security vulnerabilities"
    ),
    Tool(
        name="SuggestImprovements",
        func=suggest_improvements,
        description="Suggest code improvements"
    )
]

code_agent = ProductionAgent(
    name="CodeReviewer",
    tools=tools
)

# Usage
review = code_agent.run("""
Review pull request #123.
Check code quality, run tests, scan for security issues.
Provide detailed feedback.
""")
Error Handling
import time

from langchain.tools.base import ToolException
from openai.error import RateLimitError

class RobustAgent(ProductionAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_retries = 3
        self.fallback_responses = {
            'search_failed': "I couldn't find that information. Could you rephrase?",
            'tool_error': "I encountered an issue with that tool. Let me try another approach.",
            'timeout': "This is taking too long. Let me simplify the task."
        }

    def run_with_retry(self, task):
        """Run with automatic retry."""
        for attempt in range(self.max_retries):
            try:
                return self.agent.run(task)
            except RateLimitError:
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # Exponential backoff
                    continue
                return self.fallback_responses['search_failed']
            except ToolException:
                return self.fallback_responses['tool_error']
            except TimeoutError:
                return self.fallback_responses['timeout']
        return "I'm having trouble completing this task. Please try again later."
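The retry-with-backoff logic generalizes into a decorator that any tool or agent call can reuse (a sketch; the retried exception types and delays are assumptions to tune per provider):

```python
# Reusable exponential-backoff decorator: retry on the given exception
# types, doubling the delay each attempt, re-raising on the final failure.
import functools
import time

def with_backoff(max_retries=3, base_delay=1.0, retry_on=(Exception,)):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_retries - 1:
                        raise  # out of retries: surface the error
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

Decorating individual tool functions this way keeps transient provider failures from bubbling up as agent errors.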
Monitoring and Logging
import logging
import time

from prometheus_client import Counter, Histogram

# Metrics
agent_requests = Counter('agent_requests_total', 'Total agent requests', ['agent_name', 'status'])
agent_latency = Histogram('agent_latency_seconds', 'Agent response time', ['agent_name'])
tool_usage = Counter('tool_usage_total', 'Tool usage count', ['tool_name'])

class MonitoredAgent(RobustAgent):
    def run(self, task):
        """Run with monitoring."""
        start_time = time.time()
        try:
            result = super().run(task)
            # Record success
            agent_requests.labels(agent_name=self.name, status='success').inc()
            agent_latency.labels(agent_name=self.name).observe(time.time() - start_time)
            # Log
            logging.info(f"Agent {self.name} completed task in {time.time() - start_time:.2f}s")
            return result
        except Exception as e:
            # Record failure
            agent_requests.labels(agent_name=self.name, status='error').inc()
            logging.error(f"Agent {self.name} failed: {e}")
            raise
Deployment
Docker Container
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "agent_server.py"]
FastAPI Server
from fastapi import FastAPI, HTTPException
from fastapi.responses import PlainTextResponse
from prometheus_client import generate_latest
from pydantic import BaseModel

app = FastAPI()

# Initialize agents
support_agent = MonitoredAgent(name="Support", tools=support_tools)
analyst_agent = MonitoredAgent(name="Analyst", tools=analyst_tools)

class AgentRequest(BaseModel):
    agent: str
    task: str

@app.post("/agent/run")
async def run_agent(request: AgentRequest):
    """Execute agent task."""
    agents = {
        'support': support_agent,
        'analyst': analyst_agent
    }
    agent = agents.get(request.agent)
    if not agent:
        raise HTTPException(status_code=404, detail="Agent not found")
    try:
        result = agent.run(request.task)
        return {"result": result, "status": "success"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/metrics")
async def metrics():
    """Expose agent metrics in Prometheus format.

    Don't reach into Counter internals (they're private); use the
    exposition format Prometheus scrapes directly.
    """
    return PlainTextResponse(generate_latest())
Real Results
5 Agents Deployed:
- Customer Support (30K requests/day)
- Data Analyst (10K requests/day)
- Code Reviewer (5K requests/day)
- Content Moderator (3K requests/day)
- Research Assistant (2K requests/day)
Performance:
- Average latency: 2.5s
- Success rate: 94%
- User satisfaction: 4.6/5
- Cost: $800/day
Impact:
- Support tickets: 60% reduction
- Analysis time: 80% faster
- Code review: 70% faster
- Moderation: 90% automated
Best Practices
- Start simple: One tool, basic memory
- Add tools gradually: Test each addition
- Monitor everything: Latency, errors, costs
- Handle errors gracefully: Fallback responses
- Limit iterations: Prevent infinite loops
- Cache results: Reduce API calls
- Test thoroughly: Edge cases matter
- Document behavior: For debugging
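"Cache results" in practice can be as simple as an in-process TTL cache keyed on the task string (a sketch; production deployments often use Redis instead so the cache survives restarts and is shared across workers):

```python
# Small TTL cache: entries expire after ttl_seconds, so repeated identical
# tasks skip the LLM call while stale answers eventually refresh.
import time

class TTLCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # evict lazily on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

Wrapping `agent.run` with a `cache.get(task)` check first was one of the cheapest latency and cost wins across the five agents.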
Lessons Learned
- Agents fail: Plan for it
- Tools are critical: Quality > quantity
- Memory matters: But can be expensive
- Monitoring essential: Know what’s happening
- Iteration limits: Prevent runaway costs
Conclusion
Building production AI agents requires careful architecture, robust error handling, and comprehensive monitoring.
Key takeaways:
- Start simple, add complexity gradually
- Error handling is critical (94% success rate)
- Monitoring prevents surprises
- Tools make or break agents
- Real impact: 60-90% automation
Build agents that work, not just demos.