Agentic AI · MCP · Integration · Production

MCP in Production: Connecting AI Agents to Real Systems

2024-12-30 • 10 min read

Our agent needs to read from Google Drive, write to our SQL database, send Slack notifications, and query our internal API. Do we seriously have to write custom integration code for each one?

That was the old way, yeah. You'd write a custom 'Google Drive tool,' a custom 'SQL tool,' a custom 'Slack tool'... and every time the LLM API changed or you switched from OpenAI to Anthropic, you'd rewrite everything. Then Anthropic released MCP (Model Context Protocol) in late 2024. Think of it as USB-C for AI—one standard connector, infinite devices. Your agent speaks MCP, your tools speak MCP, and the framework doesn't matter. It's the missing infrastructure layer we desperately needed.

So MCP is... an API wrapper?

Not quite. MCP is a protocol—a set of rules for how AI clients (like your agent) and MCP servers (your tools) communicate. It's JSON-RPC under the hood, which means structured requests/responses, error handling, and bidirectional communication. An MCP server exposes tools (actions the AI can invoke) and resources (data the AI can read). The AI doesn't know it's talking to Slack vs. SQL—it just knows 'there's a tool called send_message' and 'a resource called customer_database.'
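To make that concrete, here's roughly what a single tool invocation looks like on the wire (the tool name and arguments below are purely illustrative):

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "send_message",
    "arguments": { "channel": "support", "text": "Ticket escalated" }
  }
}

And the server answers with a structured result (or a structured error) the client can act on:

{
  "jsonrpc": "2.0",
  "id": 7,
  "result": {
    "content": [{ "type": "text", "text": "Message sent to #support" }],
    "isError": false
  }
}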

Okay, but how does this work in practice? Like, with actual code?

Let's start simple: Claude Desktop (Anthropic's app) natively supports MCP. You edit a config file, point it to MCP servers, restart the app, and boom—Claude can now access your local filesystem, GitHub repos, or any custom tool you build. Here's what the config looks like:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/you/Projects"]
    },
    "github": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "GITHUB_PERSONAL_ACCESS_TOKEN", "ghcr.io/github/github-mcp-server"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "your_token" }
    }
  }
}

You save this, restart Claude Desktop, and now Claude can read files from /Users/you/Projects and interact with GitHub (list PRs, create issues, etc.). No custom code. Just config.

That's... shockingly simple. What's the catch?

The catch is you're running MCP servers locally via Stdio (standard input/output). That's fine for desktop apps, but for production agents running on servers? You need remote MCP servers that speak HTTP. Which means deploying your own MCP infrastructure. Let me walk you through it.

Alright, how do we build a production MCP server?

An MCP server has three components:

1. Tools: Functions the AI can invoke. Example: send_slack_message(channel, text). You define the signature (using Pydantic models for type safety!), and MCP auto-generates the JSON schema for the LLM.

2. Resources: Data the AI can read. Example: a 'customer_list' resource that queries your database. The AI can ask 'What are the latest customers?' and your server fetches the data.

3. Prompts (optional): Pre-defined prompt templates the AI can use. Like 'Write a professional apology email' with placeholders.

Here's a minimal Python MCP server using the official SDK:

from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("my-api-server")

class MessageInput(BaseModel):
    channel: str
    text: str

@mcp.tool()
async def send_slack_message(input: MessageInput) -> str:
    """Send a message to a Slack channel."""
    # Your Slack API logic here
    return f"Sent '{input.text}' to #{input.channel}"

if __name__ == "__main__":
    mcp.run()  # Runs on Stdio by default

Notice the Pydantic model? That's your type-safe validation. If the LLM tries to send {"channel": 123}, Pydantic rejects it before your code runs.
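Resources and prompts use the same decorator style. A quick sketch (the db handle here is a stand-in for whatever data access layer you actually use):

@mcp.resource("customers://recent")
async def recent_customers() -> str:
    """Readable resource: the ten most recently added customers."""
    # db is a hypothetical async database client; replace with your own
    rows = await db.fetch("SELECT name FROM customers ORDER BY created_at DESC LIMIT 10")
    return "\n".join(row["name"] for row in rows)

@mcp.prompt()
def apology_email(customer_name: str, issue: str) -> str:
    """Pre-defined prompt template with placeholders."""
    return f"Write a professional apology email to {customer_name} about {issue}."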

But that runs on Stdio. How do we make it a real HTTP server?

You deploy it behind an HTTP transport. MCP supports HTTP with Server-Sent Events (SSE) for streaming. You mount your MCP server inside a web framework (FastAPI works great), expose it under an endpoint, and configure your agent to connect via HTTPS:

import uvicorn
from fastapi import FastAPI

app = FastAPI()

# mcp is the FastMCP instance from the previous example;
# sse_app() returns an ASGI sub-app that serves the MCP SSE transport
app.mount("/mcp", mcp.sse_app())

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)

Now your agent connects to https://your-server.com/mcp instead of running a local process. This is how you scale MCP to production.
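On the agent side, the official Python SDK ships an SSE client. Here's a minimal sketch of calling the remote server above (the URL assumes the SSE endpoint sits under the /mcp mount; adjust it to wherever you actually exposed the server):

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    # Open the SSE connection, then run an MCP session over it
    async with sse_client("https://your-server.com/mcp/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Argument shape mirrors the tool's Python signature
            result = await session.call_tool(
                "send_slack_message",
                {"input": {"channel": "deploys", "text": "Release shipped"}},
            )
            print(result.content)

asyncio.run(main())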

What about security? We're exposing tools that can write to our database!

Critical question. MCP itself doesn't handle auth—you do. Strategies:

1. API Key Authentication: Require a header like Authorization: Bearer <token> on your HTTP endpoint. Only your agent has the key. (A minimal sketch follows this list.)

2. OAuth2: For user-scoped access (e.g., Slack on behalf of a specific user), implement OAuth flows. The MCP server stores tokens securely.

3. Network Isolation: Run MCP servers in a private VPC. Your agent can reach them, but the public internet can't.

4. Tool-Level Permissions: Not all tools should be exposed to all agents. You can implement role-based access control (RBAC) in your MCP server: 'Agent A can only call read-only tools, Agent B can write.'

5. Audit Logging: Log every MCP call—who invoked which tool, when, with what args, and what was returned. Critical for compliance and debugging.
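Here's what strategy 1 can look like for the FastAPI deployment above: a minimal sketch that checks a shared bearer token before any MCP traffic is processed (the MCP_API_KEY environment variable is our own naming, not part of MCP):

import os

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.middleware("http")
async def require_api_key(request: Request, call_next):
    # Reject any request that doesn't carry the expected bearer token
    expected = f"Bearer {os.environ['MCP_API_KEY']}"
    if request.headers.get("Authorization") != expected:
        return JSONResponse({"error": "unauthorized"}, status_code=401)
    return await call_next(request)

# Mount the MCP SSE app behind the middleware, exactly as before:
# app.mount("/mcp", mcp.sse_app())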

This is sounding complicated. Can't we just use webhooks or REST APIs like normal?

You could! But here's what MCP gives you that raw APIs don't:

1. Automatic schema generation: MCP introspects your Pydantic models and tells the LLM 'here's what this tool expects.' You don't manually write OpenAPI specs. (See the snippet below.)

2. Bidirectional streaming: MCP supports real-time updates. Example: a 'monitor_pipeline' tool that streams progress updates to the agent.

3. Resource subscriptions: The agent can 'watch' a resource (like a database table) and get notified when it changes. No polling.

4. Protocol-level error handling: MCP uses structured JSON-RPC error codes, so the agent can tell a malformed call from a transient failure. It knows when to retry vs. when to fail.

5. Unified tooling: One MCP client library works with all MCP servers. No per-API SDK hell.

REST APIs are fine, but MCP is designed specifically for agentic workflows. It's the difference between a generic HTTP library and a purpose-built framework.
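Point 1 is easy to verify for yourself: the schema the LLM sees is derived from the Pydantic model you already wrote. A quick sketch with the MessageInput model from earlier:

from pydantic import BaseModel

class MessageInput(BaseModel):
    channel: str
    text: str

# Roughly what gets advertised to the LLM as the tool's input schema
print(MessageInput.model_json_schema())
# {'properties': {'channel': {'title': 'Channel', 'type': 'string'},
#                 'text': {'title': 'Text', 'type': 'string'}},
#  'required': ['channel', 'text'], 'title': 'MessageInput', 'type': 'object'}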

The "Million Dollar" Question

"Is MCP just Anthropic lock-in? What if we don't use Claude?"

Technical Reality Check

MCP is an Open Standard, Not a Vendor Lock

MCP is open-source and vendor-neutral. The spec, SDKs, and reference implementations are all on GitHub (under MIT/Apache licenses). Here's the reality:

1. Claude Desktop is just one client. Anthropic built the first production-grade MCP client, but any agent framework can implement MCP. You can use MCP with Pydantic AI, LangGraph, or even custom code. The protocol is LLM-agnostic.

2. OpenAI doesn't support it natively... yet. OpenAI has 'function calling,' which is similar but proprietary. MCP is competing with that. If you're locked into OpenAI's ecosystem, you'll need to write a translation layer (map MCP tools to OpenAI's function schema). But that's a one-time effort.

3. Community adoption is growing. As of late 2024, there are MCP servers for GitHub, Slack, Google Drive, Postgres, MongoDB, Jira, and more. The ecosystem is young but accelerating. If MCP becomes the de facto standard (likely, given Anthropic's momentum), even OpenAI will adopt it.

4. You can self-host everything. Unlike OpenAI's hosted functions, MCP servers run on your infrastructure. No data leaves your network if you don't want it to. That's huge for compliance-heavy industries.

Bottom line: MCP is a protocol, not a product. Anthropic kickstarted it, but it's designed to outlive any single vendor.

Alright, I'm sold. What's the production deployment checklist?

Here's the playbook:

1. Start with Claude Desktop for prototyping. Get your local MCP servers working first. Validate tools, test edge cases. Once stable, migrate to remote.

2. Deploy MCP servers behind HTTPS. Use FastAPI + Uvicorn (or your web framework of choice). Terminate TLS at a load balancer. Never run MCP over plain HTTP in production.

3. Implement authentication. API keys minimum. OAuth2 if user-scoped. Store secrets in a vault (AWS Secrets Manager, HashiCorp Vault).

4. Add observability. Log every MCP call: timestamp, tool name, input args, output, latency, errors. Integrate with your logging stack (Datadog, Grafana, etc.).

5. Rate limit aggressively. Agents can loop. If an agent retries a failing tool 1000 times, you'll hit API quotas or overwhelm your DB. Implement per-agent rate limits (e.g., 10 requests/second); a sketch follows this checklist.

6. Test failure modes. What happens when your MCP server is down? Your agent should degrade gracefully (return cached data, notify humans, etc.), not crash.

7. Version your MCP servers. As you add/remove tools, old agents might break. Use API versioning (/v1/mcp, /v2/mcp) so you can sunset old versions safely.
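For point 5, even a crude in-process limiter catches runaway agent loops. A minimal sketch (fixed one-second window keyed on a hypothetical X-Agent-Id header; in production you'd likely use Redis or your API gateway's limiter instead):

import time

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

WINDOW_SECONDS = 1
MAX_REQUESTS = 10
_counters: dict[str, tuple[int, int]] = {}  # agent_id -> (window_start, count)

@app.middleware("http")
async def rate_limit(request: Request, call_next):
    agent_id = request.headers.get("X-Agent-Id", "unknown")
    now = int(time.time())
    window_start, count = _counters.get(agent_id, (now, 0))
    if now - window_start >= WINDOW_SECONDS:
        window_start, count = now, 0  # new window, reset the counter
    if count >= MAX_REQUESTS:
        return JSONResponse({"error": "rate limit exceeded"}, status_code=429)
    _counters[agent_id] = (window_start, count + 1)
    return await call_next(request)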

Last question: How do MCP servers integrate with our vector database for retrieval agents?

Great connection to our RAG series! You'd build an MCP server that exposes retrieval tools. Example:

@mcp.tool()
async def search_documents(query: str, top_k: int = 5) -> list[str]:
    """Semantic search over the document store."""
    # Your vector search logic (Qdrant, pgvector, etc.)
    results = await vector_db.search(query, limit=top_k)
    return [doc.content for doc in results]

Now your agent can call search_documents('contract pricing') and get relevant chunks from your RAG system. The beauty? MCP decouples the agent from the vector DB. You can swap Qdrant for pgvector without touching the agent—just update the MCP server. For more on vector database selection, see our Vector Database Showdown post.

Technical Reality Check

What MCP Does NOT Give You

1. It's not a deployment platform. MCP is a protocol. YOU deploy the servers (Docker, Kubernetes, serverless, whatever). MCP doesn't host anything.

2. It's not an auth provider. MCP defines how to pass auth tokens, but not what auth system to use. OAuth, API keys, SAML—that's your choice.

3. It's not a message queue. If your tools take 10 minutes to run, MCP won't manage that. You'll need a job queue (Celery, RabbitMQ) behind your MCP server; see the sketch after this list.

4. It's not a load balancer. If you have 100 agents hitting one MCP server, YOU need to scale horizontally (multiple server instances behind a load balancer).

5. It's not magic. MCP makes integration easier. It doesn't make your tools faster, smarter, or more reliable. If your SQL query is slow, MCP won't fix it.
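For point 3, the usual pattern is: the MCP tool enqueues the work and returns a job id immediately, and a worker does the heavy lifting. A sketch assuming Celery with a Redis broker (the app name, broker URL, and report task are all placeholders):

from celery import Celery

# Hypothetical Celery app; point the broker at your own Redis or RabbitMQ
jobs = Celery("jobs", broker="redis://localhost:6379/0")

@jobs.task
def generate_report(customer_id: str) -> str:
    # Long-running work happens in a Celery worker, not in the MCP request
    ...

# mcp is the FastMCP instance from earlier in the post
@mcp.tool()
async def start_report(customer_id: str) -> str:
    """Queue a report job and return its id immediately."""
    job = generate_report.delay(customer_id)
    return f"Report job queued: {job.id}"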

Bottom line: MCP is infrastructure glue. It standardizes how agents talk to tools, but you still architect the tools, deploy the servers, and handle failures. It's powerful, but it's not a silver bullet. Combine it with Pydantic AI's type validation, and you've got a rock-solid agentic stack.