Building MCP Servers in Production: Python Implementation Guide (2026)

If you already know what the Model Context Protocol is — that it’s an open standard for connecting LLMs to external tools, data sources, and capabilities — then this guide skips the theory and gets straight to implementation. By the end, you’ll have built a working MCP server with tools, resources, and prompts, know how to test it locally, connect it to Claude Desktop and the Claude Agent SDK, and deploy it to production.

All code examples are in Python using the official mcp SDK.


1. MCP Architecture: Transport Layers

Every MCP server communicates with clients over one of three transport mechanisms. Picking the right one shapes the rest of your architecture.

stdio

The server runs as a child process. The client writes JSON-RPC messages to stdin, reads responses from stdout. This is the simplest transport and the right choice for local tools — no networking, no auth at the transport layer, zero configuration overhead.

When to use it: Developer tools, local integrations, Claude Desktop extensions, scripts that run on the same machine as the client.

Critical gotcha: For stdio servers, never write to stdout. print() in Python writes to stdout by default and will corrupt the JSON-RPC stream and silently break your server. Always write logs to stderr:

import sys
import logging

# ❌ Breaks the server
print("debug message")

# ✅ Safe
print("debug message", file=sys.stderr)

# ✅ Also safe — logging defaults to stderr
logging.basicConfig(level=logging.DEBUG)
logging.info("debug message")

SSE (Server-Sent Events)

The client connects to an HTTP endpoint; the server streams events back. This transport is deprecated in favor of Streamable HTTP. If you’re inheriting an SSE-based server, it still works — but build new servers with Streamable HTTP.

Streamable HTTP

The current preferred transport for remote servers. The client makes POST requests to a single endpoint; the server can respond with either a direct JSON response or stream events over SSE. This gives you the flexibility to handle simple request-response tools and long-running streaming operations through the same endpoint.

When to use it: Any server that needs to be reachable over the network — multi-tenant platforms, cloud deployments, servers shared across multiple clients.

The transport decision has a direct impact on your auth story: stdio servers rely on OS-level process isolation, while HTTP servers need explicit authentication.
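To make the stdio framing concrete: each message is a single JSON-RPC 2.0 object serialized on one line, written to the child process's stdin, with responses coming back the same way on stdout. A sketch of a tools/call exchange (the tool name and argument match the server built below; the response body is illustrative):

```python
import json

# A tools/call request as the client would write it to the server's stdin:
# one JSON object per line (newline-delimited JSON-RPC 2.0).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_linter",
        "arguments": {"file_path": "/workspace/project/auth.py"},
    },
}
wire_line = json.dumps(request) + "\n"

# The response arrives on stdout in the same format -- which is exactly why
# a stray print() to stdout corrupts the stream.
response = json.loads(
    '{"jsonrpc": "2.0", "id": 1,'
    ' "result": {"content": [{"type": "text", "text": "No issues found."}]}}'
)
```

Because stdout carries the protocol itself, there is no room for any other output on that stream.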


2. Building an MCP Server: Tools, Resources, and Prompts

Let’s build a realistic server that demonstrates all three MCP primitives. We’ll build a code-review server that can fetch files from a repository, run lint checks, and provide review prompt templates.

Setup

# Requires Python 3.10+
uv init code-review-server
cd code-review-server
uv venv
source .venv/bin/activate
uv add "mcp[cli]" httpx

The Complete Server

# server.py
import logging
import subprocess
import sys
from pathlib import Path

import httpx
from mcp.server.fastmcp import FastMCP

logging.basicConfig(
    level=logging.INFO,
    stream=sys.stderr,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger(__name__)

mcp = FastMCP(
    "code-review",
    instructions="Code review tools and resources",
)

# ---------------------------------------------------------------------------
# Tools — functions the LLM can call
# ---------------------------------------------------------------------------

@mcp.tool()
async def run_linter(file_path: str, linter: str = "ruff") -> str:
    """Run a linter on a Python file and return the output.

    Args:
        file_path: Absolute path to the Python file to lint
        linter: Linting tool to use ('ruff' or 'flake8')
    """
    path = Path(file_path)
    if not path.exists():
        return f"Error: file not found: {file_path}"
    if path.suffix != ".py":
        return f"Error: only Python files are supported, got {path.suffix}"

    allowed_linters = {"ruff", "flake8"}
    if linter not in allowed_linters:
        return f"Error: unsupported linter '{linter}'. Choose from: {allowed_linters}"

    try:
        result = subprocess.run(
            [linter, str(path)],
            capture_output=True,
            text=True,
            timeout=30,
        )
        output = result.stdout + result.stderr
        return output if output else "No issues found."
    except FileNotFoundError:
        return f"Error: {linter} is not installed. Run: pip install {linter}"
    except subprocess.TimeoutExpired:
        return "Error: linter timed out after 30 seconds"


@mcp.tool()
async def fetch_github_file(owner: str, repo: str, path: str, ref: str = "main") -> str:
    """Fetch the raw content of a file from a public GitHub repository.

    Args:
        owner: Repository owner (e.g. 'modelcontextprotocol')
        repo: Repository name (e.g. 'python-sdk')
        path: File path within the repository (e.g. 'src/mcp/server/fastmcp.py')
        ref: Branch, tag, or commit SHA (default: 'main')
    """
    url = f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"
    logger.info(f"Fetching: {url}")

    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(url, timeout=15.0)
            response.raise_for_status()
            return response.text
        except httpx.HTTPStatusError as e:
            return f"Error: HTTP {e.response.status_code} for {url}"
        except httpx.RequestError as e:
            return f"Error fetching file: {str(e)}"


@mcp.tool()
async def list_directory(dir_path: str) -> str:
    """List the contents of a local directory.

    Args:
        dir_path: Absolute path to the directory
    """
    path = Path(dir_path)
    if not path.exists():
        return f"Error: directory not found: {dir_path}"
    if not path.is_dir():
        return f"Error: not a directory: {dir_path}"

    entries = []
    for entry in sorted(path.iterdir()):
        prefix = "📁" if entry.is_dir() else "📄"
        entries.append(f"{prefix} {entry.name}")

    return "\n".join(entries) if entries else "(empty directory)"


# ---------------------------------------------------------------------------
# Resources — file-like data the client can read
# ---------------------------------------------------------------------------

@mcp.resource("file://{path}")
async def read_local_file(path: str) -> str:
    """Read a local file by path."""
    file_path = Path(path)
    if not file_path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    return file_path.read_text()


@mcp.resource("config://review-rules")
async def get_review_rules() -> str:
    """Returns the code review rules configuration."""
    return """
# Code Review Rules

## Required Checks
- All functions must have docstrings
- No bare `except` clauses
- No mutable default arguments
- f-strings preferred over .format() or % formatting
- Type hints required for public functions

## Style
- Max line length: 100 characters
- Imports ordered: stdlib, third-party, local
- No unused imports
"""


# ---------------------------------------------------------------------------
# Prompts — reusable templates for common review workflows
# ---------------------------------------------------------------------------

@mcp.prompt()
def security_review(file_path: str, context: str = "") -> str:
    """Generate a security-focused code review prompt.

    Args:
        file_path: Path to the file being reviewed
        context: Optional additional context about the code
    """
    return f"""You are a security-focused code reviewer. Review the file at {file_path}.

{'Context: ' + context if context else ''}

Focus on:
1. Input validation and sanitization — are all external inputs validated?
2. Injection vulnerabilities — SQL, command, path traversal
3. Authentication and authorization — are sensitive operations protected?
4. Secrets and credentials — hardcoded values, insecure storage
5. Error handling — do errors leak sensitive information?

For each issue found, provide:
- Severity: Critical / High / Medium / Low
- Location: line number or function name
- Description: what the vulnerability is
- Remediation: specific code-level fix

Use the read_local_file resource to read the file contents first.
"""


@mcp.prompt()
def performance_review(file_path: str) -> str:
    """Generate a performance-focused code review prompt.

    Args:
        file_path: Path to the file being reviewed
    """
    return f"""Review {file_path} for performance issues.

Check for:
- N+1 query patterns
- Unnecessary loops or nested iterations
- Missing async/await for I/O-bound operations
- Large in-memory data structures that could be streamed
- Repeated computations that could be cached

Use the read_local_file resource to read the file.
"""


# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------

def main():
    mcp.run(transport="stdio")


if __name__ == "__main__":
    main()

What’s happening here

Tools are Python functions decorated with @mcp.tool(). FastMCP automatically generates the JSON Schema for tool arguments from type hints and the docstring. The LLM sees the function signature and description — write them as if you’re documenting a public API.

Resources are read-only data sources identified by URI. The file://{path} pattern makes the URI template dynamic — the {path} segment becomes a parameter. config://review-rules is a static URI for configuration that doesn’t change.

Prompts are reusable message templates. They show up in Claude’s prompt palette and can be invoked by name. Think of them as macros: composable starting points for specific workflows rather than one-off instructions.
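To make the schema generation concrete, here is roughly what FastMCP derives for run_linter's arguments from its type hints and defaults (a hand-written approximation of the shape, not the SDK's exact output):

```python
# Approximate JSON Schema for run_linter's arguments, as derived from
# the signature run_linter(file_path: str, linter: str = "ruff").
run_linter_schema = {
    "type": "object",
    "properties": {
        "file_path": {"type": "string"},
        "linter": {"type": "string", "default": "ruff"},
    },
    "required": ["file_path"],  # linter has a default, so it's optional
}
```

This is what the LLM actually sees when deciding how to call the tool, which is why precise type hints and docstrings matter so much.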


3. Authentication and Security

stdio servers

stdio transport inherits OS-level security. The client process spawns the server process — no network exposure, no tokens needed. Credentials the server needs (API keys, database passwords) get passed through environment variables in the client’s server config:

{
  "mcpServers": {
    "code-review": {
      "command": "uv",
      "args": ["--directory", "/path/to/code-review-server", "run", "server.py"],
      "env": {
        "GITHUB_TOKEN": "ghp_xxxxxxxxxxxx",
        "DATABASE_URL": "postgresql://..."
      }
    }
  }
}

Inside the server, read credentials from the environment — never hardcode them:

import os

GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN")
if not GITHUB_TOKEN:
    logger.warning("GITHUB_TOKEN not set — GitHub API calls may be rate-limited")
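For credentials that are hard requirements rather than nice-to-haves, failing fast at startup beats a confusing error mid-tool-call. A small helper you might add (hypothetical, not part of the SDK):

```python
import os

def require_env(name: str) -> str:
    """Return a required environment variable, or fail loudly at startup."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing required environment variable: {name}. "
            "Set it in the client's mcpServers env block."
        )
    return value
```

Call it at import time (e.g. DATABASE_URL = require_env("DATABASE_URL")) so a misconfigured server refuses to start instead of failing on the first tool call.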

HTTP servers: OAuth 2.1

For remote HTTP-based servers, the MCP spec standardized on OAuth 2.1 as of early 2025. The server exposes a protected resource metadata endpoint, which clients use to discover how to authenticate.
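The discovery document lives at a well-known path (/.well-known/oauth-protected-resource, per RFC 9728). A sketch of its shape, with placeholder URLs and an illustrative scope name:

```python
# Served at /.well-known/oauth-protected-resource (RFC 9728).
# URLs and the scope name are placeholders for illustration.
protected_resource_metadata = {
    "resource": "https://mcp.yourdomain.com/mcp",
    "authorization_servers": ["https://auth.yourdomain.com"],
    "scopes_supported": ["tools:execute"],
    "bearer_methods_supported": ["header"],
}
```

A client that receives a 401 from the server fetches this document, discovers the authorization server, and runs the OAuth flow against it.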

A minimal OAuth-protected FastMCP HTTP server using bearer tokens:

import os
from functools import wraps
from mcp.server.fastmcp import FastMCP
import jwt  # pip install PyJWT

mcp = FastMCP("secure-code-review")

def require_auth(func):
    """Decorator to validate bearer tokens on tool calls."""
    @wraps(func)
    async def wrapper(*args, **kwargs):
        # In production, extract token from request context
        # FastMCP's HTTP transport passes headers through the lifespan context
        token = os.environ.get("_MCP_REQUEST_TOKEN")
        if not token:
            raise PermissionError("Authentication required")

        try:
            payload = jwt.decode(
                token,
                os.environ["JWT_PUBLIC_KEY"],
                algorithms=["RS256"],
                audience="code-review-api",
            )
            # Check required scopes
            if "tools:execute" not in payload.get("scope", "").split():
                raise PermissionError("Insufficient scope")
        except jwt.InvalidTokenError as e:
            raise PermissionError(f"Invalid token: {e}")

        return await func(*args, **kwargs)
    return wrapper

For most teams, the practical approach is simpler: run your MCP server behind an API gateway (AWS API Gateway, Nginx, Cloudflare) that handles auth, and let the server trust requests that make it through.

Input validation

Never trust tool arguments. Validate at the boundary:

@mcp.tool()
async def read_file(file_path: str) -> str:
    """Read a file from the project directory."""
    base = Path("/workspace/project").resolve()
    requested = (base / file_path).resolve()

    # Prevent path traversal
    if not requested.is_relative_to(base):
        raise ValueError(f"Access denied: {file_path} is outside the project directory")

    if not requested.exists():
        raise FileNotFoundError(f"File not found: {file_path}")

    return requested.read_text()

Additional security checklist

- Enforce timeouts on every subprocess and network call (both tools above do)
- Run the server with the least privilege it needs: a dedicated user and narrowly scoped API tokens
- Log tool invocations, but never log credentials or full request payloads
- Pin and audit dependencies; a compromised package runs with your server's permissions
- Rate-limit expensive tools if the server is shared across clients


4. Testing MCP Servers Locally

MCP Inspector

MCP Inspector is the official browser-based UI for testing MCP servers. It lets you browse your server’s tools, resources, and prompts, execute them with custom inputs, and inspect raw JSON-RPC messages.

# Start Inspector connected to your server
npx @modelcontextprotocol/inspector uv run server.py

# Inspector runs on http://127.0.0.1:6274
# A session token is printed to the console — you'll need it to authenticate

In the Inspector UI:

  1. Select “stdio” as the transport type
  2. Enter your server command in the command field
  3. Navigate to Tools, Resources, or Prompts tabs
  4. Select an item and click “Run” to execute it

The Inspector shows both the input you sent and the raw response, making it easy to debug schema mismatches and unexpected errors.

Testing with curl (HTTP transport)

If you’re building an HTTP server, test it directly:

# Start the server in HTTP mode
uv run server.py --transport http --port 8000

# Initialize the session (Streamable HTTP requires the Accept header)
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl","version":"0.0"}}}'

# The response carries an Mcp-Session-Id header; echo it back on every
# subsequent request, and complete the handshake first:
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Session-Id: <id-from-initialize-response>" \
  -d '{"jsonrpc":"2.0","method":"notifications/initialized"}'

# List available tools
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Session-Id: <id-from-initialize-response>" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

# Call a tool
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "Mcp-Session-Id: <id-from-initialize-response>" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"list_directory","arguments":{"dir_path":"/tmp"}}}'

Unit testing tools in isolation

Tools are plain Python functions — test them without starting the full server:

# tests/test_tools.py
import os
import tempfile

import pytest

from server import list_directory, run_linter


@pytest.mark.asyncio
async def test_run_linter_on_clean_file():
    with tempfile.NamedTemporaryFile(suffix=".py", mode="w", delete=False) as f:
        f.write('x = 1\n')
        f.flush()
        result = await run_linter(f.name, linter="ruff")
    os.unlink(f.name)
    assert "No issues found" in result or result.strip() == ""


@pytest.mark.asyncio
async def test_list_directory_returns_entries():
    result = await list_directory("/tmp")
    assert isinstance(result, str)
    assert len(result) > 0


@pytest.mark.asyncio
async def test_path_traversal_blocked():
    from server import read_file  # if read_file validates paths
    with pytest.raises((ValueError, PermissionError)):
        await read_file("../../etc/passwd")

Run with:

uv add pytest pytest-asyncio --dev
uv run pytest tests/ -v

5. Connecting to Claude Desktop, Claude Code, and the Claude Agent SDK

Claude Desktop

Claude Desktop loads MCP server configs from a JSON file:

{
  "mcpServers": {
    "code-review": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/code-review-server",
        "run",
        "server.py"
      ],
      "env": {
        "GITHUB_TOKEN": "ghp_xxxxxxxxxxxx"
      }
    }
  }
}

After saving, fully quit and relaunch Claude Desktop. A hammer icon in the chat interface confirms the server connected. If it doesn’t appear, check Claude Desktop logs:

# macOS
tail -f ~/Library/Logs/Claude/mcp*.log

Claude Code

Claude Code reads server configs from .mcp.json at the project root:

{
  "mcpServers": {
    "code-review": {
      "command": "uv",
      "args": ["--directory", "/path/to/code-review-server", "run", "server.py"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}

The ${GITHUB_TOKEN} syntax expands from your shell environment at runtime. You can also configure servers globally via Claude Code settings — useful for servers you want available in every project.

Verify the connection inside Claude Code:

/mcp  # lists connected servers and their tools

Claude Agent SDK

The Agent SDK connects to MCP servers programmatically. Here’s a full example using our code-review server:

# run_review.py
import asyncio
import os
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage, SystemMessage


async def review_file(file_path: str):
    options = ClaudeAgentOptions(
        mcp_servers={
            "code-review": {
                "command": "uv",
                "args": [
                    "--directory",
                    "/path/to/code-review-server",
                    "run",
                    "server.py",
                ],
                "env": {
                    "GITHUB_TOKEN": os.environ.get("GITHUB_TOKEN", ""),
                },
            }
        },
        allowed_tools=["mcp__code-review"],  # bare server name: allow all tools from this server
    )

    prompt = f"Run ruff on {file_path}, then give me a security review of the file."

    async for message in query(prompt=prompt, options=options):
        # Check connection status on startup
        if isinstance(message, SystemMessage) and message.subtype == "init":
            servers = message.data.get("mcp_servers", [])
            for server in servers:
                status = server.get("status", "unknown")
                name = server.get("name", "")
                if status != "connected":
                    print(f"Warning: {name} failed to connect (status: {status})")

        if isinstance(message, ResultMessage) and message.subtype == "success":
            print(message.result)


asyncio.run(review_file("/workspace/project/auth.py"))

For remote HTTP servers, swap the server config:

options = ClaudeAgentOptions(
    mcp_servers={
        "code-review": {
            "type": "http",  # or "sse" for SSE transport
            "url": "https://your-mcp-server.com/mcp",
            "headers": {
                "Authorization": f"Bearer {os.environ['API_TOKEN']}"
            },
        }
    },
    allowed_tools=["mcp__code-review"],
)

Tool naming convention: MCP tools in the SDK follow the pattern mcp__<server-name>__<tool-name>. Our run_linter tool registered on a server named "code-review" becomes mcp__code-review__run_linter. To allow every tool from a server, list the bare server prefix (mcp__code-review); per-tool wildcards such as mcp__code-review__* are not supported.
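The convention is mechanical enough to encode. A tiny hypothetical helper, handy when building allowed_tools lists for specific tools programmatically:

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Build the SDK-facing name of an MCP tool: mcp__<server>__<tool>."""
    return f"mcp__{server}__{tool}"
```

For example, mcp_tool_name("code-review", "run_linter") yields "mcp__code-review__run_linter", matching what the SDK expects in allowed_tools.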


6. Production Deployment

Choosing a transport for production

For production deployments serving multiple clients, Streamable HTTP is the right choice. It enables horizontal scaling, standard load balancing, and proper observability. Reserve stdio for local developer tools.

Containerizing your server

A minimal, production-ready Dockerfile:

# Dockerfile
FROM python:3.12-slim AS builder

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

COPY . .

FROM python:3.12-slim AS runtime

# Non-root user for security
RUN addgroup --system mcp && adduser --system --group mcp
USER mcp

WORKDIR /app
COPY --from=builder /app /app
COPY --from=builder /usr/local/bin/uv /usr/local/bin/uv

# Health check endpoint (add a /health route to your server).
# python:slim ships without curl, so probe with the stdlib instead.
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

EXPOSE 8000
CMD ["uv", "run", "server.py", "--transport", "http", "--port", "8000", "--host", "0.0.0.0"]

Build and push:

docker build -t code-review-mcp:latest .
docker push your-registry/code-review-mcp:latest

Running as an HTTP server

FastMCP supports HTTP transport with minimal changes:

# server.py
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--transport", default="stdio", choices=["stdio", "http"])
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--host", default="127.0.0.1")
    args = parser.parse_args()

    if args.transport == "http":
        # The official SDK calls this transport "streamable-http";
        # host and port are configured on the FastMCP settings object
        mcp.settings.host = args.host
        mcp.settings.port = args.port
        mcp.run(transport="streamable-http")
    else:
        mcp.run(transport="stdio")

Deployment options

Single server (simplest): Run the container on a VM behind a reverse proxy. Nginx handles TLS termination:

server {
    listen 443 ssl;
    server_name mcp.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/mcp.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mcp.yourdomain.com/privkey.pem;

    location /mcp {
        proxy_pass http://localhost:8000/mcp;
        proxy_http_version 1.1;
        proxy_set_header Connection '';   # required for SSE
        proxy_buffering off;              # required for SSE streaming
        proxy_cache off;
        proxy_read_timeout 300s;
    }
}

Kubernetes: For high-availability deployments, a standard Deployment with a Service works well:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: code-review-mcp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: code-review-mcp
  template:
    metadata:
      labels:
        app: code-review-mcp
    spec:
      containers:
        - name: server
          image: your-registry/code-review-mcp:latest
          ports:
            - containerPort: 8000
          env:
            - name: GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: github-token
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 2
            periodSeconds: 10
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Managed platforms: Render, Railway, and Northflank all support container deployments with automatic TLS, making them a fast path from container to public HTTPS endpoint.

Stateless design is essential for scaling

If you run multiple server instances behind a load balancer, stateful designs break — a client might hit instance A for one request and instance B for the next. Design your server to be stateless:

- Keep no per-session data in process memory; externalize it to Redis or a database
- Make tool handlers idempotent where possible, so retries are safe
- Treat local disk as scratch space; anything durable belongs in object storage or a database

Secret management

Never bake secrets into container images or commit them to source control. Use:

- Environment variables injected at deploy time
- A secret manager: AWS Secrets Manager, HashiCorp Vault, or Kubernetes Secrets (as in the manifest above)
- Short-lived credentials wherever your platform supports them

Observability

Add structured logging with request IDs so you can trace tool calls through your stack:

import logging
import sys
import uuid

logging.basicConfig(
    level=logging.INFO,
    stream=sys.stderr,
    format='{"time":"%(asctime)s","level":"%(levelname)s","msg":"%(message)s","request_id":"%(request_id)s"}',
)

@mcp.tool()
async def run_linter(file_path: str, linter: str = "ruff") -> str:
    # Route every log line through an adapter so the format's
    # request_id field is always populated
    request_id = str(uuid.uuid4())
    logger = logging.LoggerAdapter(
        logging.getLogger(__name__),
        {"request_id": request_id},
    )
    logger.info("run_linter called: file=%s linter=%s", file_path, linter)
    # ... rest of implementation

Ship these logs to Datadog, Grafana Loki, or your preferred log aggregation platform. Key metrics to track:

- Tool call latency (p50/p95/p99), per tool
- Error rate per tool, split by error type
- Request volume, to catch runaway agent loops early
- Timeout and rate-limit counts for downstream dependencies


Summary

Building a production MCP server comes down to five decisions:

  1. Transport: stdio for local tools, Streamable HTTP for remote/multi-client deployments
  2. Primitives: use tools for actions the LLM executes, resources for data it reads, prompts for workflow templates
  3. Auth: environment variables for stdio, OAuth 2.1 bearer tokens for HTTP
  4. Testing: MCP Inspector for exploratory testing, pytest with pytest-asyncio for regression coverage
  5. Deployment: containerize with non-root users, design stateless, manage secrets out-of-band, instrument for observability

The official Python SDK handles the protocol complexity. Your job is writing well-documented tools, validating inputs carefully, and designing for the operational realities of running a service.

