Cloud-Native Deployment: From Docker to Kubernetes at Scale
We've got our FastAPI microservices working great on our laptops. Now we need to deploy to production. Our CTO keeps saying 'use Docker and Kubernetes' but honestly... can't we just run python main.py on a server and call it a day?
You could, but let me tell you what happens next: the server reboots, your app dies. Traffic spikes, one process can't handle it. You need to update the code, but you have to take the app offline. A dependency breaks, you can't roll back. Within a week, you're firefighting 24/7. Production isn't a laptop; it's a hostile environment where everything that can fail, will fail.
Okay, so what does Docker actually give us?
Docker solves the 'works on my machine' problem. Your laptop has Python 3.12, the server has 3.9. Your laptop has certain libraries, the server doesn't. Docker packages your app with all its dependencies into a container: a lightweight, isolated unit that runs identically everywhere. No more 'but it worked on my laptop!' arguments.
So containers are like virtual machines?
Similar concept, different implementation. Virtual machines include an entire OS, so they're heavy (gigabytes, minutes to boot). Containers share the host OS kernel, so they're lightweight (megabytes, seconds to boot). You can run 100 containers on one server easily. 100 VMs? Not happening.
The "Million Dollar" Question
"But why not just install Python and dependencies on the server directly?"
Technical Reality Check
Why 'Just Install It' Doesn't Scale
Problem 1: Dependency Hell
App A needs NumPy 1.24, App B needs NumPy 1.26. You can't have both system-wide. Containers isolate dependencies.
Problem 2: Reproducibility
You install packages today, they work. Tomorrow, a library updates and breaks everything. Containers pin exact versions.
Problem 3: Security
If your app gets hacked, it can access the entire server. Containers provide isolation: a compromised container can't touch the host.
Problem 4: Portability
Your app runs on Ubuntu 20.04, but the new server is Ubuntu 24.04. Containers run anywhere: AWS, Azure, your laptop, on-premise servers.
Problem 5: Scaling
To scale, you need to manually provision servers, install dependencies, configure load balancers. With containers + Kubernetes, you just say 'I need 10 replicas' and it happens automatically.
Alright, I'm sold on Docker. How do we actually build a production image?
Building Production-Grade Docker Images
Anti-Pattern (what most people do wrong):
FROM python:3.12
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
CMD ["python", "main.py"]
Problems:
- Image is 1GB+ (includes build tools you don't need)
- Runs as root (security nightmare)
- No health checks
- Uses dev server instead of production server
Production Pattern (multi-stage build):
# Stage 1: Builder (has build tools)
FROM python:3.12-slim AS builder
WORKDIR /build
RUN apt-get update && apt-get install -y gcc postgresql-client && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Stage 2: Runtime (minimal, no build tools)
FROM python:3.12-slim
# Security: non-root user
RUN groupadd -g 1000 app && useradd -r -u 1000 -g app app
# Copy dependencies from builder
COPY --from=builder /root/.local /home/app/.local
WORKDIR /app
COPY --chown=app:app . .
ENV PATH=/home/app/.local/bin:$PATH
USER app
# Health check (assumes the requests library is listed in requirements.txt)
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=2)" || exit 1
EXPOSE 8000
# Production server (Gunicorn + Uvicorn)
CMD ["gunicorn", "main:app", \
"--workers", "4", \
"--worker-class", "uvicorn.workers.UvicornWorker", \
"--bind", "0.0.0.0:8000", \
"--timeout", "120"]
Results:
- Image size: 200MB (vs 1GB+)
- Runs as non-root user
- Uses production server (Gunicorn)
- Health checks for orchestration
Okay, we've got a Docker image. Now what? We still need to deploy it, scale it, handle failures...
That's where Kubernetes comes in. Docker runs containers. Kubernetes orchestrates them; it's the conductor of your container orchestra.
Kubernetes sounds complicated. What does it actually do?
What Kubernetes Gives You
1. Self-Healing
Container crashes? Kubernetes restarts it automatically. Node (server) dies? Kubernetes reschedules containers to healthy nodes.
2. Auto-Scaling
Traffic spikes to 10x normal? Kubernetes spins up more containers automatically. Traffic drops? It scales down to save money.
3. Load Balancing
Multiple container replicas? Kubernetes distributes traffic evenly across them.
4. Rolling Updates
New version to deploy? Kubernetes gradually replaces old containers with new ones: zero downtime.
5. Service Discovery
Service A needs to call Service B? Kubernetes provides internal DNS; no hardcoded IPs.
6. Secret Management
Database passwords, API keys? Kubernetes stores them outside your code and images (base64-encoded by default, with optional encryption at rest) and injects them as environment variables or mounted files; see the sketch after this list.
7. Resource Management
Each container gets defined CPU/memory limits. No single container can starve others.
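How that looks from the application side, as a minimal sketch: the snippet below reads a DATABASE_URL environment variable at startup. The variable name matches the Deployment example later in this section; the use of pydantic-settings is an assumption about how the app is configured.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Populated from the DATABASE_URL environment variable that Kubernetes
    # injects from a Secret (see the Deployment manifest below)
    database_url: str

settings = Settings()  # fails fast at startup if the secret was not injected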
Okay, but how do we actually deploy to Kubernetes?
Deploying to Kubernetes
Step 1: Define the Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: fastapi-app
spec:
replicas: 3 # Run 3 containers for high availability
selector:
matchLabels:
app: fastapi-app
template:
metadata:
labels:
app: fastapi-app
spec:
containers:
- name: api
image: myregistry.azurecr.io/fastapi-app:1.0.0
ports:
- containerPort: 8000
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
resources:
requests:
memory: "256Mi" # Reserve 256MB RAM
cpu: "250m" # Reserve 0.25 CPU cores
limits:
memory: "512Mi" # Max 512MB RAM
cpu: "500m" # Max 0.5 CPU cores
livenessProbe: # Is the app running?
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe: # Can the app handle traffic?
httpGet:
path: /health/ready
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
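The probes above assume the service actually exposes /health and /health/ready. A minimal sketch of those endpoints in FastAPI, where the dependency check in the readiness probe is a hypothetical placeholder for whatever your service needs (database, cache, broker) before it can take traffic:
from fastapi import FastAPI, Response, status

app = FastAPI()

async def check_database() -> bool:
    # Hypothetical placeholder: ping the database, cache, or broker here
    return True

@app.get("/health")
async def health():
    # Liveness: the process is up and able to answer a trivial request
    return {"status": "ok"}

@app.get("/health/ready")
async def ready(response: Response):
    # Readiness: only report ready once downstream dependencies are reachable
    if not await check_database():
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return {"status": "not ready"}
    return {"status": "ready"}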
Step 2: Create a Service (Load Balancer)
apiVersion: v1
kind: Service
metadata:
name: fastapi-service
spec:
type: ClusterIP
selector:
app: fastapi-app
ports:
- port: 80
targetPort: 8000
Step 3: Deploy
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
Kubernetes now:
- Pulls your Docker image
- Starts 3 container replicas
- Distributes them across available nodes
- Load balances traffic between them
- Monitors health and restarts failures
What about auto-scaling? You said it handles traffic spikes automatically.
Horizontal Pod Autoscaling
Define scaling rules:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: fastapi-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: fastapi-app
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
How it works:
- Normal load: 3 replicas running
- CPU usage hits 70%: Kubernetes adds more replicas
- Scales up to max 20 replicas
- Load drops: Scales back down gradually
Real example: E-commerce site during Black Friday. Traffic goes from 1,000 req/s to 50,000 req/s. Kubernetes automatically scales from 3 pods to 50 pods. Cost: $50/hr during peak vs $1,000/hr if you over-provisioned for peak 24/7.
The "Million Dollar" Question
"But doesn't auto-scaling mean we're constantly restarting containers? Won't that cause downtime?"
Technical Reality Check
Zero-Downtime Deployments: Rolling Updates
Kubernetes never kills all containers at once. It uses rolling updates:
Process:
- You deploy new version (v2)
- Kubernetes starts one new v2 container
- Waits for readiness probe to pass
- Routes traffic to v2 container
- Kills one old v1 container
- Repeats until all containers are v2
At any moment:
- Old containers handle existing requests
- New containers handle new requests
- Zero downtime
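One piece this relies on inside the app: when Kubernetes sends SIGTERM to an old pod, Gunicorn/Uvicorn stop accepting new connections and drain in-flight requests, and FastAPI's lifespan hook is where you release resources cleanly. A minimal sketch, with the connection pool as a hypothetical stand-in:
import asyncio
from contextlib import asynccontextmanager
from fastapi import FastAPI

class DummyPool:
    # Stand-in for a real connection pool (e.g. asyncpg or SQLAlchemy)
    async def close(self) -> None:
        await asyncio.sleep(0)

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: open pools, warm caches
    app.state.db_pool = DummyPool()
    yield
    # Shutdown: runs during a rolling update, after the server stops
    # accepting new requests, so connections can be closed cleanly
    await app.state.db_pool.close()

app = FastAPI(lifespan=lifespan)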
Rollback if something breaks:
kubectl rollout undo deployment/fastapi-app
Kubernetes reverts to the previous version. Your broken deploy is live for maybe 30 seconds before you notice and roll back.
This sounds great, but also complex. What's the learning curve?
The Reality Check
Kubernetes is powerful but has a steep learning curve.
When Kubernetes is overkill:
❌ Small team (<5 developers): Overhead isn't worth it. Use Heroku, Render, or DigitalOcean App Platform.
❌ Simple monolith: One service? Docker Compose on a single server works fine.
❌ Low traffic (<1,000 req/s): Manual scaling is fine. Kubernetes is overkill.
When Kubernetes is essential:
✅ Microservices architecture (10+ services)
✅ High traffic (10,000+ req/s)
✅ Need auto-scaling (unpredictable traffic)
✅ Multi-region deployment (global users)
✅ Zero-downtime requirements (finance, healthcare)
Alternative for startups: Managed platforms like AWS ECS, Google Cloud Run, or Azure Container Apps. They give you container orchestration without the Kubernetes complexity.
We're a team of 15 with microservices. Kubernetes makes sense. But how do we monitor everything?
Observability: The Three Pillars
1. Metrics (Prometheus + Grafana)
Track system health:
- Request rate, latency, error rate
- CPU, memory, disk usage
- Database connection pool size
- Queue depth
Deploy Prometheus:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
Access the Grafana dashboard (for example via kubectl port-forward) at http://localhost:3000. Pre-built dashboards show:
- Cluster resource usage
- Pod metrics
- Application metrics (if you expose a /metrics endpoint; see the sketch after this list)
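A minimal sketch of exposing that endpoint from FastAPI using the prometheus_client package (one common option; prometheus-fastapi-instrumentator is another). The orders_created_total counter is a hypothetical custom metric:
from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()

# Serve default process/Python metrics plus custom ones at /metrics
app.mount("/metrics", make_asgi_app())

# Hypothetical business metric
ORDERS_CREATED = Counter("orders_created_total", "Total orders created")

@app.post("/orders")
async def create_order():
    ORDERS_CREATED.inc()
    return {"status": "created"}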
2. Logs (Loki or ELK Stack)
Centralize logs from all containers:
import time

import structlog
from fastapi import FastAPI, Request

app = FastAPI()
logger = structlog.get_logger()

@app.middleware("http")
async def logging_middleware(request: Request, call_next):
    # Bind request-scoped fields so every log line carries them
    structlog.contextvars.clear_contextvars()
    structlog.contextvars.bind_contextvars(
        request_id=request.headers.get("X-Request-ID"),
        path=request.url.path,
    )
    start = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start
    logger.info(
        "request_completed",
        status_code=response.status_code,
        duration_ms=round(duration * 1000, 2),
    )
    return response
Logs appear in Grafana Loki. Search by request ID, service name, error type.
3. Traces (Jaeger or Tempo)
Trace requests across microservices:
- User request hits API Gateway (5ms)
- API Gateway calls Auth service (12ms)
- Auth calls Database (8ms)
- API Gateway calls Order service (25ms)
- Order calls Inventory service (30ms)
Total latency: 80ms
Jaeger shows the entire flow. You see that Inventory is the bottleneck.
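Getting a service into those traces means instrumenting it and exporting spans. A minimal sketch using OpenTelemetry's FastAPI instrumentation; the service name and collector endpoint are assumptions about your cluster setup:
from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

app = FastAPI()

# Identify this service in Jaeger/Tempo (hypothetical name)
provider = TracerProvider(resource=Resource.create({"service.name": "order-service"}))
# Ship spans to an OTLP collector (hypothetical in-cluster address)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# Create a span for every incoming HTTP request automatically
FastAPIInstrumentor.instrument_app(app)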
Last question: what does a real production architecture look like?
Production Architecture Example
Infrastructure:
Cloudflare (CDN + DDoS protection)
              ↓
Kubernetes Ingress (NGINX)
              ↓
Service Mesh (Istio - mTLS, load balancing, circuit breaking)
              ↓
┌────────────────────────────────────────────┐
│ FastAPI Microservices                      │
│ - Auth Service (3-10 pods)                 │
│ - Order Service (5-20 pods)                │
│ - Inventory Service (3-15 pods)            │
│ - Payment Service (2-8 pods)               │
└────────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────────┐
│ Data Layer                                 │
│ - PostgreSQL (primary + replicas)          │
│ - Redis (caching, session storage)         │
│ - RabbitMQ (task queue)                    │
│ - Kafka (event streaming)                  │
└────────────────────────────────────────────┘
              ↓
┌────────────────────────────────────────────┐
│ Observability                              │
│ - Prometheus (metrics)                     │
│ - Loki (logs)                              │
│ - Jaeger (traces)                          │
│ - Grafana (dashboards)                     │
└────────────────────────────────────────────┘
Real-world performance numbers:
| Metric | Value |
|---|---|
| Pods (normal) | 25 |
| Pods (peak) | 80 |
| Requests/second | 35,000 |
| P50 latency | 18ms |
| P95 latency | 65ms |
| P99 latency | 120ms |
| Uptime | 99.95% |
| Cost (normal) | $800/month |
| Cost (peak) | $1,200/month |
Compare to over-provisioned architecture: $4,000/month flat cost for same peak capacity.
This is making sense. What's the bottom line?
Key Takeaways
1. Docker solves 'works on my machine.'
Containers package your app with all dependencies. Run anywhere.
2. Kubernetes orchestrates containers at scale.
Auto-scaling, self-healing, zero-downtime deployments.
3. Use multi-stage Docker builds.
Smaller images (200MB vs 1GB+), better security, faster deploys.
4. Define resource limits.
CPU/memory requests and limits prevent one service from starving others.
5. Health checks are mandatory.
Liveness (is it running?) and readiness (can it handle traffic?) probes.
6. Auto-scaling saves money.
Pay for what you use. Scale up during peaks, down during valleys.
7. Rolling updates = zero downtime.
Gradual rollout. Easy rollback if something breaks.
8. Observability is non-negotiable.
Metrics, logs, traces. You can't fix what you can't see.
9. Kubernetes has a learning curve.
For small teams or simple apps, managed platforms (Cloud Run, ECS) are easier.
10. Start simple, scale as needed.
Don't over-engineer. Docker Compose → managed container platform → Kubernetes as you grow.
Next Steps:
- What are Microservices?: Understand the fundamentals of microservices architecture
- FastAPI Async Patterns: Build high-performance services to deploy
- gRPC & Messaging: Choose the right communication protocol for your microservices