Microservices · FastAPI · Async · Python

FastAPI Async Patterns: Why Async-First Matters

2024-12-30 • 10 min read

We built a nice Flask API for our microservices. Works great! But now we're getting 500-1000 concurrent users and the response times are crawling. We tried adding more servers but costs are exploding. Someone said 'use FastAPI with async' but... isn't Python already fast enough?

Here's what's happening: Flask and Django use WSGI (Web Server Gateway Interface), which is synchronous. Every request gets its own thread or process. When you make a database query or call another API, that thread blocks—it sits there waiting, doing absolutely nothing, burning memory. With 1000 concurrent users each waiting 200ms for database queries, you need 1000 threads. That's insane resource consumption.
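To see the blocking concretely, here's a minimal sync sketch (the endpoint and upstream URL are hypothetical): the worker thread does nothing useful while the request below waits on the network.

from flask import Flask
import requests

app = Flask(__name__)

@app.get("/orders")
def get_orders():
    # The thread parks here for the full upstream round trip (~200ms),
    # holding its stack memory while doing zero work.
    resp = requests.get("https://api.example.com/orders", timeout=5)
    return resp.json()

Multiply that idle thread by 1000 concurrent users and you get the memory blow-up described above.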

Okay, but we can just spawn more threads, right?

In theory, yes. In practice, threads are expensive. Each thread consumes 1-8MB of RAM just for the stack. Plus, the OS spends CPU cycles context-switching between threads. Beyond ~500 threads, you're spending more time switching than doing actual work. It's the C10K problem—how to handle 10,000 concurrent connections without melting your server. The answer? Async I/O with an event loop.

Event loop? That sounds... complicated.

Think of it like a restaurant kitchen. Synchronous (Flask) is one chef making one dish at a time—chop veggies, wait for water to boil (blocking!), wait for oven (blocking!), plate, next order. Asynchronous (FastAPI) is a chef who starts water boiling, puts something in the oven, and while those cook, chops veggies for the next order. The chef never stands idle waiting. That's the event loop—it juggles multiple tasks, switching whenever one is waiting on I/O.

So FastAPI doesn't create threads for each request?

Exactly. FastAPI uses ASGI (Async Server Gateway Interface) built on Starlette. It runs a single-threaded event loop (per worker process). When you await a database query, the loop says 'cool, I'll come back to you when the DB responds' and immediately handles the next request. One thread can juggle thousands of concurrent connections because it's never blocked—it's always doing useful work.
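Here's what that looks like in the smallest possible terms (the endpoint and 200ms delay are illustrative stand-ins):

import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
async def ping():
    # While this coroutine is suspended at the await, the event loop
    # is free to start handling other requests on the same thread.
    await asyncio.sleep(0.2)  # stands in for a 200ms database query
    return {"status": "ok"}

Run it under uvicorn and fire concurrent requests: throughput scales far beyond one request per 200ms, because the waits overlap.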

The "Million Dollar" Question

"But doesn't that mean if one request is slow, it blocks everything?"

Technical Reality Check

The Secret Sauce: Non-Blocking I/O

The magic is that I/O operations don't block the event loop. When you do:

result = await database.fetch_one(query)

The database driver registers a callback with the OS kernel: 'tell me when data is ready.' The event loop moves on to other requests. When the kernel says 'data ready!', the loop resumes that coroutine. All of this happens in microseconds—the overhead is negligible compared to the 10-200ms you spend waiting for I/O.
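You can watch the loop do this with a few lines of plain asyncio (the "requests" here are simulated with sleeps):

import asyncio

async def handle(request_id: int, delay: float):
    print(f"request {request_id}: waiting on I/O")
    await asyncio.sleep(delay)  # non-blocking wait: the loop is free meanwhile
    print(f"request {request_id}: done")

async def main():
    # Two "requests" on one thread; the loop interleaves them.
    first = asyncio.create_task(handle(1, 0.2))
    second = asyncio.create_task(handle(2, 0.1))
    await first
    await second

asyncio.run(main())

Request 2 finishes before request 1 even though it started second: the loop resumes whichever coroutine's I/O completes first.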

Performance numbers (real FastAPI vs Flask benchmark, 1000 concurrent users):

Metric          Flask (sync)     FastAPI (async)
Throughput      1,200 req/s      8,500 req/s
P95 Latency     850ms            120ms
Memory Usage    2.4GB            450MB
CPU Usage       85%              40%

That's a 7x throughput increase with 5x less memory.

Okay, I'm convinced. How do we actually write async code in FastAPI?

Essential Async Patterns

Pattern 1: Async Database Access

First rule: use async database drivers. Your sync psycopg2 or pymysql will BLOCK the event loop and destroy performance.

from contextlib import asynccontextmanager

from fastapi import FastAPI
from databases import Database

database = Database("postgresql://user:pass@localhost/db")

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Open the shared connection (and its pool) once at startup...
    await database.connect()
    yield
    # ...and release it cleanly at shutdown.
    await database.disconnect()

app = FastAPI(lifespan=lifespan)

@app.get("/users/{user_id}")
async def get_user(user_id: int):
    query = "SELECT * FROM users WHERE id = :id"
    result = await database.fetch_one(query, {"id": user_id})
    return result

Key libraries:

  • asyncpg for PostgreSQL (fastest async driver, 3x faster than psycopg2)
  • aiomysql for MySQL
  • motor for MongoDB
  • databases for multi-DB abstraction

Connection pooling is critical—creating DB connections is slow (50-100ms). Reuse them.

Wait, what if we need to call multiple APIs or databases in one request? Do we await each one sequentially?

No! That's the rookie mistake. If you await each call one by one, the request's total latency is the sum of all the calls, exactly as if the code were synchronous. Instead, use asyncio.gather() to run them concurrently:

Pattern 2: Parallel API Calls

import httpx
import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/aggregated-data/{user_id}")
async def get_aggregated_data(user_id: int):
    async with httpx.AsyncClient() as client:
        # Create the request coroutines (nothing runs yet)
        user_task = client.get(f"https://api.example.com/users/{user_id}")
        orders_task = client.get(f"https://api.example.com/orders?user={user_id}")
        reviews_task = client.get(f"https://api.example.com/reviews?user={user_id}")
        
        # gather() schedules all three concurrently and collects the responses
        user_resp, orders_resp, reviews_resp = await asyncio.gather(
            user_task, orders_task, reviews_task
        )
        
    return {
        "user": user_resp.json(),
        "orders": orders_resp.json(),
        "reviews": reviews_resp.json()
    }

Performance impact:

  • Sequential: 300ms + 200ms + 150ms = 650ms total
  • Parallel: max(300, 200, 150) = 300ms total

You just cut latency by more than half!
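One caveat: by default asyncio.gather() raises as soon as any call fails, discarding the others. If partial data is acceptable, return_exceptions=True hands failures back as values; a sketch under that assumption (the helper and URLs are illustrative):

import asyncio
import httpx

async def fetch_json(client: httpx.AsyncClient, url: str):
    resp = await client.get(url)
    resp.raise_for_status()
    return resp.json()

async def aggregate(user_id: int) -> dict:
    async with httpx.AsyncClient(timeout=5.0) as client:
        results = await asyncio.gather(
            fetch_json(client, f"https://api.example.com/users/{user_id}"),
            fetch_json(client, f"https://api.example.com/orders?user={user_id}"),
            fetch_json(client, f"https://api.example.com/reviews?user={user_id}"),
            return_exceptions=True,  # failures come back as exception objects
        )
    user, orders, reviews = (
        None if isinstance(r, Exception) else r for r in results
    )
    return {"user": user, "orders": orders, "reviews": reviews}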

That's incredible. But what if some tasks are slow? Like sending emails or processing uploads?

Use Background Tasks. You don't want your user waiting 5 seconds for an email to send:

Pattern 3: Background Tasks

from fastapi import BackgroundTasks
import smtplib
import time

def send_email(email: str, message: str):
    # Simulate slow email sending
    time.sleep(5)  # OK here: sync background tasks run in a thread pool, off the event loop
    # ... SMTP logic

@app.post("/signup")
async def signup(email: str, background_tasks: BackgroundTasks):
    # Save user to DB (fast)
    await database.execute("INSERT INTO users...")
    
    # Send welcome email in background (slow, but user doesn't wait)
    background_tasks.add_task(send_email, email, "Welcome!")
    
    # Return immediately
    return {"message": "User created"}

Use cases: Email sending, log processing, cache invalidation, webhook notifications, analytics tracking.
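Note that add_task() also accepts async functions; those run on the event loop itself after the response is sent, so they must await rather than block. A minimal sketch (the email helper is a hypothetical stand-in):

import asyncio
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

async def send_email_async(email: str, message: str):
    # Runs on the event loop after the response is sent:
    # every slow step in here must be awaitable, never blocking.
    await asyncio.sleep(5)  # stands in for an async SMTP client call

@app.post("/signup-async")
async def signup_async(email: str, background_tasks: BackgroundTasks):
    background_tasks.add_task(send_email_async, email, "Welcome!")
    return {"message": "User created"}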

The "Million Dollar" Question

"But wait—you said FastAPI is single-threaded. How does a background task not block the event loop?"

Technical Reality Check

The Truth About CPU-Bound vs I/O-Bound Tasks

FastAPI runs a sync background task in a thread pool executor (an async one runs on the event loop itself). Here's the nuance:

I/O-bound tasks (network, disk, DB): Use async/await. Event loop handles them efficiently.

CPU-bound tasks (image processing, ML inference, heavy computation): These WILL block the event loop. Solutions:

  1. Offload to thread pool:

import asyncio

@app.get("/compute")
async def heavy_computation():
    # Run the sync function in a worker thread so the event loop stays responsive.
    # Caveat: the GIL means pure-Python CPU work gains no extra throughput this way.
    result = await asyncio.to_thread(some_sync_cpu_task)
    return result

  2. Use a task queue (Celery, RQ): For truly heavy work, send it to a worker process.

  3. Use multiprocessing: Spawn separate processes for CPU work (see the sketch below).
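For option 3, a minimal sketch (the compute function and pool size are illustrative) using run_in_executor with a process pool, which sidesteps the GIL entirely:

import asyncio
from concurrent.futures import ProcessPoolExecutor
from fastapi import FastAPI

app = FastAPI()
# Create once; spawning worker processes per request would be far too slow
process_pool = ProcessPoolExecutor(max_workers=4)

def crunch_numbers(n: int) -> int:
    # Pure-CPU work runs in a separate process,
    # so the web process's event loop is never held up.
    return sum(i * i for i in range(n))

@app.get("/compute/{n}")
async def compute(n: int):
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(process_pool, crunch_numbers, n)
    return {"result": result}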

Rule of thumb: If it waits on I/O → async/await. If it burns CPU → offload.

This is making sense. But what about production deployment? Can we actually run this at scale?

Production Deployment

Multi-Worker Setup with Gunicorn + Uvicorn:

# Development (single worker)
uvicorn main:app --host 0.0.0.0 --port 8000

# Production (multiple workers)
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 120 \
  --access-logfile - \
  --error-logfile -

Rule of thumb: workers = (2 × CPU cores) + 1

For an 8-core machine: 17 workers. Each worker runs its own async event loop. This gives you multi-core parallelism (multiple processes) PLUS async concurrency (event loop per process). Best of both worlds.
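If you'd rather launch from Python (say, in a container entrypoint), uvicorn exposes the same knobs programmatically; a sketch, assuming your app lives in main.py:

# run.py: programmatic equivalent of the CLI commands above
import multiprocessing

import uvicorn

if __name__ == "__main__":
    workers = 2 * multiprocessing.cpu_count() + 1  # same rule of thumb
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=workers)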

What about common pitfalls? What mistakes do people make?

Common Async Pitfalls (and How to Avoid Them)

Pitfall 1: Mixing Sync and Async

# ❌ BAD: Blocks event loop
@app.get("/users")
async def get_users():
    result = some_sync_db_call()  # DISASTER! Freezes every request on this worker's event loop!
    return result

# ✅ GOOD: Use async library
@app.get("/users")
async def get_users():
    result = await some_async_db_call()
    return result

# ✅ ALSO GOOD: Offload to thread pool
import asyncio

@app.get("/users")
async def get_users():
    result = await asyncio.to_thread(some_sync_db_call)
    return result
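A handy way to catch Pitfall 1 during development: asyncio's debug mode logs any step that holds the loop longer than a threshold. A small demonstration (the 200ms sleep plays the part of an accidental sync call):

import asyncio
import time

async def handler():
    time.sleep(0.2)  # accidental sync call that hogs the loop

# debug=True makes asyncio warn about any step that blocks the loop
# longer than loop.slow_callback_duration (0.1s by default).
asyncio.run(handler(), debug=True)

The same diagnostics can be switched on for a running server with the PYTHONASYNCIODEBUG=1 environment variable.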

Pitfall 2: Not Using Connection Pools

Creating database connections is expensive (50-100ms). Always use pools:

# ❌ BAD: New connection per request
async def get_user(user_id: int):
    db = await asyncpg.connect("postgresql://...")
    result = await db.fetchrow("SELECT ...")
    await db.close()
    return result

# ✅ GOOD: Connection pool (create it once at startup, e.g. in the lifespan handler)
pool = await asyncpg.create_pool("postgresql://...", min_size=10, max_size=20)

async def get_user(user_id: int):
    async with pool.acquire() as conn:
        result = await conn.fetchrow("SELECT ...")
    return result

Pitfall 3: Forgetting Timeouts

# ❌ BAD: Can hang forever
response = await client.get("https://slow-api.com")

# ✅ GOOD: Set timeout
async with httpx.AsyncClient(timeout=5.0) as client:
    response = await client.get("https://slow-api.com")

Without timeouts, one slow upstream service can cascade and kill your entire system.
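httpx also supports per-phase timeouts, and catching the timeout lets you degrade gracefully instead of cascading; a sketch (the limits and fallback are illustrative):

import httpx

# Granular limits: fail fast on connect, allow a slower read
timeout = httpx.Timeout(5.0, connect=2.0, read=5.0)

async def fetch_with_timeout(url: str):
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            resp = await client.get(url)
            return resp.json()
    except httpx.TimeoutException:
        # Degrade gracefully rather than hanging the whole request chain
        return {"error": "upstream timed out"}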

So when should we NOT use async?

Technical Reality Check

When Async Isn't Worth It

Skip async if:

  1. Simple CRUD app with <100 concurrent users: The complexity isn't worth it. Flask works fine.

  2. CPU-intensive workload: If you're doing image processing or ML inference, async won't help—you're CPU-bound, not I/O-bound. Use multiprocessing instead.

  3. Legacy sync libraries: If your entire stack is sync (old Django ORM, sync Redis client), forcing async adds complexity without benefits. Either stay sync or migrate fully.

  4. Team lacks async experience: Async bugs are subtle (deadlocks, race conditions). If your team doesn't understand event loops, you'll waste time debugging.

Use async when:

✅ High concurrency (500+ concurrent users)
✅ I/O-heavy workloads (database queries, API calls)
✅ Microservices calling other microservices
✅ Real-time features (WebSockets, SSE streaming)
✅ Cost-conscious (serve 10x more traffic on same hardware)

This is a lot. What's the takeaway?

Key Takeaways

1. Async is about I/O concurrency, not speed.
One async request isn't faster than one sync request. But handling 10,000 async requests simultaneously is vastly more efficient than 10,000 sync threads.

2. Use async DB drivers and HTTP clients.
Sync libraries will sabotage your event loop. asyncpg, motor, httpx are your friends.

3. Parallelize with asyncio.gather().
Don't await sequentially when you can run in parallel.

4. Connection pooling is mandatory.
Creating connections is slow. Reuse them.

5. Always set timeouts.
One slow upstream can cascade and kill your system.

6. Deploy with Gunicorn + Uvicorn workers.
Multiple workers = multi-core CPU usage. Each worker = async concurrency.

Real-world win: E-commerce API with FastAPI served 12,000 requests/second with P95 latency of 45ms on just 4 CPU cores and 2GB RAM. Flask equivalent needed 8 cores and 4GB for 4,000 req/s at 150ms latency.


Next Steps: