🌐 HTTP Fundamentals

When you type a URL and hit Enter, a whole chain of steps runs: DNS lookup, TCP connection, HTTP request, server response. The sections below walk through each layer.

What does an HTTP message actually look like?

HTTP is just text. Your browser sends:

GET /page HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 ...
Accept: text/html

The server responds:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234

<html>
  <body>Hello, world!</body>
</html>
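Because HTTP is just text, you can pull a response apart with plain string operations. A minimal sketch: the status line comes first, then headers, then a blank line (\r\n\r\n) separating the body.

```python
# Parse a raw HTTP response into status line, headers, and body.
# The blank line (\r\n\r\n) is the boundary between headers and body.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1234\r\n"
    "\r\n"
    "<html><body>Hello, world!</body></html>"
)

head, body = raw.split("\r\n\r\n", 1)
status_line, *header_lines = head.split("\r\n")
version, status_code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_code)              # 200
print(headers["Content-Type"])  # text/html
```

Real clients (and servers) do exactly this, plus edge cases like folded headers and chunked bodies.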
HTTP Methods (GET, POST, PUT, DELETE)
  • GET — Retrieve data (the default when you visit a URL)
  • POST — Submit data (forms, API calls)
  • PUT — Replace a resource
  • DELETE — Remove a resource
  • HEAD — Like GET, but only return headers
Status Codes (200, 404, 502...)
  • 2xx — Success (200 OK, 201 Created)
  • 3xx — Redirect (301 Moved, 302 Found)
  • 4xx — Client error (400 Bad Request, 404 Not Found)
  • 5xx — Server error (500 Internal, 502 Bad Gateway)
502 Bad Gateway = the reverse proxy couldn't reach your app. Usually means Flask isn't running.
What does "HTTP is stateless" mean?
You see a 502 error. What's most likely wrong?

🔍 DNS Resolution

Before your browser can connect to a server, it needs an IP address. Domain names like example.com are for humans — computers route packets using numerical addresses like 93.184.216.34. The Domain Name System (DNS) translates between the two.

The lookup chain

When you visit example.com, the resolution goes through several layers of cache before hitting the network:

1. Browser cache       — "Did I look this up recently?"
2. OS cache            — "Has any app on this machine looked it up?"
3. Router cache        — "Has anyone on this network looked it up?"
4. ISP's DNS resolver  — "Has any ISP customer looked it up?"
5. Recursive query     — Walk the DNS tree: root → .com → example.com
This is why DNS changes are slow. Each layer caches the answer for the duration of the TTL (Time To Live). If you change your DNS records, old cached answers may persist for minutes to hours until the TTL expires at each layer.
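Each layer in the chain behaves roughly like this toy cache (a hypothetical sketch, not a real DNS client): it serves whatever answer it has until the TTL runs out, even if the authoritative record changed minutes ago.

```python
import time

class TtlCache:
    """Minimal DNS-style cache: answers expire only after their TTL."""
    def __init__(self):
        self._entries = {}  # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._entries[name] = (ip, now + ttl_seconds)

    def get(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(name)
        if entry is None:
            return None          # miss: ask the next layer
        ip, expires_at = entry
        if now >= expires_at:
            del self._entries[name]
            return None          # expired: ask the next layer
        return ip                # hit: serve the cached (possibly stale!) answer

cache = TtlCache()
cache.put("example.com", "93.184.216.34", ttl_seconds=3600, now=0)
print(cache.get("example.com", now=1800))  # 93.184.216.34 — still cached
print(cache.get("example.com", now=4000))  # None — TTL expired, re-resolve
```

Until that expiry fires at every layer, a changed A record is invisible — which is exactly why you lower the TTL before a planned migration.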

Record types

A      Maps domain to IPv4 address          example.com → 93.184.216.34
AAAA   Maps domain to IPv6 address          example.com → 2606:2800:220:1:...
CNAME  Alias — points to another domain     www.example.com → example.com
MX     Mail server for the domain           example.com → mail.example.com
TXT    Arbitrary text (SPF, verification)   "v=spf1 include:_spf.google.com ~all"
Debugging DNS: dig, nslookup, and /etc/hosts

When DNS isn't behaving, these tools help:

# Query DNS directly (bypasses all caches)
$ dig example.com +short
93.184.216.34

# See the full resolution chain
$ dig example.com +trace

# Quick lookup
$ nslookup example.com
Server:   127.0.0.53
Address:  127.0.0.53#53

Non-authoritative answer:
Name:     example.com
Address:  93.184.216.34

You can also override DNS locally with /etc/hosts:

# /etc/hosts — local overrides, checked before DNS
127.0.0.1    myapp.local
10.0.1.5     staging.myapp.com

This is useful for testing a new server before switching DNS publicly.

"My site still points to the old server!" You updated your A record 10 minutes ago. But your ISP's DNS resolver cached the old answer with a 1-hour TTL. Nothing you can do except wait — or lower the TTL before the migration so it expires faster.
What is the primary purpose of DNS?
You changed your DNS A record but the old IP still shows up. What's the most likely cause?
What type of DNS record creates an alias from one domain to another?
Cloudflare DNS: Proxy vs DNS-only

When you use Cloudflare as your DNS provider, each record gets a toggle: Proxied (orange cloud) or DNS-only (gray cloud). This single toggle changes everything about how traffic reaches your server.

Setting       What dig shows                    What happens
☁️ Proxied    104.21.x.x (Cloudflare edge IP)   Traffic routes through Cloudflare — gets WAF, DDoS protection, caching, analytics
☁️ DNS-only   203.0.113.50 (your origin IP)     Cloudflare is just a nameserver — traffic goes directly to your server
Origin IP leaks. Even if your main A record is proxied, other records like MX (mail) are always DNS-only and expose your real server IP. Attackers use this to find origins and bypass Cloudflare. Check with: dig MX yourdomain.com +short. Ideally, use a separate IP or service for mail.

CNAME Flattening

The DNS spec says you cannot put a CNAME at the zone apex (the bare domain like example.com). Why? Because CNAME means "this name is an alias for that name" — but the apex must also have SOA and NS records, and CNAME can't coexist with other record types.

Cloudflare solves this with CNAME flattening: you create a CNAME at the root, but Cloudflare resolves it server-side and returns an A record to the querying resolver. The client never sees the CNAME — it just gets an IP address.

Try it: dig +short example.com — if a site uses Cloudflare with a proxied root, you'll see Cloudflare edge IPs instead of the origin. Compare with dig +short mail.example.com which might reveal the real server.
You run dig +short yoursite.com and see 104.21.48.200 — a Cloudflare IP, not your server's IP. What does this tell you?
Why can't you create a standard CNAME record at example.com (the zone apex)?

🔌 TCP & Sockets

HTTP rides on top of TCP (Transmission Control Protocol). While HTTP defines the message format, TCP handles the actual delivery — ensuring bytes arrive in order, retransmitting lost packets, and managing connections.

A socket is the programming interface to TCP. When your Flask app listens on port 5000, it's creating a socket that waits for incoming TCP connections.

What is a port? A port is just a 16-bit number (0-65535) that identifies which application should receive incoming data. It's not a physical thing — it's like an apartment number in a building (the IP address).

When you run flask run --port 5000, here's what happens:

# Your app does (simplified):
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('0.0.0.0', 5000))  # Claim port 5000
sock.listen()                  # Start accepting connections

while True:
    client, addr = sock.accept()       # Wait for a connection
    request = client.recv(1024)        # Read the HTTP request bytes
    client.sendall(b'HTTP/1.1 200 OK\r\n\r\nHello')
    client.close()

The bind() call claims the port. If another process already has it, you get the dreaded error:

OSError: [Errno 98] Address already in use

This usually means:

  • Another instance of your app is running
  • The previous instance crashed but the OS hasn't released the port yet
  • Some other service is using that port
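You can reproduce the error deliberately. This sketch binds one socket, then tries to claim the same port with a second one — the OS refuses:

```python
import errno
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))   # port 0 = ask the OS for any free port
first.listen()
port = first.getsockname()[1]  # the port the OS handed us

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
caught = None
try:
    second.bind(("127.0.0.1", port))   # same port, still claimed
except OSError as e:
    caught = e                          # "Address already in use"

print(caught.errno == errno.EADDRINUSE)  # True

second.close()
first.close()
```

To find the process holding a port in real life, `ss -tlnp | grep <port>` (Linux) or `lsof -i :<port>` shows the PID.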
Common Ports
80 HTTP (unencrypted)
443 HTTPS (encrypted)
22 SSH
5432 PostgreSQL
5000-9999 Common range for development servers

Connection Lifecycle

TCP connections go through a handshake before data can flow:

Client                    Server
   |                         |
   |-------- SYN ----------->|  "I want to connect"
   |<------ SYN-ACK ---------|  "OK, I acknowledge"
   |-------- ACK ----------->|  "Great, let's go"
   |                         |
   |====== DATA FLOWS =======|
   |                         |
   |-------- FIN ----------->|  "I'm done sending"
   |<-------- ACK -----------|  "Got it"
   |<-------- FIN -----------|  "I'm done too"
   |-------- ACK ----------->|  "Goodbye"
Keep-alive connections: HTTP/1.1 introduced persistent connections. Instead of closing after each request, the connection stays open for multiple requests. This avoids the overhead of repeated handshakes.
What is a port?
"Address already in use" usually means:

🛡️ Reverse Proxies

A reverse proxy sits between the internet and your application. Every production web app uses one — Apache, nginx, or a cloud load balancer. It's the front door that decides where each request goes.

What does it do?

  • TLS termination — handles HTTPS encryption so your app speaks plain HTTP internally
  • Request routing — sends /api to Flask, /images to the filesystem
  • Static file serving — serves CSS, JS, images directly without hitting your Python app
  • Load balancing — distributes requests across multiple app servers
  • Connection buffering — absorbs slow clients so your app workers stay free

A real Apache config

This is a simplified version of what a production Apache config looks like:

<VirtualHost *:443>
    ServerName myapp.example.com

    # TLS termination
    SSLEngine on
    SSLCertificateFile /etc/letsencrypt/live/myapp/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/myapp/privkey.pem

    # API requests → Flask app on port 9912
    ProxyPass /api http://127.0.0.1:9912/api
    ProxyPassReverse /api http://127.0.0.1:9912/api

    # Exclude /static from proxying so the Alias below can serve it
    ProxyPass /static !

    # Static files → filesystem (never hits Python)
    Alias /static /home/app_mysite/frontend/static
    <Directory /home/app_mysite/frontend/static>
        Require all granted
    </Directory>

    # Everything else → Flask
    ProxyPass / http://127.0.0.1:9912/
    ProxyPassReverse / http://127.0.0.1:9912/
</VirtualHost>
Order matters! ProxyPass directives are evaluated before Alias, so without the ProxyPass /static ! exclusion the catch-all ProxyPass / would swallow static requests too and forward them to Flask. Also list specific ProxyPass rules before the catch-all — the first matching rule wins.

Forwarded headers

Your app sits behind the proxy, so it sees 127.0.0.1 as the client IP — not the real user. The proxy adds headers to pass along the original information:

X-Forwarded-For: 203.0.113.50      # Real client IP
X-Forwarded-Proto: https            # Original protocol
X-Forwarded-Host: myapp.example.com # Original hostname

Flask's ProxyFix middleware reads these headers so request.remote_addr and request.url reflect reality:

from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)
502 Bad Gateway debugging checklist

When you see 502, work through this list:

# 1. Is the app process running?
$ sudo systemctl status myapp
● myapp.service - My Web App
   Active: inactive (dead)     ← Not running!

# 2. Check the app logs for crash reason
$ sudo journalctl -u myapp -n 50

# 3. Is it listening on the right port?
$ ss -tlnp | grep 9912
LISTEN  0  128  127.0.0.1:9912  *:*  users:(("gunicorn",pid=1234))

# 4. Does the proxy config point to the right port?
$ grep ProxyPass /etc/apache2/sites-enabled/*.conf
What does "TLS termination" at the reverse proxy mean?
What does a 502 Bad Gateway error indicate?
Your Flask app logs show every request coming from 127.0.0.1 instead of real user IPs. What's missing?
Cloudflare: a reverse proxy in the cloud

Cloudflare is itself a reverse proxy — it sits between users and your server, just like Apache sits between the internet and Flask. With Cloudflare proxied DNS, the full chain looks like this:

User's Browser
    ↓ HTTPS
Cloudflare Edge (nearest PoP)
    ↓ HTTPS (or HTTP, depending on SSL mode)
Your Server: Apache
    ↓ HTTP (localhost)
Gunicorn / Flask

This means you have a double-proxy chain, and each proxy adds its own forwarded headers:

Header            Set by         Contains
CF-Connecting-IP  Cloudflare     The real user's IP address (most reliable)
X-Forwarded-For   Both proxies   Chain: user-ip, cloudflare-ip
CF-RAY            Cloudflare     Unique request ID + datacenter code (e.g., 7a1b2c3d-IAD)
CF-IPCountry      Cloudflare     Two-letter country code of the user (e.g., US)
Double-proxy X-Forwarded-For trap. With ProxyFix(app.wsgi_app, x_for=1), Flask reads the last IP in X-Forwarded-For — which is Cloudflare's IP, not the user's. You need x_for=2 to skip past Cloudflare, or better yet, read CF-Connecting-IP directly via request.headers.get('CF-Connecting-IP').
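The arithmetic ProxyFix performs can be sketched in a few lines (simplified — real ProxyFix also handles X-Forwarded-Proto and friends; the 172.68.x.x address is an illustrative Cloudflare edge IP):

```python
def client_ip_from_xff(xff_header: str, trusted_proxies: int) -> str:
    """Pick the client IP from X-Forwarded-For, skipping trusted proxies.

    Each proxy appends the address it received the connection from, so
    the header reads left-to-right: client, proxy1, proxy2, ...
    Counting from the right, the first `trusted_proxies` entries were
    added by hops we trust (Apache, Cloudflare).
    """
    hops = [h.strip() for h in xff_header.split(",")]
    return hops[-trusted_proxies]

# Apache alone: the header holds just the real client.
print(client_ip_from_xff("203.0.113.50", trusted_proxies=1))
# 203.0.113.50

# Cloudflare + Apache: two hops. Trusting only one proxy (x_for=1)
# returns Cloudflare's IP; trusting two (x_for=2) reaches the user.
print(client_ip_from_xff("203.0.113.50, 172.68.1.9", trusted_proxies=1))
# 172.68.1.9  (wrong — Cloudflare's edge)
print(client_ip_from_xff("203.0.113.50, 172.68.1.9", trusted_proxies=2))
# 203.0.113.50  (the real user)
```

Never trust more hops than you actually control — clients can send a forged X-Forwarded-For of their own, which is why the rightmost entries (added by your proxies) are the only reliable ones.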
CF-RAY for debugging. When a user reports "your site showed an error," ask them for the CF-RAY ID from their response headers. You can search your Cloudflare dashboard by Ray ID to find the exact request, its status code, and whether it was served from cache or hit your origin.
Your Flask app is behind both Cloudflare and Apache. With ProxyFix(x_for=1), request.remote_addr shows a Cloudflare IP instead of the user's real IP. Why?

⚙️ WSGI & Gunicorn

WSGI (Web Server Gateway Interface) is the standard that connects Python web frameworks to web servers. It's an interface — a contract that says "give me an environ dict and a callback, and I'll give you a response."

Every Python web framework (Flask, Django, FastAPI via ASGI) implements this interface. Every production Python web server (Gunicorn, uWSGI) knows how to call it.

The WSGI interface

At its core, WSGI is just a function with a specific signature:

def application(environ, start_response):
    """
    environ: dict with HTTP_HOST, REQUEST_METHOD, PATH_INFO, etc.
    start_response: callback to set status and headers
    """
    status = '200 OK'
    headers = [('Content-Type', 'text/html')]
    start_response(status, headers)
    return [b'<h1>Hello, World!</h1>']

Flask wraps this — when you write @app.route decorators, Flask builds the application callable for you. But under the hood, every request goes through this interface.
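You can drive a WSGI callable by hand — build an environ dict, pass a callback, collect the body. This is essentially what Gunicorn does for every request (a minimal sketch; a real server fills in many more environ keys):

```python
def application(environ, start_response):
    """A tiny WSGI app: echoes back the requested path."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"You requested {environ['PATH_INFO']}".encode()]

# A minimal "server": fake environ, capture status/headers, call the app.
captured = {}
def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/hello",
    "HTTP_HOST": "example.com",
}
body = b"".join(application(environ, start_response))

print(captured["status"])  # 200 OK
print(body)                # b'You requested /hello'
```

Swap `application` for a Flask app object and the same call works — that interchangeability is the whole point of the WSGI contract.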

Why Gunicorn?

Never use flask run in production. Flask's development server handles one request at a time, has no process management, and wasn't built for reliability. It exists for development only.

Gunicorn is a production WSGI server. It pre-forks multiple worker processes, each capable of handling requests independently:

# Development (single process, auto-reload)
$ flask run --port 5000

# Production (4 worker processes, managed)
$ gunicorn --workers 4 --bind 127.0.0.1:9912 main:app

The key Gunicorn options:

  • --workers N — number of worker processes (typically 2 * CPU_cores + 1)
  • --bind HOST:PORT — address to listen on (use 127.0.0.1 behind a proxy)
  • --timeout 30 — kill workers that take longer than this
  • --worker-class sync — synchronous workers (default, simplest)
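These options can also live in a config file instead of a long command line. A sketch of a gunicorn.conf.py (values illustrative; recent Gunicorn versions load ./gunicorn.conf.py automatically, or pass --config):

```python
# gunicorn.conf.py — Gunicorn reads these module-level names at startup.
import multiprocessing

bind = "127.0.0.1:9912"                       # stay on localhost behind the proxy
workers = multiprocessing.cpu_count() * 2 + 1  # the 2*cores+1 rule of thumb
timeout = 30                                   # kill workers stuck longer than this
worker_class = "sync"                          # start simple; change only when measured
```

Then `gunicorn main:app` picks everything up with no extra flags.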
Sync vs async workers

Sync workers (default) handle one request at a time per worker. Simple and predictable. Good for CPU-bound work or apps with fast responses.

Async workers (gevent, eventlet) use green threads to handle many requests per worker concurrently. Better for I/O-bound work (waiting on databases, external APIs).

# Sync: 4 workers = 4 concurrent requests max
$ gunicorn --workers 4 main:app

# Async: 4 workers × 1000 connections each
$ gunicorn --workers 4 --worker-class gevent \
    --worker-connections 1000 main:app

Start with sync workers. Switch to async only when you measure a bottleneck.

What is WSGI?
Why shouldn't you use flask run in production?

🔄 Flask Request Lifecycle

When a request arrives at Flask, it goes through a specific sequence of steps. Understanding this lifecycle helps you know where to put authentication checks, logging, database connections, and error handling.

The lifecycle

Request arrives
    ↓
1. URL routing — match the path to a view function
    ↓
2. @before_request hooks — run before every request
    ↓
3. View function — your code runs
    ↓
4. @after_request hooks — modify the response
    ↓
5. Response sent back

Request context

Inside a request, Flask provides thread-local objects that are available anywhere in your code:

from flask import request, g, session

@app.before_request
def load_user():
    # g is a per-request namespace — dies after the response
    g.user = get_user_from_token(request.headers.get('Authorization'))

@app.route('/profile')
def profile():
    # request — the incoming HTTP request
    page = request.args.get('page', 1)

    # session — encrypted cookie data that persists across requests
    session['last_page'] = '/profile'

    # g.user — set in before_request
    return render_template('profile.html', user=g.user)
The g object is your request-scoped scratch pad. Put database connections, parsed auth tokens, or computed values here. It's created fresh for each request and thrown away after the response — never use it to store data between requests.

Error handlers

Flask lets you register custom error pages:

@app.errorhandler(404)
def not_found(e):
    return render_template('404.html'), 404

@app.errorhandler(500)
def server_error(e):
    # Log the error, notify your team
    app.logger.error(f'Server error: {e}')
    return render_template('500.html'), 500
Common gotcha: @after_request vs @teardown_request

@after_request runs after a successful response and receives the response object. Use it to add headers, CORS, or modify the response.

@teardown_request runs always, even if an exception occurred. Use it for cleanup like closing database connections.

@app.after_request
def add_security_headers(response):
    response.headers['X-Frame-Options'] = 'SAMEORIGIN'
    return response  # Must return the response!

@app.teardown_request
def close_db(exception):
    db = g.pop('db', None)
    if db is not None:
        db.close()
What is Flask's g object used for?
Where should you put an authentication check that runs on every request?

🏭 Processes & Workers

A web server needs to handle multiple requests at the same time. If one user's request takes 2 seconds (waiting on a database), you can't make everyone else wait. The solution: multiple worker processes.

The fork() model

Gunicorn uses the pre-fork model. A master process starts, then creates (forks) worker processes. Each worker is a complete copy of your application:

Master process (PID 1000)
  ├── Worker 1 (PID 1001) — handling request from User A
  ├── Worker 2 (PID 1002) — handling request from User B
  ├── Worker 3 (PID 1003) — idle, waiting
  └── Worker 4 (PID 1004) — handling request from User C

The master doesn't handle requests — it manages workers. If a worker crashes, the master spawns a replacement. If a worker takes too long, the master kills it.

How many workers?

The rule of thumb: workers = 2 * CPU_cores + 1. On a 2-core machine, that's 5 workers. This accounts for time spent waiting on I/O (database, filesystem) — while one worker waits, another can use the CPU.

Processes vs threads

Python has the GIL (Global Interpreter Lock) — only one thread can execute Python code at a time per process. This means threads don't help with CPU-bound work, but they do help with I/O-bound work (waiting on network/database).

                  Processes                            Threads
Memory            Separate (each worker = full copy)   Shared (lighter weight)
GIL impact        No impact (each has its own GIL)     Limits CPU parallelism
Crash isolation   One crash doesn't affect others      One crash kills the whole process
Best for          CPU-bound, reliability               I/O-bound, memory efficiency
Connection pooling basics

Opening a database connection is expensive (~50ms for PostgreSQL). If every request opens a new connection, that's 50ms of overhead before any work starts.

Connection pooling maintains a set of pre-opened connections. Workers borrow a connection, use it, and return it:

# Without pooling: 50ms overhead per request
conn = psycopg2.connect(...)  # Expensive!
cursor = conn.cursor()
cursor.execute('SELECT ...')
conn.close()

# With pooling: connections are reused
from sqlalchemy import create_engine
engine = create_engine('postgresql://...', pool_size=5)
# Connections are borrowed from pool and returned automatically

With 4 workers and a pool of 5 connections each, you have 20 database connections. Make sure your database allows at least that many (max_connections in PostgreSQL).
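The borrow/return mechanic is simple enough to sketch with a thread-safe queue (a toy pool under illustrative names; real pools like SQLAlchemy's also handle health checks, overflow, and timeouts):

```python
import queue

class ConnectionPool:
    """Toy connection pool: pre-open N connections, lend them out."""
    def __init__(self, connect, pool_size=5):
        self._pool = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._pool.put(connect())   # pay the connect cost once, up front

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)  # blocks if all are borrowed

    def release(self, conn):
        self._pool.put(conn)

# Stand-in for an expensive psycopg2.connect(...) call:
opened = []
def fake_connect():
    opened.append(object())
    return opened[-1]

pool = ConnectionPool(fake_connect, pool_size=2)
a = pool.acquire()
b = pool.acquire()        # pool now empty; a third acquire would block
pool.release(a)
pool.release(b)
print(pool.acquire() is a)  # True — connections are reused, not re-opened
print(len(opened))          # 2 — opened at startup, never per request
```

The blocking `acquire` is also a natural backpressure mechanism: when all connections are busy, requests wait instead of stampeding the database.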

Why does Gunicorn use multiple processes instead of just multiple threads?
How many Gunicorn workers should you run on a 2-core machine?
In Gunicorn's pre-fork model, what does the master process do?

🔄 Concurrent Jobs & Live Streaming

Some work does not fit inside a normal request/response cycle. Video encoding, AI generation, large imports, and batch reports can take 10-120 seconds. The right model is: accept request quickly, run work in background, stream progress events.

Reference architecture

This pattern lets many users run jobs in parallel without blocking request handlers:

Parallel Job Pipeline

User A browser     User B browser
        \              /
      POST /api/jobs → API Server → Job Queue (queued metadata)
                                        ↓
                     Worker 1   Worker 2   …   Worker N
                     (workers run jobs in parallel)
                                        ↓
                     Event Log (seq + status + text)
                                        ↑
          Clients stream via GET /api/jobs/:id/stream
Critical rule: never keep a request open for the whole job. Return 202 Accepted with a job_id quickly, then stream progress on a separate endpoint.

Reconnect behavior

When the connection drops, the browser reconnects and sends Last-Event-ID so the server can replay missed events.

SSE Resume Sequence

1) Open stream      — GET /stream
2) Network drop     — stream interrupted
3) Auto reconnect   — browser resends Last-Event-ID: 42
4) Replay           — server sends missed events 43+
5) UI catches up    — no lost logs

Use monotonically increasing event IDs for reliable replay.

Backend lifecycle (pseudocode)

Keep request handling short. Push real work onto a worker pool or queue:

# POST /api/jobs
user = require_auth()
payload = validate(request.body)
job_id = create_job(user_id=user.id, status="queued")
emit(job_id, kind="queued", percent=0)
queue.push(job_id)
return 202 { job_id, status: "queued" }

# Worker loop (N workers running in parallel)
while true:
    job_id = queue.pop()
    mark_running(job_id)
    for step in build_execution_plan(job_id):
        run(step)
        emit(job_id, kind="progress", percent=step.percent)
    mark_done(job_id)
    emit(job_id, kind="done", percent=100)

SSE stream endpoint (pseudocode)

Server-Sent Events is ideal for one-way progress streams (server → browser). Every event gets an ID so reconnect can resume from Last-Event-ID.

# GET /api/jobs/:id/stream
user = require_auth()
job = load_job(job_id)
if not job or job.user_id != user.id:
    return 404

cursor = request.headers["Last-Event-ID"] or -1
stream "retry: 1500"

while job_not_finished(job_id) OR unseen_events_exist(job_id, cursor):
    events = load_events_after(job_id, cursor)
    for event in events:
        stream id/event/data(event)
        cursor = event.seq
    sleep(250ms)
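The replayable event log that the pseudocode leans on can be a single list (or table) per job. A sketch with per-job monotonically increasing sequence numbers — the `after` method is what a Last-Event-ID reconnect resumes from:

```python
from collections import defaultdict

class EventLog:
    """Append-only event log with per-job sequence numbers for SSE replay."""
    def __init__(self):
        self._events = defaultdict(list)  # job_id -> list of (seq, kind, data)

    def emit(self, job_id, kind, data):
        seq = len(self._events[job_id])   # monotonically increasing per job
        self._events[job_id].append((seq, kind, data))
        return seq

    def after(self, job_id, cursor):
        """Everything newer than `cursor` — the Last-Event-ID replay query."""
        return [e for e in self._events[job_id] if e[0] > cursor]

log = EventLog()
log.emit("job-1", "queued",   {"percent": 0})
log.emit("job-1", "progress", {"percent": 40})
log.emit("job-1", "progress", {"percent": 80})

# Client reconnects with Last-Event-ID: 0 — replay everything after seq 0:
missed = log.after("job-1", cursor=0)
print([seq for seq, _, _ in missed])  # [1, 2]
```

In production this lives in Redis or a database table (as the checklist below notes) so a worker or server restart doesn't wipe the history.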

Frontend stream handling (pseudocode)

job = POST /api/jobs(payload)
stream = EventSource("/api/jobs/{job.id}/stream")

on progress(event):
    render_progress(event.percent, event.message)

on done(event):
    render_done(job.id)
    stream.close()

on error(event):
    keep_progress_ui_visible()
    # SSE retries automatically

Concurrency controls for many users

  • Per-user authorization: every stream/read endpoint must verify job.user_id == current_user.
  • Backpressure: cap queue size and return 429 or 503 when overloaded.
  • Rate limits: enforce max active jobs per user to prevent abuse.
  • Persistence: store job state/events in Redis or DB so worker restarts do not lose progress.
  • Cleanup: expire old jobs/events to avoid unbounded memory growth.
Node event loop vs Gunicorn workers (which is better?)

They are different trade-offs, not winner/loser architectures:

                     Node.js (event loop)                       Gunicorn (pre-fork workers)
Default unit         Single process, single JS thread           Multiple OS processes
I/O concurrency      Excellent via non-blocking async events    Handled by multiple workers/threads
CPU-heavy tasks      Blocks event loop unless offloaded         Parallelized across worker processes
Memory sharing       Easy inside one process only               No shared heap between workers
Scale across cores   Usually multiple processes (cluster)       Already process-based

Important: once you scale either stack across processes, in-memory state is no longer global. Put shared job state in Redis/DB/message queue, not in process memory.

For long-running jobs with streamed output, both stacks should use the same design: POST /jobs returns quickly, workers do the heavy work, and SSE/WebSocket streams progress from a shared event store.

SSE vs WebSockets vs Polling
  • SSE: simplest for server-to-client progress logs, auto-reconnect built in.
  • WebSocket: use when client must send frequent live control messages (pause/resume/live chat).
  • Polling: easiest to run anywhere but higher latency and repeated overhead.

For "start job + stream logs" flows, SSE is usually the fastest path to production.

Important: they are not mutually exclusive at the system level. You can use SSE for primary streaming and keep polling as a fallback path.

Pattern: SSE primary + polling fallback

For long-running jobs, a robust design uses both:

1) Client starts job: POST /api/jobs -> 202 + job_id
2) Worker runs job and appends events: {seq, kind, message, pct}
3) Client opens SSE: GET /api/jobs/:id/stream
4) On disconnect, SSE reconnects with Last-Event-ID
5) If SSE fails repeatedly, client polls:
   GET /api/jobs/:id/events?after=last_seq

Backend endpoints:

  • POST /api/jobs returns quickly with job_id.
  • GET /api/jobs/:id/stream streams live events (SSE).
  • GET /api/jobs/:id/events?after=<seq> returns missed events for polling fallback.

Client behavior pseudocode:

open_sse(job_id)
on_event(evt): render(evt); cursor = evt.seq
on_sse_error():
    if reconnecting_too_long:
        every 1s:
            events = GET /api/jobs/:id/events?after=cursor
            render(events)
            cursor = max_seq(events)
        keep_retrying_sse_in_background()
Production checklist for job streaming
  • Job create endpoint returns in < 200ms with job_id.
  • Worker pool size and queue depth are explicit config values.
  • Each streamed event has a monotonically increasing event ID.
  • Streams support resume using Last-Event-ID.
  • A polling fallback endpoint exists for networks/proxies that break SSE.
  • Proxy buffering is disabled for stream endpoints.
  • Users can cancel jobs (POST /api/jobs/:id/cancel).
  • Metrics exist: queue wait time, run time, failure rate, active streams.
Why should POST /api/jobs return quickly with 202 Accepted instead of waiting for completion?
What is the most important check on GET /api/jobs/:id/stream in a multi-user app?
A client disconnects mid-job and reconnects. How do you avoid losing log lines?

⚖️ Service Boundaries: Split or Merge?

Splitting services too early creates operational overhead. Splitting too late creates scaling and ownership bottlenecks. Use concrete signals instead of instincts.

1. Do different teams own different parts of this system?

2. Do parts need independent scaling?

3. Do change rates differ across domains?

4. Could one domain be useful without the other?

5. Are data models heavily shared?

6. Do you need separate release cadences?


Common service-boundary patterns

Pattern 1: Shared SDK over multiple services

Keep services separate internally, but expose one clean client API externally. This preserves independent scaling/deployment without forcing frontend complexity.

// One client, multiple services underneath
Platform.init({
  auth: { appId: 'full-stack-courses' },
  feedback: { projectName: 'fullstack' }
});

Pattern 2: Backend-for-Frontend (BFF)

Frontend makes one call; backend orchestrates multiple services and returns one response shape.

POST /api/submit-feedback
1) Validate user identity
2) Call dependent services
3) Return unified response

Pattern 3: Strangler migration

Start merged, then extract boundaries only where pain is real (team bottlenecks, scaling asymmetry, or dependency blast radius).

The best architecture is the one you can change. Start simpler than your architecture diagram suggests. Split where you have evidence, not anxiety.
Which combination is the strongest signal to split a service boundary?
What does the strangler pattern recommend?

Caching

Caching stores the result of an expensive operation so you can skip the work next time. It happens at every layer of the stack — from the browser to the database. Understanding where caches live (and how to bust them) is essential for debugging "why aren't my changes showing up?"

The caching layers

Browser cache         ← closest to user, fastest
    ↓
CDN cache             ← edge servers around the world
    ↓
Reverse proxy cache   ← at your server's front door
    ↓
Application cache     ← Redis, in-memory dicts
    ↓
Database cache        ← query cache, buffer pool

Browser cache (Cache-Control)

The server tells the browser how long to cache a response using the Cache-Control header:

# "Cache this for 1 hour"
Cache-Control: max-age=3600

# "Cache, but check with server before reusing"
Cache-Control: no-cache

# "Never cache this"
Cache-Control: no-store

# "Cache for 1 year — this URL is versioned"
Cache-Control: public, max-age=31536000, immutable
"My CSS changes aren't showing up!" If your static files have long cache lifetimes, browsers will keep serving the old version. Solutions: add a version query param (styles.css?v=2), use content hashing in filenames (styles.a1b2c3.css), or use Cache-Control: no-cache during development.

Application cache (Redis)

For expensive computations or frequently-accessed data, store results in an in-memory cache:

import redis
cache = redis.Redis()

def get_user_profile(user_id):
    # Check cache first
    cached = cache.get(f'profile:{user_id}')
    if cached:
        return json.loads(cached)

    # Expensive database query
    profile = db.query('SELECT * FROM users WHERE id = ?', user_id)

    # Store in cache for 5 minutes
    cache.setex(f'profile:{user_id}', 300, json.dumps(profile))
    return profile

Cache invalidation

"There are only two hard things in computer science: cache invalidation and naming things." — Phil Karlton

The hard part of caching isn't adding it — it's knowing when to throw away stale data. Common strategies:

  • TTL (Time To Live) — cache expires after N seconds. Simple but may serve stale data.
  • Write-through — update the cache whenever the data changes. Consistent but complex.
  • Cache-aside — only cache on read. Delete from cache on write, refill on next read.
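Cache-aside packages neatly into a decorator. A sketch with an in-process dict standing in for Redis (illustrative names; a real version would also serialize values and handle keyword arguments):

```python
import functools
import time

def cache_aside(ttl_seconds):
    """Cache-aside: check cache on read, fill on miss, expire by TTL."""
    def decorator(fn):
        store = {}   # key -> (value, expires_at); Redis in a real deployment
        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and now < hit[1]:
                return hit[0]                      # cache hit: skip the work
            value = fn(*args)                      # miss or expired: do the work
            store[args] = (value, now + ttl_seconds)
            return value
        wrapper.invalidate = lambda *args: store.pop(args, None)  # delete-on-write
        return wrapper
    return decorator

calls = []

@cache_aside(ttl_seconds=300)
def get_profile(user_id):
    calls.append(user_id)              # stands in for a slow database query
    return {"id": user_id, "name": "Ada"}

get_profile(1); get_profile(1)
print(len(calls))          # 1 — second call was a cache hit
get_profile.invalidate(1)  # the write path: profile changed, drop the entry
get_profile(1)
print(len(calls))          # 2 — refilled on the next read
```

The `invalidate` call is the "delete from cache on write" half of cache-aside; forgetting it is how stale data survives past its welcome.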
Cloudflare caching: how it works in practice

Cloudflare is a CDN (Content Delivery Network) — a global network of edge servers that cache your content closer to users. But Cloudflare's default caching behavior surprises most developers.

What Cloudflare caches by default

Cloudflare only caches files with known static extensions (.js, .css, .png, .jpg, .woff2, etc.). It does not cache HTML, JSON, or API responses by default — even if you set Cache-Control headers on them.

# Check the CF-Cache-Status header to see what happened:
$ curl -sI https://yoursite.com/style.css | grep cf-cache-status
cf-cache-status: HIT          ← served from Cloudflare edge

$ curl -sI https://yoursite.com/api/data | grep cf-cache-status
cf-cache-status: DYNAMIC      ← passed through to origin (not cached)

Cache-Control interaction

Your Cache-Control headers still matter — but they interact with Cloudflare's rules:

  • Static file + max-age=3600: CF caches it at the edge AND the browser caches it
  • Static file + no-store: CF respects it — passes through to origin every time
  • API response + max-age=3600: browser caches it, but CF still shows DYNAMIC (not edge-cached)

Cache Rules

To override defaults, use Cache Rules (formerly Page Rules) in the Cloudflare dashboard. For example, you can tell CF to cache HTML pages, or to bypass cache for your admin panel.

Purging strategies

  • Versioned filenames (best): style.v3.css or app.abc123.js — new filename = new cache entry, no purge needed
  • Purge everything: clears all cached content globally — fast but blunt, causes a spike of origin requests
  • Purge by URL: surgically clear specific files — precise but tedious for many files
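Content hashing takes only a few lines — hash the file's bytes and splice the digest into the name. Build tools like webpack and Vite do this for you; a sketch of the core idea:

```python
import hashlib
from pathlib import PurePosixPath

def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """style.css + its bytes -> style.<hash>.css for automatic cache-busting."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(filename)
    return f"{p.stem}.{digest}{p.suffix}"

v1 = hashed_name("style.css", b"body { color: black; }")
v2 = hashed_name("style.css", b"body { color: navy; }")
print(v1 != v2)  # True — changed content means a new URL, so browser and
                 # Cloudflare caches fetch the new file automatically
```

Same content always yields the same name, so unchanged files stay cached across deploys while every change gets a fresh cache entry with no purging at all.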
"I deployed but users still see the old site!" You pushed new CSS 20 minutes ago. You can see it in the source files on the server. But users report the old styles. Check the response headers: curl -sI https://yoursite.com/style.css | grep -i cf-cache-status. If it says HIT, Cloudflare is serving a cached copy. Either purge the cache in the Cloudflare dashboard, or — better yet — use versioned filenames so each deploy gets a fresh cache entry automatically.
Your CSS changes aren't showing up for users. The response has Cache-Control: max-age=86400. What should you do?
Why is cache invalidation considered one of the hardest problems in computing?
Your API endpoint returns Cache-Control: public, max-age=3600, but CF-Cache-Status shows DYNAMIC. Why isn't Cloudflare caching it?
What's the most reliable way to ensure users get fresh assets after a deploy when using Cloudflare?

📐 System Design Reference

System Design has been merged into this course so the practical sizing numbers live next to architecture decisions. Use this section for back-of-envelope checks while you're designing APIs, queues, caches, and storage plans.

  • Latency: memory = ns, SSD = us, network = ms
  • 1 day = 86,400 seconds (about 100K)
  • 1K QPS ≈ 86M requests/day
Latency ladder (what is fast vs slow)

Operation          Typical latency   Notes
L1 cache           ~1 ns             CPU local cache
RAM read           ~100 ns           Main memory access
NVMe SSD read      ~20-100 us        Fast disk access
Datacenter RTT     ~0.5 ms           Service-to-service network hop
Cross-region RTT   ~50-150 ms        US East <-> US West or beyond

Mental model: if you add network hops, you add milliseconds. If you add CPU work, you usually add microseconds or nanoseconds.

Throughput and QPS cheat sheet

Conversion   Rule of thumb
1 Gbps       ~125 MB/s
10 Gbps      ~1.25 GB/s
1 day        86,400 seconds (about 100K)
1 year       ~31.5M seconds
1K QPS       ~86M requests/day, ~31.5B requests/year
requests_per_day = qps * 86_400
peak_qps = avg_qps * 2  # or *3 for normal bursty traffic
Storage and object sizing

Item                             Typical size      Notes
User row (id, email, metadata)   ~200-500 bytes    Before indexes and DB overhead
Password hash (bcrypt)           60 bytes          Fixed output size
Avatar image                     5-20 KB           Web-optimized thumbnail
1080p photo                      100-500 KB        Compressed for web
UTF-8 text                       ~1-3 bytes/char   ASCII mostly 1 byte

Always multiply by replication factor, index size, and retention window before final capacity decisions.

Scaling rules of thumb for first-pass planning

Metric                   Typical range
Read:Write ratio         10:1 to 100:1
Peak:Average traffic     2x to 3x (10x for viral spikes)
Healthy cache hit rate   95% to 99%
DAU:MAU                  10% to 30%
storage_total = users * data_per_user * retention_days
capacity_with_headroom = required_capacity * 1.3  # 30% safety margin
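Pulling these rules together, a quick back-of-envelope pass for a hypothetical service (every input number below is illustrative):

```python
# Traffic: 1K QPS average, bursty peaks.
qps = 1_000
requests_per_day = qps * 86_400
peak_qps = qps * 3                        # plan against normal bursty traffic

# Storage: 1M users, 2 KB/user/day, 90-day retention, 3x replication.
users = 1_000_000
data_per_user_per_day = 2 * 1024          # bytes
retention_days = 90
replication = 3

storage = users * data_per_user_per_day * retention_days * replication
storage_with_headroom = storage * 1.3     # 30% safety margin

print(f"{requests_per_day:,} requests/day")      # 86,400,000 requests/day
print(f"peak {peak_qps:,} QPS")                  # peak 3,000 QPS
print(f"~{storage_with_headroom / 1e12:.1f} TB") # ~0.7 TB
```

Sub-terabyte storage and low-thousands peak QPS — numbers like these tell you a single well-provisioned database is plausible before you reach for anything exotic.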
Roughly how many requests per day is 1,000 QPS?
Which ordering is correct from fastest to slowest?
For normal non-viral traffic, what peak multiplier should you usually plan against?