🌐 HTTP Fundamentals
When you type a URL and hit Enter, this is the chain of events that follows.
🔍 DNS Resolution
Before your browser can connect to a server, it needs an IP address. Domain names like example.com are for humans — computers route packets using numerical addresses like 93.184.216.34. The Domain Name System (DNS) translates between the two.
The lookup chain
When you visit example.com, the resolution goes through several layers of cache before hitting the network:
1. Browser cache — "Did I look this up recently?"
2. OS cache — "Has any app on this machine looked it up?"
3. Router cache — "Has anyone on this network looked it up?"
4. ISP's DNS resolver — "Has any ISP customer looked it up?"
5. Recursive query — Walk the DNS tree: root → .com → example.com
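In code, this whole chain hides behind a single call; a minimal sketch using the standard library:

```python
import socket

# getaddrinfo asks the OS resolver, which walks the caches above
# before going out to the network.
infos = socket.getaddrinfo('localhost', 80, type=socket.SOCK_STREAM)

# Each entry is (family, type, proto, canonname, sockaddr);
# sockaddr[0] is the resolved IP address.
ips = {info[4][0] for info in infos}
```

`localhost` resolves locally, so at least one loopback address (`127.0.0.1` or `::1`) appears in the result.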
Record types
The record types you'll meet most often:
- A — maps a name to an IPv4 address
- AAAA — maps a name to an IPv6 address
- CNAME — aliases one name to another; not allowed at the zone apex (example.com itself)
- MX — names the mail server for the domain
- TXT — free-form text, used for ownership verification and email policy (SPF, DKIM)
A practical check: run dig +short yoursite.com. If you see 104.21.48.200 — a Cloudflare IP, not your server's IP — your DNS points at Cloudflare's proxy, which forwards traffic to your origin server.
🔌 TCP & Sockets
HTTP rides on top of TCP (Transmission Control Protocol). While HTTP defines the message format, TCP handles the actual delivery — ensuring bytes arrive in order, retransmitting lost packets, and managing connections.
A socket is the programming interface to TCP. When your Flask app listens on port 5000, it's creating a socket that waits for incoming TCP connections.
When you run flask run --port 5000, here's what happens:
# Your app does (simplified):
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('0.0.0.0', 5000))   # Claim port 5000
sock.listen()                  # Start accepting connections

while True:
    client, addr = sock.accept()   # Wait for a connection
    data = client.recv(1024)       # Read the HTTP request
    client.send(b'HTTP/1.1 200 OK\r\n\r\nHello')
    client.close()
The bind() call claims the port. If another process already has it, you get the dreaded error:
OSError: [Errno 98] Address already in use
This usually means:
- Another instance of your app is running
- The previous instance crashed but the OS hasn't released the port yet
- Some other service is using that port
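You can reproduce this failure from Python with only the standard library; `port_in_use` is an illustrative helper that tries to bind and reports whether the OS refused:

```python
import socket

def port_in_use(port):
    """Try to bind the port; an OSError (EADDRINUSE) means it's taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(('127.0.0.1', port))
            return False
        except OSError:
            return True

# Hold a port ourselves, then check it
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(('127.0.0.1', 0))   # port 0 = let the OS pick a free port
holder.listen()
port = holder.getsockname()[1]
```

While `holder` is open, `port_in_use(port)` returns True; once it's closed, the OS releases the port and a new bind succeeds.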
Connection Lifecycle
TCP connections go through a handshake before data can flow:
Client Server
| |
|-------- SYN ----------->| "I want to connect"
|<------ SYN-ACK ---------| "OK, I acknowledge"
|-------- ACK ----------->| "Great, let's go"
| |
|====== DATA FLOWS =======|
| |
|-------- FIN ----------->| "I'm done"
|<------ FIN-ACK ---------| "OK, me too"
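You can watch this lifecycle from Python: the handshake completes inside connect(), and close() starts the FIN exchange. A self-contained sketch over loopback:

```python
import socket

# Listening side
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
server.listen()
host, port = server.getsockname()

# Client side — SYN / SYN-ACK / ACK all happen inside connect()
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
conn, addr = server.accept()

conn.sendall(b'hello')          # DATA FLOWS
data = client.recv(5)

client.close()                  # client sends FIN
conn.close()
server.close()
```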
🛡️ Reverse Proxies
A reverse proxy sits between the internet and your application. Nearly every production web app runs one — Apache, nginx, or a cloud load balancer. It's the front door that decides where each request goes.
What does it do?
- TLS termination — handles HTTPS encryption so your app speaks plain HTTP internally
- Request routing — sends /api to Flask, /images to the filesystem
- Static file serving — serves CSS, JS, images directly without hitting your Python app
- Load balancing — distributes requests across multiple app servers
- Connection buffering — absorbs slow clients so your app workers stay free
A real Apache config
This is a simplified version of what a production Apache config looks like:
<VirtualHost *:443>
    ServerName myapp.example.com

    # TLS termination
    SSLEngine on
    SSLCertificateFile /etc/letsencrypt/live/myapp/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/myapp/privkey.pem

    # API requests → Flask app on port 9912
    ProxyPass /api http://127.0.0.1:9912/api
    ProxyPassReverse /api http://127.0.0.1:9912/api

    # Static files → filesystem (never hits Python)
    Alias /static /home/app_mysite/frontend/static
    <Directory /home/app_mysite/frontend/static>
        Require all granted
    </Directory>

    # Everything else → Flask
    ProxyPass / http://127.0.0.1:9912/
    ProxyPassReverse / http://127.0.0.1:9912/
</VirtualHost>
Ordering matters here: ProxyPass rules are checked in the order they appear, so the catch-all ProxyPass / must come last — otherwise it would match everything, including /api. And because ProxyPass can shadow Alias, the safest way to keep /static on the filesystem is an explicit exclusion, ProxyPass /static !, placed before the catch-all.
Forwarded headers
Your app sits behind the proxy, so it sees 127.0.0.1 as the client IP — not the real user. The proxy adds headers to pass along the original information:
X-Forwarded-For: 203.0.113.50 # Real client IP
X-Forwarded-Proto: https # Original protocol
X-Forwarded-Host: myapp.example.com # Original hostname
Flask's ProxyFix middleware reads these headers so request.remote_addr and request.url reflect reality:
from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)
Watch out if a CDN sits in front of your proxy: with ProxyFix(x_for=1), request.remote_addr shows a Cloudflare IP instead of the user's real IP, because x_for=1 trusts only the last proxy hop. With two trusted proxies in the chain (Cloudflare, then Apache), use x_for=2.
⚙️ WSGI & Gunicorn
WSGI (Web Server Gateway Interface) is the standard that connects Python web frameworks to web servers. It's an interface — a contract that says "give me an environ dict and a callback, and I'll give you a response."
Every Python web framework (Flask, Django, FastAPI via ASGI) implements this interface. Every production Python web server (Gunicorn, uWSGI) knows how to call it.
The WSGI interface
At its core, WSGI is just a function with a specific signature:
def application(environ, start_response):
    """
    environ: dict with HTTP_HOST, REQUEST_METHOD, PATH_INFO, etc.
    start_response: callback to set status and headers
    """
    status = '200 OK'
    headers = [('Content-Type', 'text/html')]
    start_response(status, headers)
    return [b'<h1>Hello, World!</h1>']
Flask wraps this — when you write @app.route decorators, Flask builds the application callable for you. But under the hood, every request goes through this interface.
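To see the contract in action, you can call a WSGI app the way a server does. A minimal sketch — the environ dict here contains only the keys this tiny app needs; a real server fills it from the parsed HTTP request:

```python
def application(environ, start_response):
    # The same minimal WSGI app as above
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, World!</h1>']

# The server's side of the contract: capture status/headers via the
# callback, then join the iterable of bytes into the response body.
captured = {}
def start_response(status, headers):
    captured['status'] = status
    captured['headers'] = headers

environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/', 'HTTP_HOST': 'localhost'}
body = b''.join(application(environ, start_response))
```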
Why Gunicorn?
Never use flask run in production. Flask's development server handles one request at a time, has no process management, and wasn't built for reliability. It exists for development only.
Gunicorn is a production WSGI server. It pre-forks multiple worker processes, each capable of handling requests independently:
# Development (single process, auto-reload)
$ flask run --port 5000
# Production (4 worker processes, managed)
$ gunicorn --workers 4 --bind 127.0.0.1:9912 main:app
The key Gunicorn options:
- --workers N — number of worker processes (typically 2 * CPU_cores + 1)
- --bind HOST:PORT — address to listen on (use 127.0.0.1 behind a proxy)
- --timeout 30 — kill workers that take longer than this
- --worker-class sync — synchronous workers (the default, and the simplest)
🔄 Flask Request Lifecycle
When a request arrives at Flask, it goes through a specific sequence of steps. Understanding this lifecycle helps you know where to put authentication checks, logging, database connections, and error handling.
The lifecycle
Request arrives
↓
1. URL routing — match the path to a view function
↓
2. @before_request hooks — run before every request
↓
3. View function — your code runs
↓
4. @after_request hooks — modify the response
↓
5. Response sent back
Request context
Inside a request, Flask provides thread-local objects that are available anywhere in your code:
from flask import request, g, session

@app.before_request
def load_user():
    # g is a per-request namespace — dies after the response
    g.user = get_user_from_token(request.headers.get('Authorization'))

@app.route('/profile')
def profile():
    # request — the incoming HTTP request
    page = request.args.get('page', 1)
    # session — signed cookie data that persists across requests
    session['last_page'] = '/profile'
    # g.user — set in before_request
    return render_template('profile.html', user=g.user)
The g object is your request-scoped scratch pad. Put database connections, parsed auth tokens, or computed values here. It's created fresh for each request and thrown away after the response — never use it to store data between requests.
Error handlers
Flask lets you register custom error pages:
@app.errorhandler(404)
def not_found(e):
    return render_template('404.html'), 404

@app.errorhandler(500)
def server_error(e):
    # Log the error, notify your team
    app.logger.error(f'Server error: {e}')
    return render_template('500.html'), 500
🏭 Processes & Workers
A web server needs to handle multiple requests at the same time. If one user's request takes 2 seconds (waiting on a database), you can't make everyone else wait. The solution: multiple worker processes.
The fork() model
Gunicorn uses the pre-fork model. A master process starts, then creates (forks) worker processes. Each worker is a complete copy of your application:
Master process (PID 1000)
├── Worker 1 (PID 1001) — handling request from User A
├── Worker 2 (PID 1002) — handling request from User B
├── Worker 3 (PID 1003) — idle, waiting
└── Worker 4 (PID 1004) — handling request from User C
The master doesn't handle requests — it manages workers. If a worker crashes, the master spawns a replacement. If a worker takes too long, the master kills it.
How many workers?
The rule of thumb is workers = 2 * CPU_cores + 1. On a 2-core machine, that's 5 workers. This accounts for time spent waiting on I/O (database, filesystem) — while one worker waits, another can use the CPU.
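The heuristic can be computed on the machine you deploy to; a minimal sketch using the standard library:

```python
import multiprocessing

# 2 * cores + 1: enough workers that the CPU stays busy
# while some workers are blocked on I/O
workers = 2 * multiprocessing.cpu_count() + 1
```

Pass the result to Gunicorn's --workers flag.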
Processes vs threads
Python has the GIL (Global Interpreter Lock) — only one thread can execute Python code at a time per process. This means threads don't help with CPU-bound work, but they do help with I/O-bound work (waiting on network/database).
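A quick way to see the I/O-bound case: blocking calls like sleep (or a socket read) release the GIL, so threads overlap their waits. A sketch with the standard library, where the 0.2 s sleep stands in for a database call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(_):
    time.sleep(0.2)   # simulates waiting on a database or network call

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fake_io, range(4)))   # four waits run concurrently
elapsed = time.perf_counter() - start
# elapsed is ~0.2 s, not 0.8 s: the waits overlapped
```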
🔄 Concurrent Jobs & Live Streaming
Some work does not fit inside a normal request/response cycle. Video encoding, AI generation, large imports, and batch reports can take 10-120 seconds. The right model: accept the request quickly, run the work in the background, and stream progress events to the client.
Reference architecture
This pattern lets many users run jobs in parallel without blocking request handlers:
The API returns 202 Accepted with a job_id immediately, then streams progress on a separate endpoint.
Reconnect behavior
When the connection drops, the browser reconnects and sends Last-Event-ID so the server can replay missed events.
Backend lifecycle (pseudocode)
Keep request handling short. Push real work onto a worker pool or queue:
# POST /api/jobs
user = require_auth()
payload = validate(request.body)
job_id = create_job(user_id=user.id, status="queued")
emit(job_id, kind="queued", percent=0)
queue.push(job_id)
return 202 { job_id, status: "queued" }
# Worker loop (N workers running in parallel)
while true:
    job_id = queue.pop()
    mark_running(job_id)
    for step in build_execution_plan(job_id):
        run(step)
        emit(job_id, kind="progress", percent=step.percent)
    mark_done(job_id)
    emit(job_id, kind="done", percent=100)
SSE stream endpoint (pseudocode)
Server-Sent Events is ideal for one-way progress streams (server → browser). Every event gets an ID so reconnect can resume from Last-Event-ID.
# GET /api/jobs/:id/stream
user = require_auth()
job = load_job(job_id)
if not job or job.user_id != user.id:
    return 404

cursor = request.headers["Last-Event-ID"] or -1
stream "retry: 1500"
while job_not_finished(job_id) OR unseen_events_exist(job_id, cursor):
    events = load_events_after(job_id, cursor)
    for event in events:
        stream id/event/data(event)
        cursor = event.seq
    sleep(250ms)
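The SSE wire format the pseudocode streams is plain text, and Last-Event-ID replay is just filtering by sequence number. A runnable sketch of the formatting (the event tuples here are illustrative):

```python
import json

def sse_events(events, last_event_id=-1):
    """Format (seq, kind, payload) tuples as an SSE stream,
    skipping events the client has already seen."""
    yield 'retry: 1500\n\n'               # client reconnect delay
    for seq, kind, payload in events:
        if seq <= last_event_id:
            continue                      # already delivered before the drop
        data = json.dumps(payload)
        yield f'id: {seq}\nevent: {kind}\ndata: {data}\n\n'

# Client reconnected after seeing event 0, so only event 1 is replayed
stream = ''.join(sse_events(
    [(0, 'progress', {'percent': 50}), (1, 'done', {'percent': 100})],
    last_event_id=0,
))
```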
Frontend stream handling (pseudocode)
job = POST /api/jobs(payload)
stream = EventSource("/api/jobs/{job.id}/stream")

on progress(event):
    render_progress(event.percent, event.message)

on done(event):
    render_done(job.id)
    stream.close()

on error(event):
    keep_progress_ui_visible()
    # SSE retries automatically
Concurrency controls for many users
- Per-user authorization: every stream/read endpoint must verify job.user_id == current_user.
- Backpressure: cap queue size and return 429 or 503 when overloaded.
- Rate limits: enforce max active jobs per user to prevent abuse.
- Persistence: store job state/events in Redis or DB so worker restarts do not lose progress.
- Cleanup: expire old jobs/events to avoid unbounded memory growth.
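The per-user job cap can be sketched with a plain in-memory counter (illustrative names; a real deployment would keep the counter in Redis so all worker processes share it):

```python
MAX_ACTIVE = 3    # arbitrary per-user limit for this sketch
active = {}       # user_id -> number of running jobs

def try_start_job(user_id):
    """Return True if the user may start a job; False means respond 429."""
    if active.get(user_id, 0) >= MAX_ACTIVE:
        return False
    active[user_id] = active.get(user_id, 0) + 1
    return True

def finish_job(user_id):
    """Release the slot when the job completes or fails."""
    active[user_id] -= 1
```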
⚖️ Service Boundaries: Split or Merge?
Splitting services too early creates operational overhead. Splitting too late creates scaling and ownership bottlenecks. Use concrete signals instead of instincts.
Common service-boundary patterns
Pattern 1: Shared SDK over multiple services
Keep services separate internally, but expose one clean client API externally. This preserves independent scaling/deployment without forcing frontend complexity.
// One client, multiple services underneath
Platform.init({
auth: { appId: 'full-stack-courses' },
feedback: { projectName: 'fullstack' }
});
Pattern 2: Backend-for-Frontend (BFF)
Frontend makes one call; backend orchestrates multiple services and returns one response shape.
POST /api/submit-feedback
1) Validate user identity
2) Call dependent services
3) Return unified response
Pattern 3: Strangler migration
Start merged, then extract boundaries only where pain is real (team bottlenecks, scaling asymmetry, or dependency blast radius).
⚡ Caching
Caching stores the result of an expensive operation so you can skip the work next time. It happens at every layer of the stack — from the browser to the database. Understanding where caches live (and how to bust them) is essential for debugging "why aren't my changes showing up?"
The caching layers
Browser cache ← closest to user, fastest
↓
CDN cache ← edge servers around the world
↓
Reverse proxy cache ← at your server's front door
↓
Application cache ← Redis, in-memory dicts
↓
Database cache ← query cache, buffer pool
Browser cache (Cache-Control)
The server tells the browser how long to cache a response using the Cache-Control header:
# "Cache this for 1 hour"
Cache-Control: max-age=3600
# "Cache, but check with server before reusing"
Cache-Control: no-cache
# "Never cache this"
Cache-Control: no-store
# "Cache for 1 year — this URL is versioned"
Cache-Control: public, max-age=31536000, immutable
To bust the browser cache, version your asset URLs (styles.css?v=2), use content hashing in filenames (styles.a1b2c3.css), or send Cache-Control: no-cache during development.
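Content hashing is easy to sketch with hashlib; the helper name and the 8-character digest length are arbitrary choices for this example:

```python
import hashlib

def hashed_name(path, content):
    """styles.css + file bytes -> styles.<8 hex chars>.css.
    The name changes whenever the content changes, so a
    long max-age is safe."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = path.rsplit('.', 1)
    return f'{stem}.{digest}.{ext}'

name = hashed_name('styles.css', b'body { color: red }')
```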
Application cache (Redis)
For expensive computations or frequently-accessed data, store results in an in-memory cache:
import json
import redis

cache = redis.Redis()

def get_user_profile(user_id):
    # Check cache first
    cached = cache.get(f'profile:{user_id}')
    if cached:
        return json.loads(cached)
    # Expensive database query
    profile = db.query('SELECT * FROM users WHERE id = ?', user_id)
    # Store in cache for 5 minutes
    cache.setex(f'profile:{user_id}', 300, json.dumps(profile))
    return profile
Cache invalidation
The hard part of caching isn't adding it — it's knowing when to throw away stale data. Common strategies:
- TTL (Time To Live) — cache expires after N seconds. Simple but may serve stale data.
- Write-through — update the cache whenever the data changes. Consistent but complex.
- Cache-aside — only cache on read. Delete from cache on write, refill on next read.
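Cache-aside is small enough to sketch end to end; here plain dicts stand in for Redis and the database:

```python
db = {1: {'name': 'Ada'}}   # source of truth
cache = {}                  # stand-in for Redis

def get_profile(user_id):
    if user_id in cache:
        return cache[user_id]    # cache hit
    profile = db[user_id]        # "expensive" read
    cache[user_id] = profile     # fill on read
    return profile

def update_profile(user_id, data):
    db[user_id] = data           # write to the source of truth
    cache.pop(user_id, None)     # invalidate; next read refills
```

The write path never updates the cache directly, which avoids racing a concurrent read; the next read simply repopulates.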
A quick way to check whether a CDN is caching a file: curl -sI https://yoursite.com/style.css | grep -i cf-cache-status. If it says HIT, Cloudflare is serving a cached copy. Either purge the cache in the Cloudflare dashboard, or — better yet — use versioned filenames so each deploy gets a fresh cache entry automatically.
📐 System Design Reference
System Design has been merged into this course so the practical sizing numbers live next to architecture decisions. Use this section for back-of-envelope checks while you're designing APIs, queues, caches, and storage plans.