🌐 HTTP Fundamentals

When you type a URL and hit Enter, a whole chain of steps runs: DNS lookup, TCP connection, HTTP request, server response. The sections below walk through each layer.

What does an HTTP message actually look like?

HTTP is just text. Your browser sends:

GET /page HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 ...
Accept: text/html

The server responds:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234

<html>
  <body>Hello, world!</body>
</html>
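Because HTTP is just text, you can pull a response apart with plain string operations. A minimal sketch: the status line comes first, then headers, then a blank line (\r\n\r\n) separating the body.

```python
# Parse a raw HTTP response into status line, headers, and body.
# The blank line (\r\n\r\n) is the boundary between headers and body.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 1234\r\n"
    "\r\n"
    "<html><body>Hello, world!</body></html>"
)

head, body = raw.split("\r\n\r\n", 1)
status_line, *header_lines = head.split("\r\n")
version, status_code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_code)              # 200
print(headers["Content-Type"])  # text/html
```

Real clients (and servers) do exactly this, plus edge cases like folded headers and chunked bodies.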
HTTP Methods (GET, POST, PUT, DELETE)
  • GET — Retrieve data (the default when you visit a URL)
  • POST — Submit data (forms, API calls)
  • PUT — Replace a resource
  • DELETE — Remove a resource
  • HEAD — Like GET, but only return headers
Status Codes (200, 404, 502...)
  • 2xx — Success (200 OK, 201 Created)
  • 3xx — Redirect (301 Moved, 302 Found)
  • 4xx — Client error (400 Bad Request, 404 Not Found)
  • 5xx — Server error (500 Internal, 502 Bad Gateway)
502 Bad Gateway = the reverse proxy couldn't reach your app. Usually means Flask isn't running.
What does "HTTP is stateless" mean?
You see a 502 error. What's most likely wrong?

🔍 DNS Resolution

Before your browser can connect to a server, it needs an IP address. Domain names like example.com are for humans — computers route packets using numerical addresses like 93.184.216.34. The Domain Name System (DNS) translates between the two.

The lookup chain

When you visit example.com, the resolution goes through several layers of cache before hitting the network:

1. Browser cache       — "Did I look this up recently?"
2. OS cache            — "Has any app on this machine looked it up?"
3. Router cache        — "Has anyone on this network looked it up?"
4. ISP's DNS resolver  — "Has any ISP customer looked it up?"
5. Recursive query     — Walk the DNS tree: root → .com → example.com
This is why DNS changes are slow. Each layer caches the answer for the duration of the TTL (Time To Live). If you change your DNS records, old cached answers may persist for minutes to hours until the TTL expires at each layer.
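Each layer in the chain behaves roughly like this toy cache (a hypothetical sketch, not a real DNS client): it serves whatever answer it has until the TTL runs out, even if the authoritative record changed minutes ago.

```python
import time

class TtlCache:
    """Minimal DNS-style cache: answers expire only after their TTL."""
    def __init__(self):
        self._entries = {}  # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._entries[name] = (ip, now + ttl_seconds)

    def get(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(name)
        if entry is None:
            return None          # miss: ask the next layer
        ip, expires_at = entry
        if now >= expires_at:
            del self._entries[name]
            return None          # expired: ask the next layer
        return ip                # hit: serve the cached (possibly stale!) answer

cache = TtlCache()
cache.put("example.com", "93.184.216.34", ttl_seconds=3600, now=0)
print(cache.get("example.com", now=1800))  # 93.184.216.34 — still cached
print(cache.get("example.com", now=4000))  # None — TTL expired, re-resolve
```

Until that expiry fires at every layer, a changed A record is invisible — which is exactly why you lower the TTL before a planned migration.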

Record types

A      Maps domain to IPv4 address          example.com → 93.184.216.34
AAAA   Maps domain to IPv6 address          example.com → 2606:2800:220:1:...
CNAME  Alias — points to another domain     www.example.com → example.com
MX     Mail server for the domain           example.com → mail.example.com
TXT    Arbitrary text (SPF, verification)   "v=spf1 include:_spf.google.com ~all"
Debugging DNS: dig, nslookup, and /etc/hosts

When DNS isn't behaving, these tools help:

# Query DNS directly (bypasses all caches)
$ dig example.com +short
93.184.216.34

# See the full resolution chain
$ dig example.com +trace

# Quick lookup
$ nslookup example.com
Server:   127.0.0.53
Address:  127.0.0.53#53

Non-authoritative answer:
Name:     example.com
Address:  93.184.216.34

You can also override DNS locally with /etc/hosts:

# /etc/hosts — local overrides, checked before DNS
127.0.0.1    myapp.local
10.0.1.5     staging.myapp.com

This is useful for testing a new server before switching DNS publicly.

"My site still points to the old server!" You updated your A record 10 minutes ago. But your ISP's DNS resolver cached the old answer with a 1-hour TTL. Nothing you can do except wait — or lower the TTL before the migration so it expires faster.
What is the primary purpose of DNS?
You changed your DNS A record but the old IP still shows up. What's the most likely cause?
What type of DNS record creates an alias from one domain to another?
Cloudflare DNS: Proxy vs DNS-only

When you use Cloudflare as your DNS provider, each record gets a toggle: Proxied (orange cloud) or DNS-only (gray cloud). This single toggle changes everything about how traffic reaches your server.

Setting       What dig shows                    What happens
☁️ Proxied    104.21.x.x (Cloudflare edge IP)   Traffic routes through Cloudflare — gets WAF, DDoS protection, caching, analytics
☁️ DNS-only   203.0.113.50 (your origin IP)     Cloudflare is just a nameserver — traffic goes directly to your server
Origin IP leaks. Even if your main A record is proxied, other records like MX (mail) are always DNS-only and expose your real server IP. Attackers use this to find origins and bypass Cloudflare. Check with: dig MX yourdomain.com +short. Ideally, use a separate IP or service for mail.

CNAME Flattening

The DNS spec says you cannot put a CNAME at the zone apex (the bare domain like example.com). Why? Because CNAME means "this name is an alias for that name" — but the apex must also have SOA and NS records, and CNAME can't coexist with other record types.

Cloudflare solves this with CNAME flattening: you create a CNAME at the root, but Cloudflare resolves it server-side and returns an A record to the querying resolver. The client never sees the CNAME — it just gets an IP address.

Try it: dig +short example.com — if a site uses Cloudflare with a proxied root, you'll see Cloudflare edge IPs instead of the origin. Compare with dig +short mail.example.com which might reveal the real server.
You run dig +short yoursite.com and see 104.21.48.200 — a Cloudflare IP, not your server's IP. What does this tell you?
Why can't you create a standard CNAME record at example.com (the zone apex)?

🔌 TCP & Sockets

HTTP rides on top of TCP (Transmission Control Protocol). While HTTP defines the message format, TCP handles the actual delivery — ensuring bytes arrive in order, retransmitting lost packets, and managing connections.

A socket is the programming interface to TCP. When your Flask app listens on port 5000, it's creating a socket that waits for incoming TCP connections.

What is a port? A port is just a 16-bit number (0-65535) that identifies which application should receive incoming data. It's not a physical thing — it's like an apartment number in a building (the IP address).

When you run flask run --port 5000, here's what happens:

# Your app does (simplified):
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('0.0.0.0', 5000))  # Claim port 5000
sock.listen()                  # Start accepting connections

while True:
    client, addr = sock.accept()       # Wait for a connection
    request = client.recv(1024)        # Read the HTTP request bytes
    client.sendall(b'HTTP/1.1 200 OK\r\n\r\nHello')
    client.close()

The bind() call claims the port. If another process already has it, you get the dreaded error:

OSError: [Errno 98] Address already in use

This usually means:

  • Another instance of your app is running
  • The previous instance crashed but the OS hasn't released the port yet
  • Some other service is using that port
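You can reproduce the error deliberately. This sketch binds one socket, then tries to claim the same port with a second one — the OS refuses:

```python
import errno
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))   # port 0 = ask the OS for any free port
first.listen()
port = first.getsockname()[1]  # the port the OS handed us

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
caught = None
try:
    second.bind(("127.0.0.1", port))   # same port, still claimed
except OSError as e:
    caught = e                          # "Address already in use"

print(caught.errno == errno.EADDRINUSE)  # True

second.close()
first.close()
```

To find the process holding a port in real life, `ss -tlnp | grep <port>` (Linux) or `lsof -i :<port>` shows the PID.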
Common Ports
80 HTTP (unencrypted)
443 HTTPS (encrypted)
22 SSH
5432 PostgreSQL
5000-9999 Common range for development servers

Connection Lifecycle

TCP connections go through a handshake before data can flow:

Client                    Server
   |                         |
   |-------- SYN ----------->|  "I want to connect"
   |<------ SYN-ACK ---------|  "OK, I acknowledge"
   |-------- ACK ----------->|  "Great, let's go"
   |                         |
   |====== DATA FLOWS =======|
   |                         |
   |-------- FIN ----------->|  "I'm done sending"
   |<-------- ACK -----------|  "Got it"
   |<-------- FIN -----------|  "I'm done too"
   |-------- ACK ----------->|  "Goodbye"
Keep-alive connections: HTTP/1.1 introduced persistent connections. Instead of closing after each request, the connection stays open for multiple requests. This avoids the overhead of repeated handshakes.
What is a port?
"Address already in use" usually means:

🛡️ Reverse Proxies

A reverse proxy sits between the internet and your application. Every production web app uses one — Apache, nginx, or a cloud load balancer. It's the front door that decides where each request goes.

What does it do?

  • TLS termination — handles HTTPS encryption so your app speaks plain HTTP internally
  • Request routing — sends /api to Flask, /images to the filesystem
  • Static file serving — serves CSS, JS, images directly without hitting your Python app
  • Load balancing — distributes requests across multiple app servers
  • Connection buffering — absorbs slow clients so your app workers stay free

A real Apache config

This is a simplified version of what a production Apache config looks like:

<VirtualHost *:443>
    ServerName myapp.example.com

    # TLS termination
    SSLEngine on
    SSLCertificateFile /etc/letsencrypt/live/myapp/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/myapp/privkey.pem

    # API requests → Flask app on port 9912
    ProxyPass /api http://127.0.0.1:9912/api
    ProxyPassReverse /api http://127.0.0.1:9912/api

    # Exclude /static from proxying so the Alias below can serve it
    ProxyPass /static !

    # Static files → filesystem (never hits Python)
    Alias /static /home/app_mysite/frontend/static
    <Directory /home/app_mysite/frontend/static>
        Require all granted
    </Directory>

    # Everything else → Flask
    ProxyPass / http://127.0.0.1:9912/
    ProxyPassReverse / http://127.0.0.1:9912/
</VirtualHost>
Order matters! ProxyPass directives are evaluated before Alias, so without the ProxyPass /static ! exclusion the catch-all ProxyPass / would swallow static requests too and forward them to Flask. Also list specific ProxyPass rules before the catch-all — the first matching rule wins.

Forwarded headers

Your app sits behind the proxy, so it sees 127.0.0.1 as the client IP — not the real user. The proxy adds headers to pass along the original information:

X-Forwarded-For: 203.0.113.50      # Real client IP
X-Forwarded-Proto: https            # Original protocol
X-Forwarded-Host: myapp.example.com # Original hostname

Flask's ProxyFix middleware reads these headers so request.remote_addr and request.url reflect reality:

from werkzeug.middleware.proxy_fix import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)
502 Bad Gateway debugging checklist

When you see 502, work through this list:

# 1. Is the app process running?
$ sudo systemctl status myapp
● myapp.service - My Web App
   Active: inactive (dead)     ← Not running!

# 2. Check the app logs for crash reason
$ sudo journalctl -u myapp -n 50

# 3. Is it listening on the right port?
$ ss -tlnp | grep 9912
LISTEN  0  128  127.0.0.1:9912  *:*  users:(("gunicorn",pid=1234))

# 4. Does the proxy config point to the right port?
$ grep ProxyPass /etc/apache2/sites-enabled/*.conf
What does "TLS termination" at the reverse proxy mean?
What does a 502 Bad Gateway error indicate?
Your Flask app logs show every request coming from 127.0.0.1 instead of real user IPs. What's missing?
Cloudflare: a reverse proxy in the cloud

Cloudflare is itself a reverse proxy — it sits between users and your server, just like Apache sits between the internet and Flask. With Cloudflare proxied DNS, the full chain looks like this:

User's Browser
    ↓ HTTPS
Cloudflare Edge (nearest PoP)
    ↓ HTTPS (or HTTP, depending on SSL mode)
Your Server: Apache
    ↓ HTTP (localhost)
Gunicorn / Flask

This means you have a double-proxy chain, and each proxy adds its own forwarded headers:

Header            Set by         Contains
CF-Connecting-IP  Cloudflare     The real user's IP address (most reliable)
X-Forwarded-For   Both proxies   Chain: user-ip, cloudflare-ip
CF-RAY            Cloudflare     Unique request ID + datacenter code (e.g., 7a1b2c3d-IAD)
CF-IPCountry      Cloudflare     Two-letter country code of the user (e.g., US)
Double-proxy X-Forwarded-For trap. With ProxyFix(app.wsgi_app, x_for=1), Flask reads the last IP in X-Forwarded-For — which is Cloudflare's IP, not the user's. You need x_for=2 to skip past Cloudflare, or better yet, read CF-Connecting-IP directly via request.headers.get('CF-Connecting-IP').
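The arithmetic ProxyFix performs can be sketched in a few lines (simplified — real ProxyFix also handles X-Forwarded-Proto and friends; the 172.68.x.x address is an illustrative Cloudflare edge IP):

```python
def client_ip_from_xff(xff_header: str, trusted_proxies: int) -> str:
    """Pick the client IP from X-Forwarded-For, skipping trusted proxies.

    Each proxy appends the address it received the connection from, so
    the header reads left-to-right: client, proxy1, proxy2, ...
    Counting from the right, the first `trusted_proxies` entries were
    added by hops we trust (Apache, Cloudflare).
    """
    hops = [h.strip() for h in xff_header.split(",")]
    return hops[-trusted_proxies]

# Apache alone: the header holds just the real client.
print(client_ip_from_xff("203.0.113.50", trusted_proxies=1))
# 203.0.113.50

# Cloudflare + Apache: two hops. Trusting only one proxy (x_for=1)
# returns Cloudflare's IP; trusting two (x_for=2) reaches the user.
print(client_ip_from_xff("203.0.113.50, 172.68.1.9", trusted_proxies=1))
# 172.68.1.9  (wrong — Cloudflare's edge)
print(client_ip_from_xff("203.0.113.50, 172.68.1.9", trusted_proxies=2))
# 203.0.113.50  (the real user)
```

Never trust more hops than you actually control — clients can send a forged X-Forwarded-For of their own, which is why the rightmost entries (added by your proxies) are the only reliable ones.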
CF-RAY for debugging. When a user reports "your site showed an error," ask them for the CF-RAY ID from their response headers. You can search your Cloudflare dashboard by Ray ID to find the exact request, its status code, and whether it was served from cache or hit your origin.
Your Flask app is behind both Cloudflare and Apache. With ProxyFix(x_for=1), request.remote_addr shows a Cloudflare IP instead of the user's real IP. Why?

⚙️ WSGI & Gunicorn

WSGI (Web Server Gateway Interface) is the standard that connects Python web frameworks to web servers. It's an interface — a contract that says "give me an environ dict and a callback, and I'll give you a response."

Every Python web framework (Flask, Django, FastAPI via ASGI) implements this interface. Every production Python web server (Gunicorn, uWSGI) knows how to call it.

The WSGI interface

At its core, WSGI is just a function with a specific signature:

def application(environ, start_response):
    """
    environ: dict with HTTP_HOST, REQUEST_METHOD, PATH_INFO, etc.
    start_response: callback to set status and headers
    """
    status = '200 OK'
    headers = [('Content-Type', 'text/html')]
    start_response(status, headers)
    return [b'<h1>Hello, World!</h1>']

Flask wraps this — when you write @app.route decorators, Flask builds the application callable for you. But under the hood, every request goes through this interface.
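You can drive a WSGI callable by hand — build an environ dict, pass a callback, collect the body. This is essentially what Gunicorn does for every request (a minimal sketch; a real server fills in many more environ keys):

```python
def application(environ, start_response):
    """A tiny WSGI app: echoes back the requested path."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"You requested {environ['PATH_INFO']}".encode()]

# A minimal "server": fake environ, capture status/headers, call the app.
captured = {}
def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

environ = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/hello",
    "HTTP_HOST": "example.com",
}
body = b"".join(application(environ, start_response))

print(captured["status"])  # 200 OK
print(body)                # b'You requested /hello'
```

Swap `application` for a Flask app object and the same call works — that interchangeability is the whole point of the WSGI contract.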

Why Gunicorn?

Never use flask run in production. Flask's development server handles one request at a time, has no process management, and wasn't built for reliability. It exists for development only.

Gunicorn is a production WSGI server. It pre-forks multiple worker processes, each capable of handling requests independently:

# Development (single process, auto-reload)
$ flask run --port 5000

# Production (4 worker processes, managed)
$ gunicorn --workers 4 --bind 127.0.0.1:9912 main:app

The key Gunicorn options:

  • --workers N — number of worker processes (typically 2 * CPU_cores + 1)
  • --bind HOST:PORT — address to listen on (use 127.0.0.1 behind a proxy)
  • --timeout 30 — kill workers that take longer than this
  • --worker-class sync — synchronous workers (default, simplest)
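These options can also live in a config file instead of a long command line. A sketch of a gunicorn.conf.py (values illustrative; recent Gunicorn versions load ./gunicorn.conf.py automatically, or pass --config):

```python
# gunicorn.conf.py — Gunicorn reads these module-level names at startup.
import multiprocessing

bind = "127.0.0.1:9912"                       # stay on localhost behind the proxy
workers = multiprocessing.cpu_count() * 2 + 1  # the 2*cores+1 rule of thumb
timeout = 30                                   # kill workers stuck longer than this
worker_class = "sync"                          # start simple; change only when measured
```

Then `gunicorn main:app` picks everything up with no extra flags.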
Sync vs async workers

Sync workers (default) handle one request at a time per worker. Simple and predictable. Good for CPU-bound work or apps with fast responses.

Async workers (gevent, eventlet) use green threads to handle many requests per worker concurrently. Better for I/O-bound work (waiting on databases, external APIs).

# Sync: 4 workers = 4 concurrent requests max
$ gunicorn --workers 4 main:app

# Async: 4 workers × 1000 connections each
$ gunicorn --workers 4 --worker-class gevent \
    --worker-connections 1000 main:app

Start with sync workers. Switch to async only when you measure a bottleneck.

What is WSGI?
Why shouldn't you use flask run in production?

🔄 Flask Request Lifecycle

When a request arrives at Flask, it goes through a specific sequence of steps. Understanding this lifecycle helps you know where to put authentication checks, logging, database connections, and error handling.

The lifecycle

Request arrives
    ↓
1. URL routing — match the path to a view function
    ↓
2. @before_request hooks — run before every request
    ↓
3. View function — your code runs
    ↓
4. @after_request hooks — modify the response
    ↓
5. Response sent back

Request context

Inside a request, Flask provides thread-local objects that are available anywhere in your code:

from flask import request, g, session

@app.before_request
def load_user():
    # g is a per-request namespace — dies after the response
    g.user = get_user_from_token(request.headers.get('Authorization'))

@app.route('/profile')
def profile():
    # request — the incoming HTTP request
    page = request.args.get('page', 1)

    # session — encrypted cookie data that persists across requests
    session['last_page'] = '/profile'

    # g.user — set in before_request
    return render_template('profile.html', user=g.user)
The g object is your request-scoped scratch pad. Put database connections, parsed auth tokens, or computed values here. It's created fresh for each request and thrown away after the response — never use it to store data between requests.

Error handlers

Flask lets you register custom error pages:

@app.errorhandler(404)
def not_found(e):
    return render_template('404.html'), 404

@app.errorhandler(500)
def server_error(e):
    # Log the error, notify your team
    app.logger.error(f'Server error: {e}')
    return render_template('500.html'), 500
Common gotcha: @after_request vs @teardown_request

@after_request runs after a successful response and receives the response object. Use it to add headers, CORS, or modify the response.

@teardown_request runs always, even if an exception occurred. Use it for cleanup like closing database connections.

@app.after_request
def add_security_headers(response):
    response.headers['X-Frame-Options'] = 'SAMEORIGIN'
    return response  # Must return the response!

@app.teardown_request
def close_db(exception):
    db = g.pop('db', None)
    if db is not None:
        db.close()
What is Flask's g object used for?
Where should you put an authentication check that runs on every request?

🏭 Processes & Workers

A web server needs to handle multiple requests at the same time. If one user's request takes 2 seconds (waiting on a database), you can't make everyone else wait. The solution: multiple worker processes.

The fork() model

Gunicorn uses the pre-fork model. A master process starts, then creates (forks) worker processes. Each worker is a complete copy of your application:

Master process (PID 1000)
  ├── Worker 1 (PID 1001) — handling request from User A
  ├── Worker 2 (PID 1002) — handling request from User B
  ├── Worker 3 (PID 1003) — idle, waiting
  └── Worker 4 (PID 1004) — handling request from User C

The master doesn't handle requests — it manages workers. If a worker crashes, the master spawns a replacement. If a worker takes too long, the master kills it.

How many workers?

The rule of thumb: workers = 2 * CPU_cores + 1. On a 2-core machine, that's 5 workers. This accounts for time spent waiting on I/O (database, filesystem) — while one worker waits, another can use the CPU.

Processes vs threads

Python has the GIL (Global Interpreter Lock) — only one thread can execute Python code at a time per process. This means threads don't help with CPU-bound work, but they do help with I/O-bound work (waiting on network/database).

                  Processes                            Threads
Memory            Separate (each worker = full copy)   Shared (lighter weight)
GIL impact        No impact (each has its own GIL)     Limits CPU parallelism
Crash isolation   One crash doesn't affect others      One crash kills the whole process
Best for          CPU-bound, reliability               I/O-bound, memory efficiency
Connection pooling basics

Opening a database connection is expensive (~50ms for PostgreSQL). If every request opens a new connection, that's 50ms of overhead before any work starts.

Connection pooling maintains a set of pre-opened connections. Workers borrow a connection, use it, and return it:

# Without pooling: 50ms overhead per request
conn = psycopg2.connect(...)  # Expensive!
cursor = conn.cursor()
cursor.execute('SELECT ...')
conn.close()

# With pooling: connections are reused
from sqlalchemy import create_engine
engine = create_engine('postgresql://...', pool_size=5)
# Connections are borrowed from pool and returned automatically

With 4 workers and a pool of 5 connections each, you have 20 database connections. Make sure your database allows at least that many (max_connections in PostgreSQL).
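The borrow/return mechanic is simple enough to sketch with a thread-safe queue (a toy pool under illustrative names; real pools like SQLAlchemy's also handle health checks, overflow, and timeouts):

```python
import queue

class ConnectionPool:
    """Toy connection pool: pre-open N connections, lend them out."""
    def __init__(self, connect, pool_size=5):
        self._pool = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._pool.put(connect())   # pay the connect cost once, up front

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)  # blocks if all are borrowed

    def release(self, conn):
        self._pool.put(conn)

# Stand-in for an expensive psycopg2.connect(...) call:
opened = []
def fake_connect():
    opened.append(object())
    return opened[-1]

pool = ConnectionPool(fake_connect, pool_size=2)
a = pool.acquire()
b = pool.acquire()        # pool now empty; a third acquire would block
pool.release(a)
pool.release(b)
print(pool.acquire() is a)  # True — connections are reused, not re-opened
print(len(opened))          # 2 — opened at startup, never per request
```

The blocking `acquire` is also a natural backpressure mechanism: when all connections are busy, requests wait instead of stampeding the database.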

Why does Gunicorn use multiple processes instead of just multiple threads?
How many Gunicorn workers should you run on a 2-core machine?
In Gunicorn's pre-fork model, what does the master process do?

🔄 Concurrent Jobs & Live Streaming

Some work does not fit inside a normal request/response cycle. Video encoding, AI generation, large imports, and batch reports can take 10-120 seconds. The right model is: accept request quickly, run work in background, stream progress events.

Reference architecture

This pattern lets many users run jobs in parallel without blocking request handlers:

Parallel Job Pipeline

User A browser     User B browser
        \              /
      POST /api/jobs → API Server → Job Queue (queued metadata)
                                        ↓
                     Worker 1   Worker 2   …   Worker N
                     (workers run jobs in parallel)
                                        ↓
                     Event Log (seq + status + text)
                                        ↑
          Clients stream via GET /api/jobs/:id/stream
Critical rule: never keep a request open for the whole job. Return 202 Accepted with a job_id quickly, then stream progress on a separate endpoint.

Reconnect behavior

When the connection drops, the browser reconnects and sends Last-Event-ID so the server can replay missed events.

SSE Resume Sequence

1) Open stream      — GET /stream
2) Network drop     — stream interrupted
3) Auto reconnect   — browser resends Last-Event-ID: 42
4) Replay           — server sends missed events 43+
5) UI catches up    — no lost logs

Use monotonically increasing event IDs for reliable replay.

Backend lifecycle (pseudocode)

Keep request handling short. Push real work onto a worker pool or queue:

# POST /api/jobs
user = require_auth()
payload = validate(request.body)
job_id = create_job(user_id=user.id, status="queued")
emit(job_id, kind="queued", percent=0)
queue.push(job_id)
return 202 { job_id, status: "queued" }

# Worker loop (N workers running in parallel)
while true:
    job_id = queue.pop()
    mark_running(job_id)
    for step in build_execution_plan(job_id):
        run(step)
        emit(job_id, kind="progress", percent=step.percent)
    mark_done(job_id)
    emit(job_id, kind="done", percent=100)

SSE stream endpoint (pseudocode)

Server-Sent Events is ideal for one-way progress streams (server → browser). Every event gets an ID so reconnect can resume from Last-Event-ID.

# GET /api/jobs/:id/stream
user = require_auth()
job = load_job(job_id)
if not job or job.user_id != user.id:
    return 404

cursor = request.headers["Last-Event-ID"] or -1
stream "retry: 1500"

while job_not_finished(job_id) OR unseen_events_exist(job_id, cursor):
    events = load_events_after(job_id, cursor)
    for event in events:
        stream id/event/data(event)
        cursor = event.seq
    sleep(250ms)
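The replayable event log that the pseudocode leans on can be a single list (or table) per job. A sketch with per-job monotonically increasing sequence numbers — the `after` method is what a Last-Event-ID reconnect resumes from:

```python
from collections import defaultdict

class EventLog:
    """Append-only event log with per-job sequence numbers for SSE replay."""
    def __init__(self):
        self._events = defaultdict(list)  # job_id -> list of (seq, kind, data)

    def emit(self, job_id, kind, data):
        seq = len(self._events[job_id])   # monotonically increasing per job
        self._events[job_id].append((seq, kind, data))
        return seq

    def after(self, job_id, cursor):
        """Everything newer than `cursor` — the Last-Event-ID replay query."""
        return [e for e in self._events[job_id] if e[0] > cursor]

log = EventLog()
log.emit("job-1", "queued",   {"percent": 0})
log.emit("job-1", "progress", {"percent": 40})
log.emit("job-1", "progress", {"percent": 80})

# Client reconnects with Last-Event-ID: 0 — replay everything after seq 0:
missed = log.after("job-1", cursor=0)
print([seq for seq, _, _ in missed])  # [1, 2]
```

In production this lives in Redis or a database table (as the checklist below notes) so a worker or server restart doesn't wipe the history.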

Frontend stream handling (pseudocode)

job = POST /api/jobs(payload)
stream = EventSource("/api/jobs/{job.id}/stream")

on progress(event):
    render_progress(event.percent, event.message)

on done(event):
    render_done(job.id)
    stream.close()

on error(event):
    keep_progress_ui_visible()
    # SSE retries automatically

Concurrency controls for many users

  • Per-user authorization: every stream/read endpoint must verify job.user_id == current_user.
  • Backpressure: cap queue size and return 429 or 503 when overloaded.
  • Rate limits: enforce max active jobs per user to prevent abuse.
  • Persistence: store job state/events in Redis or DB so worker restarts do not lose progress.
  • Cleanup: expire old jobs/events to avoid unbounded memory growth.
Node event loop vs Gunicorn workers (which is better?)

They are different trade-offs, not winner/loser architectures:

                     Node.js (event loop)                       Gunicorn (pre-fork workers)
Default unit         Single process, single JS thread           Multiple OS processes
I/O concurrency      Excellent via non-blocking async events    Handled by multiple workers/threads
CPU-heavy tasks      Blocks event loop unless offloaded         Parallelized across worker processes
Memory sharing       Easy inside one process only               No shared heap between workers
Scale across cores   Usually multiple processes (cluster)       Already process-based

Important: once you scale either stack across processes, in-memory state is no longer global. Put shared job state in Redis/DB/message queue, not in process memory.

For long-running jobs with streamed output, both stacks should use the same design: POST /jobs returns quickly, workers do the heavy work, and SSE/WebSocket streams progress from a shared event store.

SSE vs WebSockets vs Polling
  • SSE: simplest for server-to-client progress logs, auto-reconnect built in.
  • WebSocket: use when client must send frequent live control messages (pause/resume/live chat).
  • Polling: easiest to run anywhere but higher latency and repeated overhead.

For "start job + stream logs" flows, SSE is usually the fastest path to production.

Important: they are not mutually exclusive at the system level. You can use SSE for primary streaming and keep polling as a fallback path.

Pattern: SSE primary + polling fallback

For long-running jobs, a robust design uses both:

1) Client starts job: POST /api/jobs -> 202 + job_id
2) Worker runs job and appends events: {seq, kind, message, pct}
3) Client opens SSE: GET /api/jobs/:id/stream
4) On disconnect, SSE reconnects with Last-Event-ID
5) If SSE fails repeatedly, client polls:
   GET /api/jobs/:id/events?after=last_seq

Backend endpoints:

  • POST /api/jobs returns quickly with job_id.
  • GET /api/jobs/:id/stream streams live events (SSE).
  • GET /api/jobs/:id/events?after=<seq> returns missed events for polling fallback.

Client behavior pseudocode:

open_sse(job_id)
on_event(evt): render(evt); cursor = evt.seq
on_sse_error():
    if reconnecting_too_long:
        every 1s:
            events = GET /api/jobs/:id/events?after=cursor
            render(events)
            cursor = max_seq(events)
        keep_retrying_sse_in_background()
Production checklist for job streaming
  • Job create endpoint returns in < 200ms with job_id.
  • Worker pool size and queue depth are explicit config values.
  • Each streamed event has a monotonically increasing event ID.
  • Streams support resume using Last-Event-ID.
  • A polling fallback endpoint exists for networks/proxies that break SSE.
  • Proxy buffering is disabled for stream endpoints.
  • Users can cancel jobs (POST /api/jobs/:id/cancel).
  • Metrics exist: queue wait time, run time, failure rate, active streams.
Why should POST /api/jobs return quickly with 202 Accepted instead of waiting for completion?
What is the most important check on GET /api/jobs/:id/stream in a multi-user app?
A client disconnects mid-job and reconnects. How do you avoid losing log lines?

⚖️ Service Boundaries: Split or Merge?

Splitting services too early creates operational overhead. Splitting too late creates scaling and ownership bottlenecks. Use concrete signals instead of instincts.

1. Do different teams own different parts of this system?

2. Do parts need independent scaling?

3. Do change rates differ across domains?

4. Could one domain be useful without the other?

5. Are data models heavily shared?

6. Do you need separate release cadences?


Common service-boundary patterns

Pattern 1: Shared SDK over multiple services

Keep services separate internally, but expose one clean client API externally. This preserves independent scaling/deployment without forcing frontend complexity.

// One client, multiple services underneath
Platform.init({
  auth: { appId: 'full-stack-courses' },
  feedback: { projectName: 'fullstack' }
});

Pattern 2: Backend-for-Frontend (BFF)

Frontend makes one call; backend orchestrates multiple services and returns one response shape.

POST /api/submit-feedback
1) Validate user identity
2) Call dependent services
3) Return unified response

Pattern 3: Strangler migration

Start merged, then extract boundaries only where pain is real (team bottlenecks, scaling asymmetry, or dependency blast radius).

The best architecture is the one you can change. Start simpler than your architecture diagram suggests. Split where you have evidence, not anxiety.
Which combination is the strongest signal to split a service boundary?
What does the strangler pattern recommend?

Caching

Caching stores the result of an expensive operation so you can skip the work next time. It happens at every layer of the stack — from the browser to the database. Understanding where caches live (and how to bust them) is essential for debugging "why aren't my changes showing up?"

The caching layers

Browser cache         ← closest to user, fastest
    ↓
CDN cache             ← edge servers around the world
    ↓
Reverse proxy cache   ← at your server's front door
    ↓
Application cache     ← Redis, in-memory dicts
    ↓
Database cache        ← query cache, buffer pool

Browser cache (Cache-Control)

The server tells the browser how long to cache a response using the Cache-Control header:

# "Cache this for 1 hour"
Cache-Control: max-age=3600

# "Cache, but check with server before reusing"
Cache-Control: no-cache

# "Never cache this"
Cache-Control: no-store

# "Cache for 1 year — this URL is versioned"
Cache-Control: public, max-age=31536000, immutable
"My CSS changes aren't showing up!" If your static files have long cache lifetimes, browsers will keep serving the old version. Solutions: add a version query param (styles.css?v=2), use content hashing in filenames (styles.a1b2c3.css), or use Cache-Control: no-cache during development.

Application cache (Redis)

For expensive computations or frequently-accessed data, store results in an in-memory cache:

import redis
cache = redis.Redis()

def get_user_profile(user_id):
    # Check cache first
    cached = cache.get(f'profile:{user_id}')
    if cached:
        return json.loads(cached)

    # Expensive database query
    profile = db.query('SELECT * FROM users WHERE id = ?', user_id)

    # Store in cache for 5 minutes
    cache.setex(f'profile:{user_id}', 300, json.dumps(profile))
    return profile

Cache invalidation

"There are only two hard things in computer science: cache invalidation and naming things." — Phil Karlton

The hard part of caching isn't adding it — it's knowing when to throw away stale data. Common strategies:

  • TTL (Time To Live) — cache expires after N seconds. Simple but may serve stale data.
  • Write-through — update the cache whenever the data changes. Consistent but complex.
  • Cache-aside — only cache on read. Delete from cache on write, refill on next read.
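Cache-aside packages neatly into a decorator. A sketch with an in-process dict standing in for Redis (illustrative names; a real version would also serialize values and handle keyword arguments):

```python
import functools
import time

def cache_aside(ttl_seconds):
    """Cache-aside: check cache on read, fill on miss, expire by TTL."""
    def decorator(fn):
        store = {}   # key -> (value, expires_at); Redis in a real deployment
        @functools.wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and now < hit[1]:
                return hit[0]                      # cache hit: skip the work
            value = fn(*args)                      # miss or expired: do the work
            store[args] = (value, now + ttl_seconds)
            return value
        wrapper.invalidate = lambda *args: store.pop(args, None)  # delete-on-write
        return wrapper
    return decorator

calls = []

@cache_aside(ttl_seconds=300)
def get_profile(user_id):
    calls.append(user_id)              # stands in for a slow database query
    return {"id": user_id, "name": "Ada"}

get_profile(1); get_profile(1)
print(len(calls))          # 1 — second call was a cache hit
get_profile.invalidate(1)  # the write path: profile changed, drop the entry
get_profile(1)
print(len(calls))          # 2 — refilled on the next read
```

The `invalidate` call is the "delete from cache on write" half of cache-aside; forgetting it is how stale data survives past its welcome.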
Cloudflare caching: how it works in practice

Cloudflare is a CDN (Content Delivery Network) — a global network of edge servers that cache your content closer to users. But Cloudflare's default caching behavior surprises most developers.

What Cloudflare caches by default

Cloudflare only caches files with known static extensions (.js, .css, .png, .jpg, .woff2, etc.). It does not cache HTML, JSON, or API responses by default — even if you set Cache-Control headers on them.

# Check the CF-Cache-Status header to see what happened:
$ curl -sI https://yoursite.com/style.css | grep cf-cache-status
cf-cache-status: HIT          ← served from Cloudflare edge

$ curl -sI https://yoursite.com/api/data | grep cf-cache-status
cf-cache-status: DYNAMIC      ← passed through to origin (not cached)

Cache-Control interaction

Your Cache-Control headers still matter — but they interact with Cloudflare's rules:

  • Static file + max-age=3600: CF caches it at the edge AND the browser caches it
  • Static file + no-store: CF respects it — passes through to origin every time
  • API response + max-age=3600: browser caches it, but CF still shows DYNAMIC (not edge-cached)

Cache Rules

To override defaults, use Cache Rules (formerly Page Rules) in the Cloudflare dashboard. For example, you can tell CF to cache HTML pages, or to bypass cache for your admin panel.

Purging strategies

  • Versioned filenames (best): style.v3.css or app.abc123.js — new filename = new cache entry, no purge needed
  • Purge everything: clears all cached content globally — fast but blunt, causes a spike of origin requests
  • Purge by URL: surgically clear specific files — precise but tedious for many files
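Content hashing takes only a few lines — hash the file's bytes and splice the digest into the name. Build tools like webpack and Vite do this for you; a sketch of the core idea:

```python
import hashlib
from pathlib import PurePosixPath

def hashed_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """style.css + its bytes -> style.<hash>.css for automatic cache-busting."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(filename)
    return f"{p.stem}.{digest}{p.suffix}"

v1 = hashed_name("style.css", b"body { color: black; }")
v2 = hashed_name("style.css", b"body { color: navy; }")
print(v1 != v2)  # True — changed content means a new URL, so browser and
                 # Cloudflare caches fetch the new file automatically
```

Same content always yields the same name, so unchanged files stay cached across deploys while every change gets a fresh cache entry with no purging at all.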
"I deployed but users still see the old site!" You pushed new CSS 20 minutes ago. You can see it in the source files on the server. But users report the old styles. Check the response headers: curl -sI https://yoursite.com/style.css | grep -i cf-cache-status. If it says HIT, Cloudflare is serving a cached copy. Either purge the cache in the Cloudflare dashboard, or — better yet — use versioned filenames so each deploy gets a fresh cache entry automatically.
Your CSS changes aren't showing up for users. The response has Cache-Control: max-age=86400. What should you do?
Why is cache invalidation considered one of the hardest problems in computing?
Your API endpoint returns Cache-Control: public, max-age=3600, but CF-Cache-Status shows DYNAMIC. Why isn't Cloudflare caching it?
What's the most reliable way to ensure users get fresh assets after a deploy when using Cloudflare?

📐 System Design Reference

System Design has been merged into this course so the practical sizing numbers live next to architecture decisions. Use this section for back-of-envelope checks while you're designing APIs, queues, caches, and storage plans.

  • Latency: memory = ns, SSD = us, network = ms
  • 1 day = 86,400 seconds (about 100K)
  • 1K QPS ≈ 86M requests/day
Latency ladder (what is fast vs slow)

Operation          Typical latency   Notes
L1 cache           ~1 ns             CPU local cache
RAM read           ~100 ns           Main memory access
NVMe SSD read      ~20-100 us        Fast disk access
Datacenter RTT     ~0.5 ms           Service-to-service network hop
Cross-region RTT   ~50-150 ms        US East <-> US West or beyond

Mental model: if you add network hops, you add milliseconds. If you add CPU work, you usually add microseconds or nanoseconds.

Throughput and QPS cheat sheet

Conversion   Rule of thumb
1 Gbps       ~125 MB/s
10 Gbps      ~1.25 GB/s
1 day        86,400 seconds (about 100K)
1 year       ~31.5M seconds
1K QPS       ~86M requests/day, ~31.5B requests/year
requests_per_day = qps * 86_400
peak_qps = avg_qps * 2  # or *3 for normal bursty traffic
Storage and object sizing

Item                             Typical size      Notes
User row (id, email, metadata)   ~200-500 bytes    Before indexes and DB overhead
Password hash (bcrypt)           60 bytes          Fixed output size
Avatar image                     5-20 KB           Web-optimized thumbnail
1080p photo                      100-500 KB        Compressed for web
UTF-8 text                       ~1-3 bytes/char   ASCII mostly 1 byte

Always multiply by replication factor, index size, and retention window before final capacity decisions.

Scaling rules of thumb for first-pass planning

Metric                   Typical range
Read:Write ratio         10:1 to 100:1
Peak:Average traffic     2x to 3x (10x for viral spikes)
Healthy cache hit rate   95% to 99%
DAU:MAU                  10% to 30%
storage_total = users * data_per_user * retention_days
capacity_with_headroom = required_capacity * 1.3  # 30% safety margin
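Pulling these rules together, a quick back-of-envelope pass for a hypothetical service (every input number below is illustrative):

```python
# Traffic: 1K QPS average, bursty peaks.
qps = 1_000
requests_per_day = qps * 86_400
peak_qps = qps * 3                        # plan against normal bursty traffic

# Storage: 1M users, 2 KB/user/day, 90-day retention, 3x replication.
users = 1_000_000
data_per_user_per_day = 2 * 1024          # bytes
retention_days = 90
replication = 3

storage = users * data_per_user_per_day * retention_days * replication
storage_with_headroom = storage * 1.3     # 30% safety margin

print(f"{requests_per_day:,} requests/day")      # 86,400,000 requests/day
print(f"peak {peak_qps:,} QPS")                  # peak 3,000 QPS
print(f"~{storage_with_headroom / 1e12:.1f} TB") # ~0.7 TB
```

Sub-terabyte storage and low-thousands peak QPS — numbers like these tell you a single well-provisioned database is plausible before you reach for anything exotic.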
Roughly how many requests per day is 1,000 QPS?
Which ordering is correct from fastest to slowest?
For normal non-viral traffic, what peak multiplier should you usually plan against?