System design is where backend engineering meets architecture. It is the discipline of making decisions that allow your application to serve 10 users today and 10 million users tomorrow — without a rewrite. This article covers the building blocks I use when designing scalable backend systems.
Horizontal vs. Vertical Scaling
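- Vertical scaling: add more CPU and RAM to a single server. Simple, but it has a hard ceiling set by the largest machine you can buy.
- Horizontal scaling: add more servers behind a load balancer. The ceiling is far higher, but the application must be stateless, and shared dependencies such as the database can still become the bottleneck.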
The rule: design for horizontal scaling from day one. Store sessions in Redis, not in process memory. Store uploads in S3, not on local disk. This makes every instance interchangeable.
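As a rough sketch of what "interchangeable" means in practice, the snippet below keeps session data behind a small store interface so that any instance can serve any request. The names `SessionStore`, `InMemorySessionStore`, and `AppInstance` are hypothetical, and the `Map`-backed store is only a stand-in for Redis so the example runs without a server (a real Redis client would also be asynchronous).

```ts
// Hypothetical interface; in production this would be backed by Redis.
interface SessionStore {
  get(sessionId: string): Record<string, unknown> | undefined;
  set(sessionId: string, data: Record<string, unknown>): void;
}

// Assumption: a Map stands in for the shared external store.
class InMemorySessionStore implements SessionStore {
  private readonly data = new Map<string, Record<string, unknown>>();

  get(sessionId: string): Record<string, unknown> | undefined {
    return this.data.get(sessionId);
  }

  set(sessionId: string, data: Record<string, unknown>): void {
    this.data.set(sessionId, data);
  }
}

// An instance holds no session state itself, only a reference to the shared store.
class AppInstance {
  private readonly name: string;
  private readonly sessions: SessionStore;

  constructor(name: string, sessions: SessionStore) {
    this.name = name;
    this.sessions = sessions;
  }

  login(sessionId: string, userId: string): void {
    this.sessions.set(sessionId, { userId });
  }

  whoAmI(sessionId: string): string {
    const session = this.sessions.get(sessionId);
    return session
      ? `${this.name}: user ${String(session.userId)}`
      : `${this.name}: anonymous`;
  }
}

const sharedStore = new InMemorySessionStore();
const instanceA = new AppInstance('instance-a', sharedStore);
const instanceB = new AppInstance('instance-b', sharedStore);

instanceA.login('sess-1', 'u42');         // login is handled by instance A
console.log(instanceB.whoAmI('sess-1'));  // instance-b: user u42
```

Because neither instance owns the session, the load balancer is free to send the follow-up request anywhere, and an instance can be replaced without logging anyone out.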
Load Balancing
A load balancer distributes incoming traffic across multiple application instances. Common algorithms:
| Algorithm | How It Works | Best For |
|---|---|---|
| Round Robin | Cycles through servers in order | Equal-capacity servers |
| Least Connections | Routes to the server with fewest active connections | Variable request durations |
| IP Hash | Routes based on client IP hash | Session affinity (sticky sessions) |
| Weighted Round Robin | Assigns weights based on server capacity | Mixed-capacity clusters |
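
To make the table concrete, here is a minimal sketch of three of the selection strategies. The `Server` shape, the function names, and the simple string hash are assumptions for illustration; a managed load balancer implements these internally and you would not normally write them yourself.

```ts
interface Server {
  name: string;
  activeConnections: number;
}

// Round robin: hand out servers in order, wrapping around at the end.
class RoundRobin {
  private index = 0;
  private readonly servers: Server[];

  constructor(servers: Server[]) {
    if (servers.length === 0) throw new Error('No servers configured');
    this.servers = servers;
  }

  pick(): Server {
    const server = this.servers[this.index];
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

// Least connections: choose the server currently handling the fewest requests.
function pickLeastConnections(servers: Server[]): Server {
  if (servers.length === 0) throw new Error('No servers configured');
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best,
  );
}

// IP hash: the same client IP maps to the same server while the pool is unchanged.
function pickByIpHash(servers: Server[], clientIp: string): Server {
  if (servers.length === 0) throw new Error('No servers configured');
  let hash = 0;
  for (let i = 0; i < clientIp.length; i++) {
    hash = (hash * 31 + clientIp.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return servers[hash % servers.length];
}

const pool: Server[] = [
  { name: 'a', activeConnections: 5 },
  { name: 'b', activeConnections: 1 },
  { name: 'c', activeConnections: 3 },
];

const rr = new RoundRobin(pool);
const rrOrder = [rr.pick().name, rr.pick().name, rr.pick().name, rr.pick().name];
console.log(rrOrder);                         // [ 'a', 'b', 'c', 'a' ]
console.log(pickLeastConnections(pool).name); // b
```

Weighted round robin follows the same idea as `RoundRobin`, with higher-capacity servers appearing in the rotation more often.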
In production, I use AWS Application Load Balancer (ALB) with health checks:
```
Target Group Health Check:
  Path: /api/health
  Interval: 30 seconds
  Timeout: 5 seconds
  Healthy: 2 consecutive successes
  Unhealthy: 3 consecutive failures
```
The health endpoint is trivial but critical:
```ts
app.get('/api/health', (req, res) => {
  res.status(200).json({ status: 'ok', uptime: process.uptime() });
});
```
Database Replication
A single database server is a single point of failure. Use primary-replica replication:
- Primary (Master) — handles all writes.
- Replicas (Read Replicas) — handle read queries, reducing primary load.
```
        Writes
Client ───────> Primary DB
                    │
               Replication
              ┌─────┴─────┐
          Replica 1   Replica 2
              │           │
            Reads       Reads
```
In Node.js with Mongoose, route reads to replicas:
```ts
await mongoose.connect(MONGODB_URI, {
  readPreference: 'secondaryPreferred',
});
```
For critical reads that require the latest data, override per-query:
```ts
const user = await User.findById(userId).read('primary');
```
Caching Layers
Caching reduces database load and speeds up responses. I use a multi-layer caching strategy:
1. Application-level cache — in-process cache (e.g., node-cache) for config and rarely-changing data.
2. Distributed cache — Redis for data shared across instances (sessions, computed results).
3. CDN cache — Cloudflare or CloudFront for static assets and API responses with proper Cache-Control headers.
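The lookup order across the first two layers can be sketched as a read-through helper: check the fastest layer first, fall back to the shared one, and only then hit the source of record. `CacheLayer`, `MapLayer`, and `readThrough` are hypothetical names; the `Map` instances stand in for node-cache and Redis, TTLs are omitted, and a real Redis layer would be asynchronous. The CDN layer sits in front of the application and is not modeled here.

```ts
interface CacheLayer {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Assumption: a Map stands in for both node-cache and Redis.
class MapLayer implements CacheLayer {
  private readonly entries = new Map<string, string>();

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: string): void {
    this.entries.set(key, value);
  }
}

// Checks layers from fastest to slowest. On a hit, back-fills the faster layers
// that missed; on a miss everywhere, loads the value and fills every layer.
function readThrough(
  layers: CacheLayer[],
  key: string,
  load: (key: string) => string,
): string {
  for (let i = 0; i < layers.length; i++) {
    const hit = layers[i].get(key);
    if (hit !== undefined) {
      for (let j = 0; j < i; j++) layers[j].set(key, hit);
      return hit;
    }
  }
  const value = load(key);
  for (const layer of layers) layer.set(key, value);
  return value;
}

const local = new MapLayer();  // in-process layer
const shared = new MapLayer(); // distributed layer
let dbReads = 0;
const loadFromDb = (key: string): string => {
  dbReads++;
  return `value-for-${key}`;
};

readThrough([local, shared], 'plan:free', loadFromDb); // miss everywhere, loads once
readThrough([local, shared], 'plan:free', loadFromDb); // served from the local layer
console.log(dbReads); // 1
```

A freshly started instance begins with an empty local layer but still finds the value in the shared layer, which is why the distributed cache matters once there is more than one instance.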
```ts
// Application-level cache for config that rarely changes
import NodeCache from 'node-cache';

const appCache = new NodeCache({ stdTTL: 300 }); // 5 minutes

function getFeatureFlags(): FeatureFlags {
  const cached = appCache.get<FeatureFlags>('featureFlags');
  if (cached) return cached;
  const flags = loadFeatureFlagsFromDB();
  appCache.set('featureFlags', flags);
  return flags;
}
```
Message Queues for Async Processing
Not every operation needs to complete before the API responds. Offload heavy work to background queues:
```
Client ──> API ──> Response (immediate)
            │
            └──> Message Queue ──> Worker
                                     │
                                 Heavy Work
                                (emails, PDFs,
                                analytics, etc.)
```
This pattern keeps P99 latencies low because the API only does the minimum synchronous work.
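The split between the request path and the worker can be sketched with an in-memory queue. Everything here is an assumption for illustration: `InMemoryQueue`, `handleSignup`, `runWorker`, and the `welcome-email` job type are hypothetical, the handlers are synchronous to keep the example deterministic, and a real setup would use a durable broker with workers running in separate processes.

```ts
type Job = { type: string; payload: Record<string, string> };

// Assumption: an array stands in for a durable message broker.
class InMemoryQueue {
  private readonly jobs: Job[] = [];

  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  dequeue(): Job | undefined {
    return this.jobs.shift();
  }

  get length(): number {
    return this.jobs.length;
  }
}

// API side: enqueue the heavy work and respond. No email is sent on the request path.
function handleSignup(
  queue: InMemoryQueue,
  email: string,
): { status: number; body: { accepted: boolean } } {
  queue.enqueue({ type: 'welcome-email', payload: { email } });
  return { status: 202, body: { accepted: true } };
}

// Worker side: drain the queue and dispatch each job to a handler by type.
// Jobs with no registered handler are skipped in this sketch.
function runWorker(
  queue: InMemoryQueue,
  handlers: Record<string, (payload: Record<string, string>) => void>,
): number {
  let processed = 0;
  let job = queue.dequeue();
  while (job !== undefined) {
    const handler = handlers[job.type];
    if (handler) {
      handler(job.payload);
      processed++;
    }
    job = queue.dequeue();
  }
  return processed;
}

const queue = new InMemoryQueue();
const sent: string[] = [];

const response = handleSignup(queue, 'ada@example.com');
const pendingAfterResponse = queue.length; // the job is queued, nothing sent yet

const processedCount = runWorker(queue, {
  'welcome-email': (payload) => {
    sent.push(payload.email);
  },
});
console.log(response.status, pendingAfterResponse, processedCount); // 202 1 1
```

The handler returns `202 Accepted` rather than `200` to signal that the work has been accepted but not yet completed.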
Rate Limiting
Protect your APIs from abuse with layered rate limiting:
- Global rate limit — 1000 requests/minute per IP.
- Endpoint-specific limits — 5 login attempts per minute per IP.
- User-specific limits — 100 API calls per minute per API key.
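To show the mechanism behind these limits, here is a minimal fixed-window counter, which is one common way to implement them. `FixedWindowLimiter` and `allowLogin` are hypothetical names, the counters live in process memory, and this is a sketch of the idea rather than a description of how any particular library works internally.

```ts
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly windowMs: number;
  private readonly max: number;

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  // Returns true if the request identified by `key` (an IP, an API key, ...)
  // is allowed at time `now` (milliseconds).
  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this key, or the previous window has expired.
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.max) return false;
    entry.count++;
    return true;
  }
}

// Layered limits: a request must pass every limiter that applies to it.
const globalPerIp = new FixedWindowLimiter(60 * 1000, 1000);
const loginPerIp = new FixedWindowLimiter(60 * 1000, 5);

function allowLogin(ip: string, now: number): boolean {
  return globalPerIp.allow(ip, now) && loginPerIp.allow(ip, now);
}

const attempts: boolean[] = [];
for (let i = 0; i < 6; i++) attempts.push(allowLogin('198.51.100.7', 0));
console.log(attempts); // [ true, true, true, true, true, false ]
```

Passing `now` explicitly keeps the limiter easy to test; once a full window has elapsed, the same IP is allowed again.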
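```ts
import rateLimit from 'express-rate-limit';

// 5 attempts per minute per client IP. The default store is in-memory and
// per-process, so with several instances behind a load balancer each instance
// counts separately unless a shared store (e.g. Redis) is configured.
const loginLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 5,
  message: { error: 'Too many login attempts. Try again in 1 minute.' },
});

router.post('/auth/login', loginLimiter, authController.login);
```
Circuit Breaker Pattern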
When a downstream service fails, stop sending it traffic. The circuit breaker pattern prevents cascade failures:
- Closed — requests flow normally.
- Open — requests are rejected immediately (fast fail).
- Half-Open — a limited number of test requests are allowed through.
```ts
// Simplified circuit breaker
class CircuitBreaker {
  private failures = 0;
  private readonly threshold = 5;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private nextAttempt = 0;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      // Fail fast until the cooldown has elapsed, then let a trial request through.
      if (Date.now() < this.nextAttempt) throw new Error('Circuit is open');
      this.state = 'half-open';
    }
    try {
      const result = await fn();
      this.reset(); // any success closes the circuit again
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  private recordFailure() {
    this.failures++;
    // Opens after `threshold` consecutive failures; a failed half-open trial re-opens it.
    if (this.failures >= this.threshold) {
      this.state = 'open';
      this.nextAttempt = Date.now() + 30000; // 30 second cooldown
    }
  }

  private reset() {
    this.failures = 0;
    this.state = 'closed';
  }
}
```
Key Takeaways
- Design stateless from day one — store state in Redis, S3, or the database.
- Use load balancers with health checks for automatic failover.
- Set up read replicas to scale database reads independently.
- Cache at multiple layers — in-process, Redis, and CDN.
- Offload heavy work to message queues to keep API latencies low.
- Implement circuit breakers to prevent cascade failures across services.
System design is not about memorizing architectures — it is about understanding trade-offs and making intentional decisions for your specific scale and constraints.
