System Design for Backend Engineers: A Practical Guide

System design is where backend engineering meets architecture. It is the discipline of making decisions that allow your application to serve 10 users today and 10 million users tomorrow — without a rewrite. This article covers the building blocks I use when designing scalable backend systems.

Horizontal vs. Vertical Scaling

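Vertical scaling: add more CPU and RAM to a single server. Simple, but it has a hard ceiling set by the largest machine you can buy.
Horizontal scaling: add more servers behind a load balancer. There is no single-machine ceiling, but the application must be stateless, and shared dependencies such as the database can still become the bottleneck.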

The rule: design for horizontal scaling from day one. Store sessions in Redis, not in process memory. Store uploads in S3, not on local disk. This makes every instance interchangeable.
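
To make the difference concrete, here is a minimal sketch of two application instances behind a load balancer. The names `SharedStore` and `AppInstance` are hypothetical, and `SharedStore` is only an in-memory stand-in for Redis so the example runs without a server.

```typescript
// Hypothetical types for illustration. SharedStore stands in for Redis.
type Session = { userId: string };

class SharedStore {
  private data = new Map<string, Session>();
  get(id: string): Session | undefined { return this.data.get(id); }
  set(id: string, session: Session): void { this.data.set(id, session); }
}

class AppInstance {
  // Stateful variant: sessions live only in this process's memory.
  private localSessions = new Map<string, Session>();

  constructor(private readonly shared: SharedStore) {}

  loginLocal(sessionId: string, userId: string): void {
    this.localSessions.set(sessionId, { userId });
  }
  lookupLocal(sessionId: string): string | undefined {
    return this.localSessions.get(sessionId)?.userId;
  }

  // Stateless variant: every instance reads and writes the same external store.
  loginShared(sessionId: string, userId: string): void {
    this.shared.set(sessionId, { userId });
  }
  lookupShared(sessionId: string): string | undefined {
    return this.shared.get(sessionId)?.userId;
  }
}

const store = new SharedStore();
const instanceA = new AppInstance(store);
const instanceB = new AppInstance(store);

instanceA.loginLocal('s1', 'alice');  // session held in A's memory only
instanceA.loginShared('s2', 'bob');   // session held in the shared store
```

If the next request lands on `instanceB`, `lookupLocal('s1')` finds nothing while `lookupShared('s2')` still resolves the user, which is why externalized state makes instances interchangeable.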

Load Balancing

A load balancer distributes incoming traffic across multiple application instances. Common algorithms:

Algorithm	How It Works	Best For
Round Robin	Cycles through servers in order	Equal-capacity servers
Least Connections	Routes to the server with fewest active connections	Variable request durations
IP Hash	Routes based on client IP hash	Session affinity (sticky sessions)
Weighted Round Robin	Assigns weights based on server capacity	Mixed-capacity clusters
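
The four algorithms in the table can be sketched in a few lines each. This is an illustration only, not how any particular load balancer implements them: the `Backend` type and all function names are assumptions, the backend list is assumed to be non-empty, and the weighted variant uses the simplest possible approach (repeating a backend once per unit of weight).

```typescript
// Hypothetical backend description; not tied to any real load balancer API.
interface Backend {
  name: string;
  weight: number;            // relative capacity, used by weighted round robin
  activeConnections: number; // used by least connections
}

// Round robin: cycle through backends in order.
class RoundRobin {
  private index = 0;
  constructor(private readonly backends: Backend[]) {}
  next(): Backend {
    const backend = this.backends[this.index];
    this.index = (this.index + 1) % this.backends.length;
    return backend;
  }
}

// Least connections: pick the backend with the fewest active connections.
function leastConnections(backends: Backend[]): Backend {
  let best = backends[0];
  for (const b of backends) {
    if (b.activeConnections < best.activeConnections) best = b;
  }
  return best;
}

// IP hash: the same client IP maps to the same backend
// for as long as the backend list does not change.
function ipHash(backends: Backend[], clientIp: string): Backend {
  let hash = 0;
  for (let i = 0; i < clientIp.length; i++) {
    hash = (hash * 31 + clientIp.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return backends[hash % backends.length];
}

// Weighted round robin (naive): a backend with weight 2 appears twice per cycle.
function weightedRoundRobin(backends: Backend[]): RoundRobin {
  const expanded: Backend[] = [];
  for (const b of backends) {
    for (let i = 0; i < b.weight; i++) expanded.push(b);
  }
  return new RoundRobin(expanded);
}
```

Note that IP hash only gives session affinity while the pool is stable; adding or removing a backend changes `hash % backends.length` for most clients.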

In production, I use AWS Application Load Balancer (ALB) with health checks:

Target Group Health Check:
  Path:     /api/health
  Interval: 30 seconds
  Timeout:  5 seconds
  Healthy:  2 consecutive successes
  Unhealthy: 3 consecutive failures

The health endpoint is trivial but critical:

typescript Code Block

app.get('/api/health', (req, res) => {
  res.status(200).json({ status: 'ok', uptime: process.uptime() });
});
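
The consecutive-success and consecutive-failure thresholds from the health check settings can be modeled as a small state machine. This is a sketch of the idea, not ALB's actual implementation: `HealthTracker` is a hypothetical name, and it assumes a target starts out healthy, which is a simplification.

```typescript
// Hypothetical tracker mirroring the thresholds in the settings above:
// 2 consecutive successes mark a target healthy,
// 3 consecutive failures mark it unhealthy.
class HealthTracker {
  private healthy = true; // assumption: targets start healthy
  private successes = 0;
  private failures = 0;

  constructor(
    private readonly healthyThreshold = 2,
    private readonly unhealthyThreshold = 3,
  ) {}

  // Record the outcome of one health check and return the resulting state.
  record(ok: boolean): boolean {
    if (ok) {
      this.successes++;
      this.failures = 0; // a success breaks any failure streak
      if (!this.healthy && this.successes >= this.healthyThreshold) {
        this.healthy = true;
      }
    } else {
      this.failures++;
      this.successes = 0; // a failure breaks any success streak
      if (this.healthy && this.failures >= this.unhealthyThreshold) {
        this.healthy = false;
      }
    }
    return this.healthy;
  }
}
```

One consequence worth keeping in mind: with a 30-second interval and 3 failures required, an instance that starts failing can keep receiving traffic for roughly 90 seconds before it is taken out of rotation. Shorter intervals detect failures faster at the cost of more health check traffic.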

Database Replication

A single database server is a single point of failure. Use primary-replica replication:

Primary (Master) — handles all writes.
Replicas (Read Replicas) — handle read queries, reducing primary load.

       Writes
Client ───────> Primary DB
                  │
              Replication
              ┌───┴───┐
           Replica 1  Replica 2
              │          │
           Reads       Reads

In Node.js with Mongoose, route reads to replicas:

typescript Code Block

await mongoose.connect(MONGODB_URI, {
  readPreference: 'secondaryPreferred',
});

For critical reads that require the latest data, override per-query:

typescript Code Block

const user = await User.findById(userId).read('primary');
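
Independent of any driver, the routing decision itself is simple. The sketch below only models which node an operation would go to; `ReplicaRouter`, `Operation`, and the node names are hypothetical, and a real driver handles this for you through read preferences.

```typescript
// Hypothetical operation description; models the routing decision only.
type Operation = { kind: 'read' | 'write'; requireLatest?: boolean };

class ReplicaRouter {
  private next = 0;

  constructor(
    private readonly primary: string,
    private readonly replicas: string[],
  ) {}

  route(op: Operation): string {
    // Writes, and reads that must see the latest data, go to the primary.
    // Reads also fall back to the primary when no replica is available.
    if (op.kind === 'write' || op.requireLatest || this.replicas.length === 0) {
      return this.primary;
    }
    // Other reads rotate across replicas and may be slightly stale,
    // because replication is typically asynchronous.
    const target = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return target;
  }
}
```

The `requireLatest` flag plays the same role as the per-query `.read('primary')` override: it trades replica offloading for freshness on the reads that need it.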

Caching Layers

Caching reduces database load and speeds up responses. I use a multi-layer caching strategy:

1. Application-level cache — in-process cache (e.g., node-cache) for config and rarely-changing data.

2. Distributed cache — Redis for data shared across instances (sessions, computed results).

3. CDN cache — Cloudflare or CloudFront for static assets and API responses with proper Cache-Control headers.

typescript Code Block

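// Application-level cache for config that rarely changes
import NodeCache from 'node-cache';
const appCache = new NodeCache({ stdTTL: 300 }); // TTL in seconds (5 minutes)

// FeatureFlags and loadFeatureFlagsFromDB are defined elsewhere in the application.
async function getFeatureFlags(): Promise<FeatureFlags> {
  const cached = appCache.get<FeatureFlags>('featureFlags');
  if (cached !== undefined) return cached; // cache hit

  // Cache miss: database access in Node.js is asynchronous, so await the load.
  const flags = await loadFeatureFlagsFromDB();
  appCache.set('featureFlags', flags);
  return flags;
}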
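
The first two layers can also be chained: check process memory, then the distributed cache, and only then the source of truth. The following is a sketch under stated assumptions. `RemoteCache` is a hypothetical interface standing in for a Redis client (a real client has a different API), `FakeRemoteCache` is an in-memory substitute so the example runs without a server, and values are plain strings to keep serialization out of the picture.

```typescript
// Hypothetical interface standing in for a Redis client.
interface RemoteCache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

type Entry = { value: string; expiresAt: number };

// In-memory substitute so the sketch runs without a Redis server.
class FakeRemoteCache implements RemoteCache {
  private store = new Map<string, Entry>();
  async get(key: string): Promise<string | undefined> {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= Date.now()) return undefined;
    return entry.value;
  }
  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Process-local layer: fastest, but private to this instance.
const local = new Map<string, Entry>();

// Look in the local cache, then the shared cache, then the source of truth.
async function getLayered(
  key: string,
  remote: RemoteCache,
  loadFromSource: () => Promise<string>,
  ttlSeconds = 300,
): Promise<string> {
  const hit = local.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  let value = await remote.get(key);
  if (value === undefined) {
    // Miss in both layers: load once and publish to the shared cache.
    value = await loadFromSource();
    await remote.set(key, value, ttlSeconds);
  }
  local.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}
```

The trade-off is staleness: each layer can serve a value for up to its TTL after the source changes, so the local TTL is usually kept short for anything that must update quickly.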

Message Queues for Async Processing

Not every operation needs to complete before the API responds. Offload heavy work to background queues:

Client ──> API ──> Response (immediate)
                │
                └──> Message Queue ──> Worker
                                        │
                                    Heavy Work
                                  (emails, PDFs,
                                   analytics, etc.)

This pattern keeps P99 latencies low because the API only does the minimum synchronous work.
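
A minimal sketch of the same flow in code, assuming a hypothetical signup handler that needs to send a welcome email. `InMemoryQueue` only illustrates the shape of the pattern; in a real system the queue would be a broker so that jobs survive process restarts and workers can run on separate machines.

```typescript
// Hypothetical job shape and in-memory queue, for illustration only.
interface Job {
  id: number;
  type: 'email' | 'pdf';
  payload: string;
}

class InMemoryQueue {
  private jobs: Job[] = [];
  enqueue(job: Job): void { this.jobs.push(job); }
  dequeue(): Job | undefined { return this.jobs.shift(); }
  size(): number { return this.jobs.length; }
}

const queue = new InMemoryQueue();
const processed: number[] = [];
let nextId = 1;

// "API handler": does the minimum synchronous work, then returns right away.
function handleSignup(email: string): { status: number; jobId: number } {
  const job: Job = { id: nextId++, type: 'email', payload: email };
  queue.enqueue(job);
  return { status: 202, jobId: job.id }; // 202 Accepted: queued, not yet done
}

// "Worker": drains the queue outside the request path.
async function runWorker(handler: (job: Job) => Promise<void>): Promise<void> {
  let job = queue.dequeue();
  while (job !== undefined) {
    await handler(job);
    processed.push(job.id);
    job = queue.dequeue();
  }
}
```

Returning the job id lets the client poll for completion later if it needs to know when the heavy work has finished.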

Rate Limiting

Protect your APIs from abuse with layered rate limiting:

Global rate limit — 1000 requests/minute per IP.
Endpoint-specific limits — 5 login attempts per minute per IP.
User-specific limits — 100 API calls per minute per API key.

typescript Code Block

import rateLimit from 'express-rate-limit';

const loginLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 5,
  message: { error: 'Too many login attempts. Try again in 1 minute.' },
});

router.post('/auth/login', loginLimiter, authController.login);
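
To show what a limiter like this does conceptually, here is a sketch of a fixed-window counter. It is not the library's implementation: `FixedWindowLimiter` is a hypothetical class, and the `now` parameter exists only so the behavior can be checked without waiting for real time to pass.

```typescript
// Hypothetical fixed-window limiter keyed by client identifier (IP or API key).
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // No window yet, or the previous one expired: start a new window.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false; // over the limit
    current.count++;
    return true;
  }
}

// 5 attempts per minute per key, matching the login limit above.
const loginAttempts = new FixedWindowLimiter(5, 60 * 1000);
```

As far as I know, express-rate-limit keeps its counters in process memory by default, so with several instances behind a load balancer each instance enforces its own limit. Sharing the counters through an external store such as Redis keeps the limit consistent across instances.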

Circuit Breaker Pattern

When a downstream service fails, stop sending it traffic. The circuit breaker pattern prevents cascading failures:

Closed — requests flow normally.
Open — requests are rejected immediately (fast fail).
Half-Open — a limited number of test requests are allowed through.

typescript Code Block

// Simplified circuit breaker
class CircuitBreaker {
  private failures = 0;
  private readonly threshold = 5;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private nextAttempt = 0;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() < this.nextAttempt) throw new Error('Circuit is open');
      this.state = 'half-open';
    }
    try {
      const result = await fn();
      this.reset();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  private recordFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'open';
      this.nextAttempt = Date.now() + 30000; // 30 second cooldown
    }
  }

  private reset() {
    this.failures = 0;
    this.state = 'closed';
  }
}
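
Callers still have to decide what to do when the breaker rejects a request. One common option is to serve a fallback value, sketched below. The `Breaker` interface is an assumption that only repeats the public shape of the class above so the snippet compiles on its own, and `callWithFallback` is a hypothetical helper.

```typescript
// Minimal shape of the breaker above, repeated so this sketch stands alone.
interface Breaker {
  call<T>(fn: () => Promise<T>): Promise<T>;
}

// Run `fn` through the breaker. If the call fails, or the circuit is open
// and the breaker rejects immediately, return the fallback instead.
async function callWithFallback<T>(
  breaker: Breaker,
  fn: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await breaker.call(fn);
  } catch {
    return fallback;
  }
}
```

With the class above, this could be used as `callWithFallback(breaker, () => fetchProfile(id), cachedProfile)`, where `fetchProfile` and `cachedProfile` are placeholders for a downstream call and a last known good value. Swallowing every error is a simplification; real code would usually log the failure and might distinguish an open circuit from other errors.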

Key Takeaways

Design stateless from day one — store state in Redis, S3, or the database.
Use load balancers with health checks for automatic failover.
Set up read replicas to scale database reads independently.
Cache at multiple layers — in-process, Redis, and CDN.
Offload heavy work to message queues to keep API latencies low.
Implement circuit breakers to prevent cascading failures across services.

System design is not about memorizing architectures — it is about understanding trade-offs and making intentional decisions for your specific scale and constraints.

Bablu Kumar Singh
