Bablu Kumar Singh
June 2, 2026

System Design for Backend Engineers: A Practical Guide

System design is where backend engineering meets architecture. It is the discipline of making decisions that allow your application to serve 10 users today and 10 million users tomorrow — without a rewrite. This article covers the building blocks I use when designing scalable backend systems.

Horizontal vs. Vertical Scaling

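  • Vertical scaling: add more CPU and RAM to a single server. Simple, but capped by the largest machine you can buy, and the server remains a single point of failure.
  • Horizontal scaling: add more servers behind a load balancer. There is no per-machine ceiling, but the application must be stateless, and shared components such as the database eventually become the bottleneck.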

The rule: design for horizontal scaling from day one. Store sessions in Redis, not in process memory. Store uploads in S3, not on local disk. This makes every instance interchangeable.
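
To make the rule concrete, here is a minimal sketch of two application instances sharing one session store. The names `SessionStore`, `SharedStore`, and `AppInstance` are hypothetical, and the in-memory map only stands in for Redis so that the example runs without external services.

```typescript
// Hypothetical interface for an external session store such as Redis.
interface SessionStore {
  get(sessionId: string): Promise<string | undefined>;
  set(sessionId: string, userId: string): Promise<void>;
}

// Stand-in for Redis so the sketch runs on its own. In production this
// state would live outside every application process.
class SharedStore implements SessionStore {
  private data = new Map<string, string>();

  async get(sessionId: string): Promise<string | undefined> {
    return this.data.get(sessionId);
  }

  async set(sessionId: string, userId: string): Promise<void> {
    this.data.set(sessionId, userId);
  }
}

// An application instance keeps no session state of its own.
class AppInstance {
  constructor(readonly name: string, private store: SessionStore) {}

  async login(sessionId: string, userId: string): Promise<void> {
    await this.store.set(sessionId, userId);
  }

  async whoAmI(sessionId: string): Promise<string> {
    const userId = await this.store.get(sessionId);
    return userId ?? 'anonymous';
  }
}
```

A session created through one instance is visible through any other, so the load balancer is free to send each request to whichever instance is available.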

Load Balancing

A load balancer distributes incoming traffic across multiple application instances. Common algorithms:

Algorithm             How It Works                                         Best For
Round Robin           Cycles through servers in order                      Equal-capacity servers
Least Connections     Routes to the server with fewest active connections  Variable request durations
IP Hash               Routes based on client IP hash                       Session affinity (sticky sessions)
Weighted Round Robin  Assigns weights based on server capacity             Mixed-capacity clusters
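
The four algorithms can be sketched in a few lines each. This is an illustration of the selection logic only, not how any particular load balancer implements it. The `Server` shape and the function names are assumptions made for the example, and the weighted variant uses the naive approach of repeating servers in the rotation.

```typescript
interface Server {
  name: string;
  weight: number;            // relative capacity, used by weighted round robin
  activeConnections: number; // used by least connections
}

// Round robin: cycle through servers in order.
function makeRoundRobin(servers: Server[]): () => Server {
  let next = 0;
  return () => {
    const server = servers[next];
    next = (next + 1) % servers.length;
    return server;
  };
}

// Least connections: pick the server with the fewest active connections.
function leastConnections(servers: Server[]): Server {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best);
}

// IP hash: the same client IP always maps to the same server,
// as long as the server list does not change.
function ipHash(servers: Server[], clientIp: string): Server {
  let hash = 0;
  for (let i = 0; i < clientIp.length; i++) {
    hash = (hash * 31 + clientIp.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return servers[hash % servers.length];
}

// Weighted round robin (naive form): each server appears `weight` times
// in the rotation.
function makeWeightedRoundRobin(servers: Server[]): () => Server {
  const rotation: Server[] = [];
  for (const s of servers) {
    for (let i = 0; i < s.weight; i++) rotation.push(s);
  }
  return makeRoundRobin(rotation);
}
```

Note that IP hash remaps most clients whenever a server is added or removed, which is one reason to prefer externally stored sessions over sticky sessions.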

In production, I use AWS Application Load Balancer (ALB) with health checks:

text Code Block
Target Group Health Check:
  Path:     /api/health
  Interval: 30 seconds
  Timeout:  5 seconds
  Healthy:  2 consecutive successes
  Unhealthy: 3 consecutive failures
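
To see how the two thresholds interact, here is a small sketch of the state logic they describe. It illustrates the idea rather than ALB's internal implementation, and `HealthTracker` is a hypothetical name.

```typescript
type TargetState = 'healthy' | 'unhealthy';

class HealthTracker {
  private successes = 0; // consecutive passed checks
  private failures = 0;  // consecutive failed checks

  constructor(
    public state: TargetState = 'healthy',
    private readonly healthyThreshold = 2,
    private readonly unhealthyThreshold = 3,
  ) {}

  // Record one health check result and return the resulting state.
  record(checkPassed: boolean): TargetState {
    if (checkPassed) {
      this.successes++;
      this.failures = 0;
      if (this.state === 'unhealthy' && this.successes >= this.healthyThreshold) {
        this.state = 'healthy';
      }
    } else {
      this.failures++;
      this.successes = 0;
      if (this.state === 'healthy' && this.failures >= this.unhealthyThreshold) {
        this.state = 'unhealthy';
      }
    }
    return this.state;
  }
}
```

With a 30-second interval and three required failures, a dead instance can keep receiving traffic for roughly 90 seconds before it is taken out of rotation. Shortening the interval reduces that window at the cost of more health check traffic.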

The health endpoint is trivial but critical:

typescript Code Block
app.get('/api/health', (req, res) => {
  res.status(200).json({ status: 'ok', uptime: process.uptime() });
});

Database Replication

A single database server is a single point of failure. Use primary-replica replication:

  • Primary (Master) — handles all writes.
  • Replicas (Read Replicas) — handle read queries, reducing primary load.

text Code Block
       Writes
Client ───────> Primary DB
                  │
              Replication
              ┌───┴───┐
           Replica 1  Replica 2
              │          │
           Reads       Reads

In Node.js with Mongoose, route reads to replicas:

typescript Code Block
await mongoose.connect(MONGODB_URI, {
  readPreference: 'secondaryPreferred',
});

For critical reads that require the latest data, override per-query:

typescript Code Block
const user = await User.findById(userId).read('primary');
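
One way to decide when a read counts as critical is to send a user's reads to the primary for a short time after that user's last write, since replicas may not have caught up yet. The sketch below assumes you track a last-write timestamp per user (for example in the session). The helper name and the 2-second lag budget are hypothetical and should be tuned from measured replication lag.

```typescript
type ReadPreference = 'primary' | 'secondaryPreferred';

// Hypothetical helper: read from the primary if this user wrote recently
// enough that replicas may still be behind.
function chooseReadPreference(
  lastWriteAtMs: number | undefined,
  nowMs: number,
  maxReplicaLagMs = 2000, // assumed lag budget
): ReadPreference {
  if (lastWriteAtMs === undefined) return 'secondaryPreferred';
  return nowMs - lastWriteAtMs < maxReplicaLagMs ? 'primary' : 'secondaryPreferred';
}
```

The result can be passed to the same `.read(...)` call shown above, so most reads still go to replicas and only recent writers pay the cost of hitting the primary.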

Caching Layers

Caching reduces database load and speeds up responses. I use a multi-layer caching strategy:

1. Application-level cache — in-process cache (e.g., node-cache) for config and rarely-changing data.

2. Distributed cache — Redis for data shared across instances (sessions, computed results).

3. CDN cache — Cloudflare or CloudFront for static assets and API responses with proper Cache-Control headers.

typescript Code Block
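// Application-level cache for config that rarely changes
import NodeCache from 'node-cache';

const appCache = new NodeCache({ stdTTL: 300 }); // TTL in seconds (5 minutes)

// FeatureFlags and loadFeatureFlagsFromDB are assumed to be defined elsewhere;
// the loader is assumed to return a Promise, as database calls in Node.js do.
async function getFeatureFlags(): Promise<FeatureFlags> {
  const cached = appCache.get<FeatureFlags>('featureFlags');
  if (cached !== undefined) return cached; // cache hit

  // Cache miss: load from the database and store for later calls
  const flags = await loadFeatureFlagsFromDB();
  appCache.set('featureFlags', flags);
  return flags;
}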
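
For the CDN layer, what gets cached is driven by the Cache-Control header on each response. The sketch below shows one possible mapping from response type to header value. The `ResponseKind` categories and the durations are assumptions for illustration, while the directives themselves are standard HTTP.

```typescript
type ResponseKind = 'static-asset' | 'public-api' | 'private-api';

function cacheControlFor(kind: ResponseKind): string {
  switch (kind) {
    case 'static-asset':
      // Fingerprinted files never change, so cache them for a year.
      return 'public, max-age=31536000, immutable';
    case 'public-api':
      // Browsers cache for 60 s, shared caches such as the CDN for 300 s.
      return 'public, max-age=60, s-maxage=300';
    case 'private-api':
      // Per-user data: do not store it in any cache.
      return 'no-store';
  }
}
```

In an Express handler this could be applied with `res.set('Cache-Control', cacheControlFor('public-api'))` before sending the response.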

Message Queues for Async Processing

Not every operation needs to complete before the API responds. Offload heavy work to background queues:

text Code Block
Client ──> API ──> Response (immediate)
                │
                └──> Message Queue ──> Worker
                                        │
                                    Heavy Work
                                  (emails, PDFs,
                                   analytics, etc.)

This pattern keeps P99 latencies low because the API only does the minimum synchronous work.
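
The flow in the diagram can be sketched as follows. The in-memory `JobQueue` is only a stand-in for a real broker such as RabbitMQ or SQS, which would also provide durability and acknowledgements. All names here (`Job`, `JobQueue`, `handleSignup`, `runWorker`) are hypothetical.

```typescript
type Job = { type: 'send-email'; to: string };

// In-memory stand-in for a real message broker.
class JobQueue {
  private jobs: Job[] = [];

  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  dequeue(): Job | undefined {
    return this.jobs.shift();
  }

  get size(): number {
    return this.jobs.length;
  }
}

// API handler: do the minimum synchronous work, enqueue the rest, respond.
function handleSignup(queue: JobQueue, email: string): { status: number; body: string } {
  queue.enqueue({ type: 'send-email', to: email });
  return { status: 202, body: 'Signup accepted' };
}

// Worker: runs separately from the API and drains the queue at its own pace.
async function runWorker(
  queue: JobQueue,
  handle: (job: Job) => Promise<void>,
): Promise<number> {
  let processed = 0;
  let job = queue.dequeue();
  while (job !== undefined) {
    await handle(job);
    processed++;
    job = queue.dequeue();
  }
  return processed;
}
```

The handler returns 202 Accepted before the email is sent, so a slow mail provider affects only the worker, not the API's response time.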

Rate Limiting

Protect your APIs from abuse with layered rate limiting:

  • Global rate limit — 1000 requests/minute per IP.
  • Endpoint-specific limits — 5 login attempts per minute per IP.
  • User-specific limits — 100 API calls per minute per API key.

typescript Code Block
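import rateLimit from 'express-rate-limit';

// Allow at most 5 login attempts per client IP in each 60-second window.
// Behind a load balancer, configure Express's 'trust proxy' setting so that
// req.ip reflects the client address rather than the balancer's.
const loginLimiter = rateLimit({
  windowMs: 60 * 1000, // window length in milliseconds
  max: 5,              // requests allowed per window per IP
  message: { error: 'Too many login attempts. Try again in 1 minute.' },
});

router.post('/auth/login', loginLimiter, authController.login);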
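
To show what a limiter does internally and how the layers combine, here is a minimal fixed-window counter. It is a sketch, not the algorithm any specific library uses, and the names are hypothetical. Its counters live in process memory, so with several instances behind a load balancer each instance would count separately. A shared store such as Redis avoids that.

```typescript
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  // Returns true if the request identified by `key` is allowed at time `nowMs`.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const current = this.windows.get(key);
    if (current === undefined || nowMs - current.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count++;
    return true;
  }
}

// The three layers from the list above, each with a one-minute window.
const globalPerIp = new FixedWindowLimiter(1000, 60 * 1000);
const loginPerIp = new FixedWindowLimiter(5, 60 * 1000);
const perApiKey = new FixedWindowLimiter(100, 60 * 1000);

// A request must pass every layer that applies to it.
function allowLogin(ip: string, nowMs: number): boolean {
  return globalPerIp.allow(ip, nowMs) && loginPerIp.allow(ip, nowMs);
}

function allowApiCall(ip: string, apiKey: string, nowMs: number): boolean {
  return globalPerIp.allow(ip, nowMs) && perApiKey.allow(apiKey, nowMs);
}
```

A fixed window allows short bursts at window boundaries. Sliding-window and token-bucket limiters smooth that out at the cost of a little more bookkeeping.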

Circuit Breaker Pattern

When a downstream service fails, stop sending it traffic. The circuit breaker pattern prevents cascading failures by moving between three states:

  • Closed — requests flow normally.
  • Open — requests are rejected immediately (fast fail).
  • Half-Open — a limited number of test requests are allowed through.

typescript Code Block
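// Simplified circuit breaker: opens after 5 consecutive failures and
// allows trial calls again after a 30-second cooldown. It does not cap
// the number of concurrent trial calls while half-open.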
class CircuitBreaker {
  private failures = 0;
  private readonly threshold = 5;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private nextAttempt = 0;

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() < this.nextAttempt) throw new Error('Circuit is open');
      this.state = 'half-open';
    }
    try {
      const result = await fn();
      this.reset();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  private recordFailure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.state = 'open';
      this.nextAttempt = Date.now() + 30000; // 30 second cooldown
    }
  }

  private reset() {
    this.failures = 0;
    this.state = 'closed';
  }
}
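
A breaker is most useful when the caller has something sensible to do while the circuit is open. The sketch below wraps a call in a fallback. The `Breaker` interface mirrors the `call` method of the class above so that the example stands alone, and `callWithFallback` is a hypothetical helper.

```typescript
// Minimal shape of the breaker shown above, so this sketch stands alone.
interface Breaker {
  call<T>(fn: () => Promise<T>): Promise<T>;
}

// Try the downstream call through the breaker and return a default value
// when the call fails or the circuit is open.
async function callWithFallback<T>(
  breaker: Breaker,
  fn: () => Promise<T>,
  fallback: T,
): Promise<T> {
  try {
    return await breaker.call(fn);
  } catch {
    return fallback;
  }
}
```

For example, a product page could fall back to an empty recommendations list when the recommendation service is down, instead of failing the whole request.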

Key Takeaways

  • Design stateless from day one — store state in Redis, S3, or the database.
  • Use load balancers with health checks for automatic failover.
  • Set up read replicas to scale database reads independently.
  • Cache at multiple layers — in-process, Redis, and CDN.
  • Offload heavy work to message queues to keep API latencies low.
  • Implement circuit breakers to prevent cascading failures across services.

System design is not about memorizing architectures — it is about understanding trade-offs and making intentional decisions for your specific scale and constraints.

Written by Bablu Kumar Singh, Backend-Focused Full Stack Developer specializing in Node.js, MongoDB, PostgreSQL, Redis, RabbitMQ, AWS, Docker, System Design, and React Native.
