Scaling Express: Performance, Caching, and Concurrency

FAST

From Working to Fast

Getting your Express API working is step one. Getting it to handle 10,000 requests per minute without falling over — that’s step two, and it’s where most developers stop reading tutorials. I’ve debugged APIs that worked perfectly in development and died under modest production load. The pattern is always the same: no caching, synchronous operations blocking the event loop, and a single Node.js process on a multi-core machine. This post fixes all three.
We’ll cover: profiling with evidence, caching at three layers (in-memory, Redis, HTTP), worker threads for CPU-bound work, clustering to use every core, and rate limiting.
PROFILE
Optimize With Evidence

Profile First

The cardinal rule of performance: measure before you optimize. Optimizing the wrong thing wastes time and introduces complexity for zero gain.

Node.js Built-In Profiler

```bash
# Start your app with the profiler active
node --inspect src/index.js

# Or with tsx for TypeScript
tsx --inspect src/index.ts
```

Then open Chrome and navigate to `chrome://inspect`. Click “Inspect” on your running process to open DevTools with CPU profiling and memory snapshots.
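
If you can’t attach DevTools (a remote server, a CI box), Node can write a CPU profile to disk instead. A minimal sketch using the built-in `--cpu-prof` flag (Node 12+), which saves a `.cpuprofile` file on exit that you can load into the DevTools Performance tab later:

```bash
# Write a .cpuprofile file to ./profiles when the process exits
node --cpu-prof --cpu-prof-dir=./profiles src/index.js
```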

clinic.js — The Power Tool

For deeper analysis, clinic.js gives you flame graphs, event loop analysis, and heap allocation tracking with a single command.

```bash
npm install --save-dev clinic autocannon

# Profile for 30 seconds under load
npx clinic doctor -- node dist/index.js

# Then in another terminal, hit it with load:
npx autocannon -c 100 -d 30 http://localhost:3000/api/users
```

Clinic generates an HTML report showing CPU usage, memory usage, event-loop delay, and active handles over time.
What to Look For

Sustained event-loop delay points at synchronous work blocking the loop; memory that climbs steadily and never falls back suggests a leak; a single core pinned at 100% while the others idle means you need clustering (covered below).

CACHING
The Highest-ROI Optimization

Caching

Caching is the highest-ROI optimization on this list: a cache hit replaces a database round trip with a sub-millisecond lookup. Cache the data that is read most and changes least, and pick the cache layer to match how widely the data needs to be shared.

In-Memory Cache (Node-Cache)

For small, frequently-accessed datasets that don’t need to be shared across instances:
```typescript
// npm install node-cache

// src/config/cache.ts
import NodeCache from 'node-cache';

export const cache = new NodeCache({
  stdTTL: 300,       // Default 5 minutes TTL
  checkperiod: 60,   // Check for expired keys every 60 seconds
  useClones: false   // Better performance for read-only data
});
```

```typescript
// Cache middleware — wraps any route with caching
import { cache } from '../config/cache.js';
import { Request, Response, NextFunction } from 'express';

export function cacheMiddleware(ttlSeconds: number) {
  return (req: Request, res: Response, next: NextFunction) => {
    const key = `${req.method}:${req.originalUrl}`;
    const cached = cache.get(key);

    // Treat only undefined as a miss so cached falsy values (0, null) still hit
    if (cached !== undefined) {
      return res.json(cached);
    }

    // Override res.json to capture and cache the response
    const originalJson = res.json.bind(res);
    res.json = (data: any) => {
      if (res.statusCode === 200) {
        cache.set(key, data, ttlSeconds);
      }
      return originalJson(data);
    };

    next();
  };
}
```

```typescript
// Apply to specific routes
router.get('/products', cacheMiddleware(300), getProducts);      // Cache 5 min
router.get('/categories', cacheMiddleware(3600), getCategories); // Cache 1 hour
```

Redis Cache — Shared Across Instances

When you’re running multiple Express instances (clustered or horizontal scaling), in-memory cache doesn’t work — each instance has its own memory. Redis solves this.
```typescript
// npm install ioredis

// src/config/redis.ts
import Redis from 'ioredis';

export const redis = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: 3,
  enableReadyCheck: true,
  lazyConnect: true,
});

redis.on('error', (err) => {
  console.error('Redis error:', err);
});
```

```typescript
// Redis cache helper
export async function getCachedOrFetch<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 300
): Promise<T> {
  const cached = await redis.get(key);

  if (cached) {
    return JSON.parse(cached) as T;
  }

  const data = await fetcher();
  await redis.setex(key, ttlSeconds, JSON.stringify(data));
  return data;
}
```

```typescript
// In your service
export async function getProducts() {
  return getCachedOrFetch(
    'products:all',
    () => db.product.findMany({ where: { active: true } }),
    300
  );
}
```
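
One thing the helper doesn’t handle is invalidation: after a write, stale data can sit in Redis for up to the TTL. A minimal sketch of the delete-after-write pattern, assuming the same `redis` client and key as above (`createProduct` and `ProductInput` are hypothetical names for illustration):

```typescript
// Invalidate on write so readers don't serve stale data until the TTL expires
export async function createProduct(input: ProductInput) {
  const product = await db.product.create({ data: input });
  await redis.del('products:all'); // the next read repopulates the cache
  return product;
}
```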

HTTP Cache Headers — Let the CDN Do the Work

For public, cacheable API responses, add standard HTTP caching headers. CDNs (Cloudflare, Fastly) and browsers will cache these automatically:
```typescript
// Cache public product list for 5 minutes, allow stale for 1 more minute
router.get('/products', (req, res, next) => {
  res.set('Cache-Control', 'public, max-age=300, stale-while-revalidate=60');
  next();
}, getProducts);

// Never cache user-specific or sensitive endpoints
router.get('/profile', requireAuth, (req, res, next) => {
  res.set('Cache-Control', 'private, no-cache');
  next();
}, getProfile);
```
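
Express also sets a weak `ETag` on JSON responses by default, so clients that send `If-None-Match` get an automatic `304 Not Modified` when the body hasn’t changed. If you want byte-exact validation, you can opt into strong ETags via the built-in app setting (a one-line sketch, not an extra library):

```typescript
// Strong ETags: computed from the exact response body ('weak' is the default)
app.set('etag', 'strong');
```
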
THREADS
Keep the Event Loop Clear

Worker Threads

Node.js runs on a single thread. If you perform a CPU-intensive operation (image processing, PDF generation, complex calculations) on the main thread, every other request waits. Worker threads let you offload CPU-heavy work to a separate thread while the event loop continues serving requests.

The Problem (Event Loop Blocking)

```typescript
// This blocks the event loop for the entire calculation duration
app.get('/fibonacci/:n', (req, res) => {
  const n = parseInt(req.params.n);
  const result = fibonacci(n); // If n=45, this takes ~10 seconds — blocks everything
  res.json({ result });
});
```

The Solution (Worker Thread)

```typescript
// src/workers/fibonacci.worker.ts
import { workerData, parentPort } from 'worker_threads';

function fibonacci(n: number): number {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

const result = fibonacci(workerData.n);
parentPort?.postMessage(result); // Send the result back to the main thread
```

```typescript
// src/utils/runWorker.ts
import { Worker } from 'worker_threads';
import path from 'path';

export function runWorker(workerFile: string, data: unknown): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.resolve(workerFile), { workerData: data });

    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

```typescript
// Event loop stays free while worker runs
app.get('/fibonacci/:n', async (req, res) => {
  const result = await runWorker('./dist/workers/fibonacci.worker.js', {
    n: parseInt(req.params.n)
  });
  res.json({ result });
});
```

Use worker threads for CPU-bound work: image/video processing, PDF generation, parsing large CSV files, complex mathematical calculations, and encryption/decryption at scale. Don’t use them for I/O (database queries, HTTP requests); those operations are already non-blocking via the event loop.

CPU CORE
Use Every CPU Core

Clustering With PM2

By default, a single Node.js process runs on **one CPU core**. If your server has 8 cores, you’re using 12.5% of its capacity.

"Node.js operates on a single-threaded event loop, which can limit CPU utilization on multi-core systems. To leverage all available CPU cores, use the cluster module or process managers like PM2."

GitHub Community — Node.js Performance Discussion

The cluster module spawns multiple worker processes that all share the same port. PM2 makes this painless.

PM2 Ecosystem Config

```javascript
// npm install -g pm2

// ecosystem.config.cjs
module.exports = {
  apps: [{
    name: 'my-express-api',
    script: './dist/index.js',
    instances: 'max',         // One process per CPU core
    exec_mode: 'cluster',     // Enable cluster mode
    watch: false,             // Don't watch files in production
    max_memory_restart: '500M', // Restart if memory exceeds 500MB
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000
    }
  }]
};
```

Deploy

```bash
# Build TypeScript first
npm run build

# Start in cluster mode
pm2 start ecosystem.config.cjs --env production

# Check status
pm2 status

# Monitor in real-time
pm2 monit

# Scale up/down without restart
pm2 scale my-express-api 4

# Enable auto-restart on server reboot
pm2 startup
pm2 save
```
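
For deploys after the first one, `pm2 reload` is worth knowing: in cluster mode it restarts workers one at a time, so the port keeps being served while new code rolls out (unlike `pm2 restart`, which takes everything down at once).

```bash
# Zero-downtime rolling restart of all workers in the cluster
pm2 reload my-express-api
```
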
**Important:** In cluster mode, in-process state (module-level variables, an in-memory cache) is not shared; each worker process has its own memory. Use Redis for any state that must be visible across workers.
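
To make the failure mode concrete, here’s a minimal sketch (the `/hits` endpoint is hypothetical): a module-level counter diverges per worker, while a Redis `INCR` gives every worker the same view.

```typescript
import { redis } from './config/redis.js';

let localHits = 0; // per-process: each PM2 worker keeps its own count

app.get('/hits', async (req, res) => {
  localHits++;                                 // diverges across workers
  const sharedHits = await redis.incr('hits'); // consistent across workers
  res.json({ localHits, sharedHits });
});
```
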
CLUSTERING
Without PM2

Manual Clustering

If you prefer the built-in approach:
```typescript
// src/cluster.ts
import cluster from 'cluster';
import os from 'os';
import { logger } from './config/logger.js';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  logger.info(`Primary process ${process.pid} starting ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    logger.warn(`Worker ${worker.process.pid} died (${signal || code}). Restarting...`);
    cluster.fork(); // Always restart a dead worker
  });
} else {
  // Worker: run the actual Express app
  import('./index.js');
  logger.info(`Worker ${process.pid} started`);
}
```
RATE LIMITING
Limit by User, IP, and Endpoint

Rate Limiting

Rate limiting protects your API from abuse, brute-force attacks, and accidental runaway scripts. In production, keep the counters in Redis so every worker sees the same counts. A sliding window is more precise than a fixed window because it avoids the “burst at the window boundary” problem (see the FAQ below).
```typescript
// npm install express-rate-limit rate-limit-redis ioredis

// src/middleware/rateLimiter.ts
import rateLimit from 'express-rate-limit';
import { RedisStore } from 'rate-limit-redis';
import { redis } from '../config/redis.js';
import { Request } from 'express';

// General API rate limiter
export const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 200,
  standardHeaders: 'draft-7', // Return RateLimit headers
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args: string[]) => redis.call(...args) }),
  keyGenerator: (req: Request) => {
    // Use user ID if authenticated, otherwise fall back to IP
    return (req as any).user?.userId ?? req.ip ?? 'anonymous';
  },
  message: {
    type: 'https://api.yourapp.com/errors/rate-limited',
    title: 'Too Many Requests',
    status: 429,
    detail: 'You have exceeded the rate limit. Please try again later.',
  }
});

// Strict limiter for auth endpoints (prevent brute-force)
export const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10, // Only 10 attempts per 15 minutes
  store: new RedisStore({ sendCommand: (...args: string[]) => redis.call(...args) }),
  keyGenerator: (req: Request) => `auth:${req.ip}`,
  message: {
    type: 'https://api.yourapp.com/errors/rate-limited',
    title: 'Too Many Login Attempts',
    status: 429,
    detail: 'Too many login attempts. Please wait 15 minutes.',
  }
});

// Per-endpoint limiter for expensive operations
export const exportLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 5, // Only 5 exports per hour
  store: new RedisStore({ sendCommand: (...args: string[]) => redis.call(...args) }),
  keyGenerator: (req: Request) => `export:${(req as any).user?.userId}`,
});
```

```typescript
// Apply in your routes:

import { apiLimiter, authLimiter, exportLimiter } from '../middleware/rateLimiter.js';

// Global API limit
app.use('/api', apiLimiter);

// Strict limit on auth routes
app.use('/api/auth', authLimiter);

// Expensive endpoint-specific limit
router.post('/reports/export', requireAuth, exportLimiter, generateExport);
```
CHECKLIST
Putting It All Together

Production Performance Checklist

Before your next production deployment, run through this list:

- Profiled under realistic load (clinic.js + autocannon), not just in development
- Caching at the right layer: in-memory for a single instance, Redis for clusters, Cache-Control headers for CDNs
- CPU-heavy work moved off the event loop into worker threads
- Cluster mode enabled (PM2 `instances: 'max'`) so every core is used
- Shared state (sessions, cache, rate-limit counters) in Redis, not process memory
- Rate limits on the global API, auth endpoints, and expensive operations

“Premature optimization is the root of all evil.”

Donald E. Knuth

But failure to optimize at all is the root of a different kind of evil: unavailability.

Thank You for Spending Your Valuable Time

I truly appreciate you taking the time to read this post, and I hope you found it insightful and engaging!
FAQ

Frequently Asked Questions

**Can I still use sessions and caching when running in cluster mode?**

Yes — in fact, Redis is *required* for shared state in a clustered environment. Each PM2 worker has its own memory, so sessions and cache must live in Redis to be accessible across workers.
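
For sessions specifically, a common setup is `express-session` backed by `connect-redis` (neither is used elsewhere in this post, so treat this as a sketch):

```typescript
// npm install express-session connect-redis
import session from 'express-session';
import { RedisStore } from 'connect-redis'; // v8+; v7 used a default export
import { redis } from './config/redis.js';

app.use(session({
  store: new RedisStore({ client: redis }), // sessions visible to every worker
  secret: process.env.SESSION_SECRET!,
  resave: false,
  saveUninitialized: false,
}));
```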

**When should I use worker threads versus child processes?**

Use worker threads for CPU-intensive computation within the same Node.js process — they share memory and are lighter weight. Use child processes (`child_process.spawn`) for running external programs or scripts outside of Node.js.
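
As a contrast to the worker-thread example above, here’s a sketch of the child-process side: shelling out to an external binary (`ffmpeg` is only an illustration; any CLI works the same way):

```typescript
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Runs an external program in a separate OS process, not a Node.js thread
export async function transcode(input: string, output: string) {
  await execFileAsync('ffmpeg', ['-i', input, '-codec', 'copy', output]);
}
```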

**What’s the difference between a fixed window and a sliding window rate limiter?**

A fixed window resets at exact intervals (e.g., every 15 minutes). This allows bursting at the window boundary — a user could send 200 requests at 14:59 and another 200 at 15:01. A sliding window tracks the last N requests regardless of when the window started, eliminating this burst problem.
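
A common way to implement a sliding window is a Redis sorted set keyed by timestamp. This is a minimal sketch (not the `express-rate-limit` internals) that allows at most `limit` requests in the trailing `windowMs`:

```typescript
import { redis } from '../config/redis.js';

export async function allowRequest(key: string, limit: number, windowMs: number) {
  const now = Date.now();
  // Drop entries older than the window, count what's left, record this request
  await redis.zremrangebyscore(key, 0, now - windowMs);
  const count = await redis.zcard(key);
  if (count >= limit) return false;
  await redis.zadd(key, now, `${now}:${Math.random()}`);
  await redis.pexpire(key, windowMs); // let idle keys expire on their own
  return true;
}
```

Note these four commands aren’t atomic; under heavy concurrency you’d move them into a `MULTI`/`EXEC` block or a Lua script.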

**How much memory does each clustered worker use?**

Each worker is a separate Node.js process with its own V8 heap. A typical Express API uses 50-200MB per worker. Set `max_memory_restart` in your PM2 config to automatically restart workers that exceed your limit and prevent memory leak accumulation.
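
To see where your own workers sit in that range, `process.memoryUsage()` reports per-process heap and RSS. A sketch you could drop into a health endpoint (the `/health/memory` route is hypothetical):

```typescript
app.get('/health/memory', (req, res) => {
  const { rss, heapUsed } = process.memoryUsage();
  res.json({
    pid: process.pid,                      // differs per PM2 worker
    rssMB: Math.round(rss / 1024 / 1024),  // total resident memory
    heapUsedMB: Math.round(heapUsed / 1024 / 1024),
  });
});
```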

**Does an in-memory cache like node-cache work across PM2 workers?**

No — each PM2 worker has its own in-memory cache instance. A cache set in worker A is invisible to worker B. For clustered apps, always use Redis for shared caching.
