Scaling Express: Performance, Caching, and Concurrency

FAST

From Working to Fast

Getting your Express API working is step one. Getting it to handle 10,000 requests per minute without falling over — that’s step two, and it’s where most developers stop reading tutorials. I’ve debugged APIs that worked perfectly in development and died under modest production load. The pattern is always the same: no caching, synchronous operations blocking the event loop, and a single Node.js process on a multi-core machine. This post fixes all three.
We’ll cover: profiling with evidence, caching at three layers (in-memory, Redis, HTTP), worker threads for CPU-bound work, clustering to use every core, and rate limiting.
PROFILE
Optimize With Evidence

Profile First

The cardinal rule of performance: measure before you optimize. Optimizing the wrong thing wastes time and introduces complexity for zero gain.

Node.js Built-In Profiler

```bash
# Start your app with the profiler active
node --inspect src/index.js

# Or with tsx for TypeScript
tsx --inspect src/index.ts
```

Then open Chrome and navigate to `chrome://inspect`. Click “Inspect” on your running process to open DevTools with CPU profiling and memory snapshots.
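
If you can’t attach DevTools (a remote server, a CI box), Node can write a CPU profile to disk instead. A minimal sketch using the built-in `--cpu-prof` flag (Node 12+), which saves a `.cpuprofile` file on exit that you can load into the DevTools Performance tab later:

```bash
# Write a .cpuprofile file to ./profiles when the process exits
node --cpu-prof --cpu-prof-dir=./profiles src/index.js
```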

clinic.js — The Power Tool

For deeper analysis, clinic.js gives you flame graphs, event loop analysis, and heap allocation tracking with a single command.

```bash
npm install --save-dev clinic autocannon

# Profile for 30 seconds under load
npx clinic doctor -- node dist/index.js

# Then in another terminal, hit it with load:
npx autocannon -c 100 -d 30 http://localhost:3000/api/users
```

Clinic generates an HTML report showing CPU usage, memory usage, event-loop delay, and active handles over time.
What to Look For

Sustained event-loop delay points at synchronous work blocking the loop; memory that climbs steadily and never falls back suggests a leak; a single core pinned at 100% while the others idle means you need clustering (covered below).

CACHING
The Highest-ROI Optimization

Caching

Caching is the highest-ROI optimization on this list: a cache hit replaces a database round trip with a sub-millisecond lookup. Cache the data that is read most and changes least, and pick the cache layer to match how widely the data needs to be shared.

In-Memory Cache (Node-Cache)

For small, frequently-accessed datasets that don’t need to be shared across instances:
```typescript
// npm install node-cache

// src/config/cache.ts
import NodeCache from 'node-cache';

export const cache = new NodeCache({
  stdTTL: 300,       // Default 5 minutes TTL
  checkperiod: 60,   // Check for expired keys every 60 seconds
  useClones: false   // Better performance for read-only data
});
```

```typescript
// Cache middleware — wraps any route with caching
import { cache } from '../config/cache.js';
import { Request, Response, NextFunction } from 'express';

export function cacheMiddleware(ttlSeconds: number) {
  return (req: Request, res: Response, next: NextFunction) => {
    const key = `${req.method}:${req.originalUrl}`;
    const cached = cache.get(key);

    // Treat only undefined as a miss so cached falsy values (0, null) still hit
    if (cached !== undefined) {
      return res.json(cached);
    }

    // Override res.json to capture and cache the response
    const originalJson = res.json.bind(res);
    res.json = (data: any) => {
      if (res.statusCode === 200) {
        cache.set(key, data, ttlSeconds);
      }
      return originalJson(data);
    };

    next();
  };
}
```

```typescript
// Apply to specific routes
router.get('/products', cacheMiddleware(300), getProducts);      // Cache 5 min
router.get('/categories', cacheMiddleware(3600), getCategories); // Cache 1 hour
```

Redis Cache — Shared Across Instances

When you’re running multiple Express instances (clustered or horizontal scaling), in-memory cache doesn’t work — each instance has its own memory. Redis solves this.
```typescript
// npm install ioredis

// src/config/redis.ts
import Redis from 'ioredis';

export const redis = new Redis(process.env.REDIS_URL!, {
  maxRetriesPerRequest: 3,
  enableReadyCheck: true,
  lazyConnect: true,
});

redis.on('error', (err) => {
  console.error('Redis error:', err);
});
```

```typescript
// Redis cache helper
export async function getCachedOrFetch<T>(
  key: string,
  fetcher: () => Promise<T>,
  ttlSeconds = 300
): Promise<T> {
  const cached = await redis.get(key);

  if (cached) {
    return JSON.parse(cached) as T;
  }

  const data = await fetcher();
  await redis.setex(key, ttlSeconds, JSON.stringify(data));
  return data;
}
```

```typescript
// In your service
export async function getProducts() {
  return getCachedOrFetch(
    'products:all',
    () => db.product.findMany({ where: { active: true } }),
    300
  );
}
```
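
One thing the helper doesn’t handle is invalidation: after a write, stale data can sit in Redis for up to the TTL. A minimal sketch of the delete-after-write pattern, assuming the same `redis` client and key as above (`createProduct` and `ProductInput` are hypothetical names for illustration):

```typescript
// Invalidate on write so readers don't serve stale data until the TTL expires
export async function createProduct(input: ProductInput) {
  const product = await db.product.create({ data: input });
  await redis.del('products:all'); // the next read repopulates the cache
  return product;
}
```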

HTTP Cache Headers — Let the CDN Do the Work

For public, cacheable API responses, add standard HTTP caching headers. CDNs (Cloudflare, Fastly) and browsers will cache these automatically:
```typescript
// Cache public product list for 5 minutes, allow stale for 1 more minute
router.get('/products', (req, res, next) => {
  res.set('Cache-Control', 'public, max-age=300, stale-while-revalidate=60');
  next();
}, getProducts);

// Never cache user-specific or sensitive endpoints
router.get('/profile', requireAuth, (req, res, next) => {
  res.set('Cache-Control', 'private, no-cache');
  next();
}, getProfile);
```
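
Express also sets a weak `ETag` on JSON responses by default, so clients that send `If-None-Match` get an automatic `304 Not Modified` when the body hasn’t changed. If you want byte-exact validation, you can opt into strong ETags via the built-in app setting (a one-line sketch, not an extra library):

```typescript
// Strong ETags: computed from the exact response body ('weak' is the default)
app.set('etag', 'strong');
```
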
THREADS
Keep the Event Loop Clear

Worker Threads

Node.js runs on a single thread. If you perform a CPU-intensive operation (image processing, PDF generation, complex calculations) on the main thread, every other request waits. Worker threads let you offload CPU-heavy work to a separate thread while the event loop continues serving requests.

The Problem (Event Loop Blocking)

```typescript
// This blocks the event loop for the entire calculation duration
app.get('/fibonacci/:n', (req, res) => {
  const n = parseInt(req.params.n);
  const result = fibonacci(n); // If n=45, this takes ~10 seconds — blocks everything
  res.json({ result });
});
```

The Solution (Worker Thread)

```typescript
// src/workers/fibonacci.worker.ts
import { workerData, parentPort } from 'worker_threads';

function fibonacci(n: number): number {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

const result = fibonacci(workerData.n);
parentPort?.postMessage(result); // Send the result back to the main thread
```

```typescript
// src/utils/runWorker.ts
import { Worker } from 'worker_threads';
import path from 'path';

export function runWorker(workerFile: string, data: unknown): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(path.resolve(workerFile), { workerData: data });

    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

```typescript
// Event loop stays free while worker runs
app.get('/fibonacci/:n', async (req, res) => {
  const result = await runWorker('./dist/workers/fibonacci.worker.js', {
    n: parseInt(req.params.n)
  });
  res.json({ result });
});
```

Use worker threads for CPU-bound work: image/video processing, PDF generation, parsing large CSV files, complex mathematical calculations, and encryption/decryption at scale. Don’t use them for I/O (database queries, HTTP requests); those operations are already non-blocking via the event loop.

CPU CORE
Use Every CPU Core

Clustering With PM2

By default, a single Node.js process runs on **one CPU core**. If your server has 8 cores, you’re using 12.5% of its capacity.

"Node.js operates on a single-threaded event loop, which can limit CPU utilization on multi-core systems. To leverage all available CPU cores, use the cluster module or process managers like PM2."

GitHub Community — Node.js Performance Discussion

The cluster module spawns multiple worker processes that all share the same port. PM2 makes this painless.

PM2 Ecosystem Config

```javascript
// npm install -g pm2

// ecosystem.config.cjs
module.exports = {
  apps: [{
    name: 'my-express-api',
    script: './dist/index.js',
    instances: 'max',         // One process per CPU core
    exec_mode: 'cluster',     // Enable cluster mode
    watch: false,             // Don't watch files in production
    max_memory_restart: '500M', // Restart if memory exceeds 500MB
    env_production: {
      NODE_ENV: 'production',
      PORT: 3000
    }
  }]
};
```

Deploy

```bash
# Build TypeScript first
npm run build

# Start in cluster mode
pm2 start ecosystem.config.cjs --env production

# Check status
pm2 status

# Monitor in real-time
pm2 monit

# Scale up/down without restart
pm2 scale my-express-api 4

# Enable auto-restart on server reboot
pm2 startup
pm2 save
```
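
For deploys after the first one, `pm2 reload` is worth knowing: in cluster mode it restarts workers one at a time, so the port keeps being served while new code rolls out (unlike `pm2 restart`, which takes everything down at once).

```bash
# Zero-downtime rolling restart of all workers in the cluster
pm2 reload my-express-api
```
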
**Important:** In cluster mode, in-process state (module-level variables, an in-memory cache) is not shared; each worker process has its own memory. Use Redis for any state that must be visible across workers.
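
To make the failure mode concrete, here’s a minimal sketch (the `/hits` endpoint is hypothetical): a module-level counter diverges per worker, while a Redis `INCR` gives every worker the same view.

```typescript
import { redis } from './config/redis.js';

let localHits = 0; // per-process: each PM2 worker keeps its own count

app.get('/hits', async (req, res) => {
  localHits++;                                 // diverges across workers
  const sharedHits = await redis.incr('hits'); // consistent across workers
  res.json({ localHits, sharedHits });
});
```
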
CLUSTERING
Without PM2

Manual Clustering

If you prefer the built-in approach:
```typescript
// src/cluster.ts
import cluster from 'cluster';
import os from 'os';
import { logger } from './config/logger.js';

const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  logger.info(`Primary process ${process.pid} starting ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    logger.warn(`Worker ${worker.process.pid} died (${signal || code}). Restarting...`);
    cluster.fork(); // Always restart a dead worker
  });
} else {
  // Worker: run the actual Express app
  import('./index.js');
  logger.info(`Worker ${process.pid} started`);
}
```
RATE LIMITING
Limit by User, IP, and Endpoint

Rate Limiting

Rate limiting protects your API from abuse, brute-force attacks, and accidental runaway scripts. In production, keep the counters in Redis so every worker sees the same counts. A sliding window is more precise than a fixed window because it avoids the “burst at the window boundary” problem (see the FAQ below).
```typescript
// npm install express-rate-limit rate-limit-redis ioredis

// src/middleware/rateLimiter.ts
import rateLimit from 'express-rate-limit';
import { RedisStore } from 'rate-limit-redis';
import { redis } from '../config/redis.js';
import { Request } from 'express';

// General API rate limiter
export const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,  // 15 minutes
  max: 200,
  standardHeaders: 'draft-7', // Return RateLimit headers
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args: string[]) => redis.call(...args) }),
  keyGenerator: (req: Request) => {
    // Use user ID if authenticated, otherwise fall back to IP
    return (req as any).user?.userId ?? req.ip ?? 'anonymous';
  },
  message: {
    type: 'https://api.yourapp.com/errors/rate-limited',
    title: 'Too Many Requests',
    status: 429,
    detail: 'You have exceeded the rate limit. Please try again later.',
  }
});

// Strict limiter for auth endpoints (prevent brute-force)
export const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 10, // Only 10 attempts per 15 minutes
  store: new RedisStore({ sendCommand: (...args: string[]) => redis.call(...args) }),
  keyGenerator: (req: Request) => `auth:${req.ip}`,
  message: {
    type: 'https://api.yourapp.com/errors/rate-limited',
    title: 'Too Many Login Attempts',
    status: 429,
    detail: 'Too many login attempts. Please wait 15 minutes.',
  }
});

// Per-endpoint limiter for expensive operations
export const exportLimiter = rateLimit({
  windowMs: 60 * 60 * 1000, // 1 hour
  max: 5, // Only 5 exports per hour
  store: new RedisStore({ sendCommand: (...args: string[]) => redis.call(...args) }),
  keyGenerator: (req: Request) => `export:${(req as any).user?.userId}`,
});
```

```typescript
// Apply in your routes:

import { apiLimiter, authLimiter, exportLimiter } from '../middleware/rateLimiter.js';

// Global API limit
app.use('/api', apiLimiter);

// Strict limit on auth routes
app.use('/api/auth', authLimiter);

// Expensive endpoint-specific limit
router.post('/reports/export', requireAuth, exportLimiter, generateExport);
```
CHECKLIST
Putting It All Together

Production Performance Checklist

Before your next production deployment, run through this list:

- Profiled under realistic load (clinic.js + autocannon), not just in development
- Caching at the right layer: in-memory for a single instance, Redis for clusters, Cache-Control headers for CDNs
- CPU-heavy work moved off the event loop into worker threads
- Cluster mode enabled (PM2 `instances: 'max'`) so every core is used
- Shared state (sessions, cache, rate-limit counters) in Redis, not process memory
- Rate limits on the global API, auth endpoints, and expensive operations

“Premature optimization is the root of all evil.”

Donald E. Knuth

But failure to optimize at all is the root of a different kind of evil: unavailability.

Thank You for Spending Your Valuable Time

I truly appreciate you taking the time to read this post, and I hope you found it insightful and engaging!
FAQ

Frequently Asked Questions

**Can I still use sessions and caching when running in cluster mode?**

Yes — in fact, Redis is *required* for shared state in a clustered environment. Each PM2 worker has its own memory, so sessions and cache must live in Redis to be accessible across workers.
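
For sessions specifically, a common setup is `express-session` backed by `connect-redis` (neither is used elsewhere in this post, so treat this as a sketch):

```typescript
// npm install express-session connect-redis
import session from 'express-session';
import { RedisStore } from 'connect-redis'; // v8+; v7 used a default export
import { redis } from './config/redis.js';

app.use(session({
  store: new RedisStore({ client: redis }), // sessions visible to every worker
  secret: process.env.SESSION_SECRET!,
  resave: false,
  saveUninitialized: false,
}));
```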

**When should I use worker threads versus child processes?**

Use worker threads for CPU-intensive computation within the same Node.js process — they share memory and are lighter weight. Use child processes (`child_process.spawn`) for running external programs or scripts outside of Node.js.
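
As a contrast to the worker-thread example above, here’s a sketch of the child-process side: shelling out to an external binary (`ffmpeg` is only an illustration; any CLI works the same way):

```typescript
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Runs an external program in a separate OS process, not a Node.js thread
export async function transcode(input: string, output: string) {
  await execFileAsync('ffmpeg', ['-i', input, '-codec', 'copy', output]);
}
```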

**What’s the difference between a fixed window and a sliding window rate limiter?**

A fixed window resets at exact intervals (e.g., every 15 minutes). This allows bursting at the window boundary — a user could send 200 requests at 14:59 and another 200 at 15:01. A sliding window tracks the last N requests regardless of when the window started, eliminating this burst problem.
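
A common way to implement a sliding window is a Redis sorted set keyed by timestamp. This is a minimal sketch (not the `express-rate-limit` internals) that allows at most `limit` requests in the trailing `windowMs`:

```typescript
import { redis } from '../config/redis.js';

export async function allowRequest(key: string, limit: number, windowMs: number) {
  const now = Date.now();
  // Drop entries older than the window, count what's left, record this request
  await redis.zremrangebyscore(key, 0, now - windowMs);
  const count = await redis.zcard(key);
  if (count >= limit) return false;
  await redis.zadd(key, now, `${now}:${Math.random()}`);
  await redis.pexpire(key, windowMs); // let idle keys expire on their own
  return true;
}
```

Note these four commands aren’t atomic; under heavy concurrency you’d move them into a `MULTI`/`EXEC` block or a Lua script.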

**How much memory does each clustered worker use?**

Each worker is a separate Node.js process with its own V8 heap. A typical Express API uses 50-200MB per worker. Set `max_memory_restart` in your PM2 config to automatically restart workers that exceed your limit and prevent memory leak accumulation.
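
To see where your own workers sit in that range, `process.memoryUsage()` reports per-process heap and RSS. A sketch you could drop into a health endpoint (the `/health/memory` route is hypothetical):

```typescript
app.get('/health/memory', (req, res) => {
  const { rss, heapUsed } = process.memoryUsage();
  res.json({
    pid: process.pid,                      // differs per PM2 worker
    rssMB: Math.round(rss / 1024 / 1024),  // total resident memory
    heapUsedMB: Math.round(heapUsed / 1024 / 1024),
  });
});
```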

**Does an in-memory cache like node-cache work across PM2 workers?**

No — each PM2 worker has its own in-memory cache instance. A cache set in worker A is invisible to worker B. For clustered apps, always use Redis for shared caching.
