Express in 2027: AI APIs, SSE Streaming & What’s Next

AI-Assisted APIs and What's Next

Express in 2027

"The best API is the one that users don't feel is slow."
A principle that matters even more when you're streaming tokens from a language model.

In 2024, “AI-powered backend” meant calling an LLM and waiting. In 2027, it means streaming, real-time, and event-driven. Express — despite its age — is fully capable of handling all of it. Let me show you how, and then give you an honest look at whether it’s still the right tool for your stack.


Streaming Responses with Server-Sent Events (SSE)

Why SSE for AI?

Streaming LLM responses is one of the most common challenges developers face when building AI-powered applications. The difference between a sluggish app that makes users wait 10+ seconds for a complete response and a responsive interface that starts showing content immediately can mean the difference between user adoption and abandonment.

Server-Sent Events (SSE) is the de facto standard for LLM streaming: it's what OpenAI, Anthropic, and most LLM APIs use natively. SSE is a web standard that lets a server push data to clients over a single, long-lived HTTP connection. Compared with WebSockets, SSE is simpler to implement and proxy-friendly, and the browser's `EventSource` reconnects automatically when the connection drops.
event: message
data: {"content": "Hello"}

event: message
data: {"content": " world"}

event: done
data: [DONE]
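
The frames above follow a simple rule: optional `event:` and `id:` fields, one or more `data:` lines, then a blank line that ends the frame. A minimal sketch of a formatter for that shape might look like the following. `formatSseFrame` is a hypothetical helper name chosen for illustration; it is not part of Express or any SDK.

```javascript
// Format one SSE frame as a string ready for res.write().
// `formatSseFrame` is a hypothetical helper, not a library API.
function formatSseFrame({ event, id, data = '' }) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);

  // Strings are sent as-is; everything else is serialized as JSON.
  const payload = typeof data === 'string' ? data : JSON.stringify(data);

  // A raw newline inside the payload would end the field early,
  // so each payload line gets its own "data:" prefix.
  for (const line of payload.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }

  // The trailing blank line terminates the frame.
  return lines.join('\n') + '\n\n';
}
```

In a route handler this would be used as `res.write(formatSseFrame({ event: 'message', data: { content: 'Hello' } }))`. Splitting multi-line payloads matters mostly for plain-text data; JSON-serialized payloads never contain raw newlines.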

Setting up SSE in Express

// routes/stream.js
import express from 'express';

const app = express();

// Demo endpoint: emits five ticks, 500 ms apart, then a [DONE] sentinel.
app.get('/api/stream', (req, res) => {
  // Required SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.setHeader('X-Accel-Buffering', 'no'); // Disable nginx buffering
  res.flushHeaders(); // Send headers now so the client sees the stream open

  let count = 0;
  const interval = setInterval(() => {
    res.write(`data: ${JSON.stringify({ tick: count++ })}\n\n`);
    if (count >= 5) {
      clearInterval(interval);
      res.write('data: [DONE]\n\n');
      res.end();
    }
  }, 500);

  // Stop the timer if the client disconnects before the stream finishes
  req.on('close', () => clearInterval(interval));
});

app.listen(3000);

Building an LLM-Powered API Endpoint

With the Anthropic SDK

// routes/ai.js
import express from 'express';
import Anthropic from '@anthropic-ai/sdk';

const router = express.Router();
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

router.post('/chat/stream', async (req, res) => {
  const { message } = req.body;

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders();

  try {
    const stream = await anthropic.messages.stream({
      model: 'claude-opus-4-6',
      max_tokens: 1024,
      messages: [{ role: 'user', content: message }],
    });

    for await (const event of stream) {
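      // Only text deltas carry a `text` field; other delta types (e.g. tool input JSON) do not.
      if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {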
        res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
      }
    }

    res.write('data: [DONE]\n\n');
    res.end();
  } catch (err) {
    res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
    res.end();
  }
});

export default router;

With the OpenAI SDK

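import express from 'express';
import OpenAI from 'openai';

const router = express.Router();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Assumes express.json() is mounted so that req.body is parsed.
router.post('/chat/stream', async (req, res) => {
  const { messages } = req.body;

  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders();

  try {
    const stream = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages,
      stream: true,
    });

    for await (const chunk of stream) {
      const content = chunk.choices[0]?.delta?.content;
      if (content) {
        res.write(`data: ${JSON.stringify({ content })}\n\n`);
      }
    }

    res.write('data: [DONE]\n\n');
  } catch (err) {
    // Headers are already flushed, so report the failure as an SSE message.
    res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
  } finally {
    res.end();
  }
});

export default router;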
Whether you're building with OpenAI, Anthropic, or any other provider, the pattern is the same: set the SSE headers, write one `data:` frame per chunk, and finish with a `[DONE]` sentinel.
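
To make the shared pattern concrete, here is a rough sketch that separates the provider-specific part (pulling text out of each chunk) from the SSE part (writing frames). `pipeTextToSse` and `openAiText` are hypothetical helper names, and the `{ content }` payload shape simply mirrors the OpenAI example; treat this as one possible arrangement rather than the way either SDK expects to be used.

```javascript
// Stream any async iterable of text chunks to the client as SSE frames.
// `res` only needs write() and end(), and the SSE headers are assumed
// to have been set and flushed already.
async function pipeTextToSse(chunks, res) {
  try {
    for await (const text of chunks) {
      if (text) res.write(`data: ${JSON.stringify({ content: text })}\n\n`);
    }
    res.write('data: [DONE]\n\n');
  } catch (err) {
    // Headers are already sent, so the error has to travel in-band.
    res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
  } finally {
    res.end();
  }
}

// Adapter for the OpenAI chunk shape: yields only the text content.
async function* openAiText(stream) {
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) yield content;
  }
}
```

With these in place, the body of a route handler would reduce to `await pipeTextToSse(openAiText(stream), res)`, and supporting another provider means writing one more small adapter generator.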

WebSockets Alongside REST in the Same Express Server

Sometimes SSE isn’t enough — you need true bidirectional communication (think collaborative editing, live cursors, or real-time game state). You can add WebSockets to your Express server without a separate process:
import express from 'express';
import http from 'http';
import { WebSocketServer } from 'ws';

const app = express();
const server = http.createServer(app);
const wss = new WebSocketServer({ server }); // Share the same HTTP server

// REST routes still work
app.get('/api/health', (req, res) => res.json({ status: 'ok' }));

// WebSocket handler
wss.on('connection', (ws) => {
  console.log('Client connected via WebSocket');

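  ws.on('message', (msg) => {
    let parsed;
    try {
      parsed = JSON.parse(msg.toString());
    } catch {
      // Malformed JSON would otherwise throw inside the handler.
      ws.send(JSON.stringify({ type: 'error', payload: 'Invalid JSON' }));
      return;
    }

    const { type, payload } = parsed ?? {};
    if (type === 'ping') {
      ws.send(JSON.stringify({ type: 'pong', payload }));
    }
  });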

  ws.on('close', () => console.log('Client disconnected'));
});

server.listen(3000);
Rule of thumb
Use SSE when the server pushes data (AI tokens, notifications, live scores). Use WebSockets when the client also sends events mid-stream (collaborative tools, real-time games).

Can Express Run on Cloudflare Workers (Edge)?

Short answer: Not natively. Cloudflare Workers run on a V8 isolate runtime that doesn't support Node.js APIs like `http`, `net`, or `process` in the traditional sense, and Express is built directly on Node's `http` request and response objects.

What you can do
For most APIs, running Express behind Cloudflare’s CDN gives you the edge performance benefits (caching, DDoS protection) without rewriting your entire application.
Honest Decision Guide

Hono vs Fastify vs Express in 2027

This is the question I get asked most. Here’s my honest take:
| | Express 5 | Fastify | Hono |
|---|---|---|---|
| Ecosystem maturity | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Raw performance | Good | Excellent (~2×) | Excellent (edge) |
| TypeScript support | Good | Excellent | Excellent |
| Learning curve | Near-zero | Low | Low |
| Edge runtime | No | No | Yes |
| Middleware ecosystem | Massive | Growing | Small but growing |
| Best for | Most APIs | High-throughput services | Edge / multi-runtime |

Should you migrate away from Express?

If your Express app is running well: **probably not**. The ecosystem, middleware, and institutional knowledge built around Express are enormous assets. The performance difference between Express and Fastify rarely matters at the infrastructure level — a single extra database query dwarfs any framework overhead.
Migrate if
Stay on Express if:


SSE is underrated: many developers jump straight to WebSockets, but SSE is easier to implement, debug, and scale when your use case is server-only streaming.

How We Used SSE to Stream LLM Responses at Scale, Dani Akabani - 2025


Frequently Asked Questions

Yes, but they're not designed for streaming. Use the native `EventSource` API for SSE, or the `fetch` streaming API (`response.body.getReader()`) for more control. Libraries like `@microsoft/fetch-event-source` give you SSE with POST support and auto-reconnect.
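
If you go the `fetch` route, you have to split the byte stream back into SSE frames yourself, and network chunks rarely line up with frame boundaries. The sketch below shows one way to do that incrementally. `createSseParser` is a hypothetical helper, and it assumes the server uses `\n` line endings as the Express examples in this post do; a full implementation would also accept `\r\n` and handle the `id:` and `retry:` fields.

```javascript
// Incremental SSE parser: feed it decoded text chunks, get events back.
// `createSseParser` is a hypothetical helper, not part of any library.
function createSseParser(onEvent) {
  let buffer = '';
  return {
    feed(chunk) {
      buffer += chunk;
      // Frames end with a blank line; the last piece may be incomplete,
      // so it stays in the buffer until more text arrives.
      const frames = buffer.split('\n\n');
      buffer = frames.pop();

      for (const frame of frames) {
        let event = 'message';
        const data = [];
        for (const line of frame.split('\n')) {
          if (line.startsWith('event:')) {
            event = line.slice(6).trim();
          } else if (line.startsWith('data:')) {
            // Drop the single optional space after the colon.
            data.push(line.slice(5).replace(/^ /, ''));
          }
          // Lines starting with ":" are comments and are ignored.
        }
        if (data.length > 0) onEvent({ event, data: data.join('\n') });
      }
    },
  };
}

// Usage sketch (the URL and request body are assumptions):
//   const res = await fetch('/api/chat/stream', { method: 'POST', body });
//   const reader = res.body.getReader();
//   const decoder = new TextDecoder();
//   const parser = createSseParser((evt) => console.log(evt.data));
//   for (;;) {
//     const { done, value } = await reader.read();
//     if (done) break;
//     parser.feed(decoder.decode(value, { stream: true }));
//   }
```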

SSE is better for token streaming (server → client). WebSockets are better if you also need to send events mid-stream — like cancelling a generation or sending audio chunks back. Most AI chat interfaces use SSE.

Listen to `req.on('close', ...)` and clean up your streams/intervals. The browser's `EventSource` will auto-reconnect — you don't need to handle that on the server.
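
One way to keep that cleanup in a single place is a small wrapper around the response. The sketch below is an assumption-laden illustration, not an established API: `openSse`, `heartbeatMs`, and `onClose` are names invented here, and it listens on the response's `close` event rather than the request's, which is my choice for the sketch. The heartbeat is an SSE comment line, which clients ignore.

```javascript
// Open an SSE stream and tie all cleanup to the connection closing.
// `openSse`, `heartbeatMs`, and `onClose` are hypothetical names.
function openSse(res, { heartbeatMs = 15000 } = {}) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.flushHeaders();

  let open = true;
  // Comment frames (lines starting with ":") keep idle connections alive.
  const heartbeat = setInterval(() => res.write(': ping\n\n'), heartbeatMs);
  const cleanups = [() => clearInterval(heartbeat)];

  // Runs when the client disconnects or the response finishes.
  res.on('close', () => {
    open = false;
    for (const fn of cleanups) fn();
  });

  return {
    // Returns false instead of writing once the connection is gone.
    send(data) {
      if (!open) return false;
      res.write(`data: ${JSON.stringify(data)}\n\n`);
      return true;
    },
    // Register extra cleanup, e.g. aborting an upstream LLM request.
    onClose(fn) {
      cleanups.push(fn);
    },
  };
}
```

A handler would call `const sse = openSse(res)`, register `sse.onClose(() => clearInterval(interval))` or an upstream abort, and stop producing as soon as `sse.send(...)` returns `false`.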

Never expose your API key via a browser-facing endpoint. Always proxy through your Express backend. Add rate limiting (e.g., `express-rate-limit`) and authentication (JWT/API key) to your `/api/chat/stream` route.
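
To show what the rate-limiting half of that advice does under the hood, here is a dependency-free sketch of a fixed-window limiter shaped like Express middleware. It is a stand-in for illustration, not the `express-rate-limit` API: `createRateLimiter`, `windowMs`, `max`, and the injectable `now` clock are names chosen here, and the in-memory `Map` only works for a single process and is never pruned.

```javascript
// Fixed-window, in-memory rate limiter shaped like Express middleware.
// Illustrative only; in production prefer a maintained package.
function createRateLimiter({ windowMs = 60000, max = 20, now = Date.now } = {}) {
  const hits = new Map(); // client key -> { count, resetAt }

  return function rateLimit(req, res, next) {
    const key = req.ip; // Assumption: the client IP is a good enough key
    const t = now();

    let entry = hits.get(key);
    if (!entry || t >= entry.resetAt) {
      // Start a fresh window for this client.
      entry = { count: 0, resetAt: t + windowMs };
      hits.set(key, entry);
    }

    entry.count += 1;
    if (entry.count > max) {
      res.status(429).json({ error: 'Too many requests' });
      return;
    }
    next();
  };
}
```

Mounted as `router.post('/chat/stream', createRateLimiter({ max: 10 }), handler)`, it rejects the request before any tokens are requested from the provider, which is where the cost would otherwise be incurred.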

Absolutely. Just instantiate both clients and route to the appropriate one based on request params. Many production apps do this for model fallback or A/B testing.
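
A rough sketch of that routing with fallback could look like this. The `providers` map (name to async function returning text) and the `completeWithFallback` name are assumptions made for the example; in a real app each entry would wrap the corresponding SDK call.

```javascript
// Try the requested provider first, then fall back to the rest in order.
// `providers` maps a name to an async function: (messages) => text.
async function completeWithFallback(providers, preferred, messages) {
  const names = Object.keys(providers);
  const order = names.includes(preferred)
    ? [preferred, ...names.filter((n) => n !== preferred)]
    : names;

  const errors = [];
  for (const name of order) {
    try {
      return { provider: name, text: await providers[name](messages) };
    } catch (err) {
      // Remember why this provider failed and move on to the next one.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```

The preferred name would typically come from a request parameter or an A/B bucket. Note that fallback is simplest before streaming starts: once tokens have been sent to the client, switching providers mid-response is a different problem.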

Express 5 brought async error handling (rejected promises from route handlers are passed to the error middleware) and removed long-deprecated method signatures. The ecosystem isn't going anywhere; it's too deeply embedded. But the interesting innovation is happening in Hono and Bun-native runtimes. The smart play is: understand Express deeply, keep an eye on Hono.
