Scaling & Reliability -- Deep Dive¶
Warmwind must handle spiky workloads -- spinning up containers on demand, relaying VNC streams to hundreds of concurrent browser sessions, and running AI inference jobs that saturate GPU resources. This article covers every layer of the scaling stack: Redis sliding-window rate limiting with real code, BullMQ producer/consumer patterns with retries, Socket.io Redis adapter for horizontal WebSocket scaling, health checks with @nestjs/terminus, and graceful shutdown orchestration with SIGTERM handling.
1. Architecture Overview -- Warmwind Production Topology¶
graph LR
LB["Nginx / Traefik<br/>Load Balancer"] --> N1["NestJS Node 1"]
LB --> N2["NestJS Node 2"]
LB --> N3["NestJS Node 3"]
N1 & N2 & N3 --> RedisQ["Redis (BullMQ Queues)"]
N1 & N2 & N3 --> RedisPS["Redis (PubSub + Socket.io Adapter)"]
N1 & N2 & N3 --> PG[("PostgreSQL<br/>(Primary + Read Replica)")]
RedisQ --> W1["BullMQ Worker:<br/>Container Lifecycle"]
RedisQ --> W2["BullMQ Worker:<br/>AI Inference (GPU)"]
W1 --> Docker["Docker Engine"]
W2 --> GPU["GPU Runtime"]
W1 & W2 -->|"publish events"| RedisPS
| Component | Scaling Strategy |
|---|---|
| NestJS API nodes | Horizontal (stateless, behind LB) |
| WebSocket connections | Redis adapter fans out events across nodes |
| BullMQ workers | Horizontal (each worker pulls from the same Redis queue) |
| PostgreSQL | Primary for writes, read replica for queries |
| Redis | Sentinel or Cluster for HA, separate instances for queues vs PubSub |
Coming from Akka
In Akka Cluster, you scale by adding nodes to the cluster and using Cluster Sharding to distribute actors. In NestJS, there is no cluster protocol -- horizontal scaling is achieved by running multiple stateless Node.js processes behind a load balancer, with Redis as the shared coordination bus. BullMQ workers are the equivalent of Akka persistent actors with at-least-once delivery. Socket.io rooms with a Redis adapter replace Akka's distributed PubSub.
2. Load Balancing¶
| Algorithm | Best For | Warmwind Use Case |
|---|---|---|
| Round Robin | Uniform request cost | REST API endpoints |
| Least Connections | Variable duration requests | VNC WebSocket streams (long-lived) |
| IP Hash | Sticky sessions | Avoid -- stateless APIs should not need sticky sessions |
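To make the table concrete, the least-connections rule is simply "route to the backend with the fewest open connections". A minimal sketch of that selection (backend names and counts are made up for illustration, not Warmwind's real topology):

```typescript
interface Backend {
  name: string;
  activeConnections: number;
}

// Pick the backend currently holding the fewest open connections --
// the rule Nginx's least_conn implements.
function pickLeastConnections(backends: Backend[]): Backend {
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best,
  );
}

const nodes: Backend[] = [
  { name: 'node1', activeConnections: 42 }, // holding many long-lived VNC streams
  { name: 'node2', activeConnections: 7 },
  { name: 'node3', activeConnections: 19 },
];
// pickLeastConnections(nodes) selects node2
```

Long-lived VNC WebSockets inflate a node's connection count, which is exactly why least-connections beats round robin for a mix of short API calls and long streams.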
2.1 Nginx Configuration for WebSocket + HTTP¶
upstream warmwind_api {
least_conn; # Best for mixed short HTTP + long WebSocket
server node1:3000;
server node2:3000;
server node3:3000;
}
server {
listen 443 ssl;
# HTTP API
location /api/ {
proxy_pass http://warmwind_api;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# WebSocket upgrade for Socket.io
location /socket.io/ {
proxy_pass http://warmwind_api;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 86400s; # Keep WebSocket alive for 24h
}
# GraphQL
location /graphql {
proxy_pass http://warmwind_api;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
3. Rate Limiting¶
3.1 Basic: @nestjs/throttler (Token Bucket)¶
Good for single-node or simple cases. Stores counters in memory by default.
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';
import { APP_GUARD } from '@nestjs/core';
@Module({
imports: [
ThrottlerModule.forRoot({
throttlers: [
{ name: 'short', ttl: 1_000, limit: 10 }, // 10 req/sec
{ name: 'medium', ttl: 60_000, limit: 100 }, // 100 req/min
{ name: 'long', ttl: 3_600_000, limit: 1_000 }, // 1000 req/hour
],
}),
],
providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}
// Per-route override for expensive operations
@Throttle({ short: { limit: 2, ttl: 1_000 } })
@Post('containers/start')
async startContainer(@Body() dto: StartContainerDto) {
return this.containerService.start(dto);
}
// Exempt health check from throttling
@SkipThrottle()
@Get('health')
healthCheck() { return { status: 'ok' }; }
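For intuition about what the `ttl`/`limit` pairs above enforce, here is a minimal in-memory token-bucket sketch. This is illustrative only -- it is not @nestjs/throttler's actual internals, just the classic algorithm the section title refers to:

```typescript
// Minimal token bucket (illustrative, not @nestjs/throttler internals).
// Tokens refill continuously at refillPerMs; each request spends one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerMs: number, // tokens added per millisecond
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 10 tokens refilling at 10/second -- roughly the 'short' tier above:
// a burst of 10 requests at t=0 all pass, the 11th is rejected.
const bucket = new TokenBucket(10, 10 / 1000, 0);
```

Note the shape this produces: a full burst can drain the bucket instantly, then refill meters out sustained throughput. That is the burst-friendly behavior the glossary contrasts with sliding windows.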
3.2 Production: Redis Sliding Window Rate Limiter¶
For multi-node deployments, in-memory rate limiting fails because each node counts independently. A Redis sorted-set sliding window provides distributed, accurate rate limiting.
How it works: Each request adds a timestamped entry to a Redis sorted set keyed by ratelimit:{userId}. Before processing, we remove expired entries (outside the window) and count remaining entries. If the count exceeds the limit, the request is rejected.
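The same prune-add-count sequence can be shown on a plain in-memory array before introducing Redis. This sketch is illustrative only -- the distributed guard that follows is the production version:

```typescript
// In-memory sliding window: the same prune -> add -> count sequence the
// Redis sorted set performs, on a plain sorted array (illustrative only).
function slidingWindowAllows(
  timestamps: number[], // prior request times for one user (mutated in place)
  now: number,
  windowMs: number,
  maxRequests: number,
): boolean {
  const windowStart = now - windowMs;
  // 1. Prune entries that fell out of the window (ZREMRANGEBYSCORE)
  while (timestamps.length > 0 && timestamps[0] <= windowStart) {
    timestamps.shift();
  }
  // 2. Record the current request (ZADD)
  timestamps.push(now);
  // 3. Count what is left in the window (ZCARD) and compare to the limit
  return timestamps.length <= maxRequests;
}

const history: number[] = [];
// 100 requests in the first 100ms all pass (limit 100/min)...
for (let t = 0; t < 100; t++) slidingWindowAllows(history, t, 60_000, 100);
// ...the 101st inside the same window is rejected, and a request a
// minute later passes again once the old entries have expired.
```

Replacing the array with a Redis sorted set is what makes the count shared across all nodes.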
import {
Injectable,
CanActivate,
ExecutionContext,
HttpException,
HttpStatus,
Logger,
} from '@nestjs/common';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';
interface RateLimitConfig {
windowMs: number; // Window size in milliseconds
maxRequests: number; // Maximum requests within the window
keyPrefix: string; // Redis key prefix
}
@Injectable()
export class RedisSlidingWindowGuard implements CanActivate {
private readonly logger = new Logger(RedisSlidingWindowGuard.name);
private readonly config: RateLimitConfig = {
windowMs: 60_000, // 1 minute
maxRequests: 100,
keyPrefix: 'ratelimit',
};
constructor(@InjectRedis() private readonly redis: Redis) {}
async canActivate(context: ExecutionContext): Promise<boolean> {
const request = context.switchToHttp().getRequest();
const userId: string = request.user?.id ?? request.ip;
const key = `${this.config.keyPrefix}:${userId}`;
const now = Date.now();
const windowStart = now - this.config.windowMs;
// Redis pipeline (single round trip; commands from other clients can
// interleave between ours -- use MULTI or a Lua script if strict
// atomicity under contention matters):
// 1. Remove entries older than the window
// 2. Add current request timestamp
// 3. Count entries in the window
// 4. Set key TTL to auto-expire
const pipeline = this.redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // prune expired
pipeline.zadd(key, now, `${now}:${Math.random()}`); // add current (unique member)
pipeline.zcard(key); // count
pipeline.pexpire(key, this.config.windowMs); // auto-cleanup
const results = await pipeline.exec();
if (!results) {
throw new HttpException(
'Rate limiter unavailable',
HttpStatus.SERVICE_UNAVAILABLE,
);
}
const currentCount = results[2]![1] as number;
// Set rate-limit headers for client visibility
const response = context.switchToHttp().getResponse();
response.setHeader('X-RateLimit-Limit', this.config.maxRequests);
response.setHeader('X-RateLimit-Remaining',
Math.max(0, this.config.maxRequests - currentCount),
);
response.setHeader('X-RateLimit-Reset',
Math.ceil((now + this.config.windowMs) / 1000),
);
if (currentCount > this.config.maxRequests) {
this.logger.warn(`Rate limit exceeded for ${userId}: ${currentCount}/${this.config.maxRequests}`);
throw new HttpException(
{
statusCode: HttpStatus.TOO_MANY_REQUESTS,
message: 'Rate limit exceeded',
retryAfterMs: this.config.windowMs,
},
HttpStatus.TOO_MANY_REQUESTS,
);
}
return true;
}
}
Coming from Akka
Akka Streams' throttle stage and Play Framework's rate-limiting
filters serve the same purpose but run in-process. The Redis sorted-set
approach is the distributed equivalent -- imagine the state behind
throttle (or a ZIO RateLimiter's bucket) stored in Redis and shared
across all stream materializations on every node.
4. BullMQ Producer/Consumer with Retries¶
4.1 Queue Registration¶
import { Module } from '@nestjs/common';
import { BullModule } from '@nestjs/bullmq';
import { ConfigService } from '@nestjs/config';
@Module({
imports: [
BullModule.forRootAsync({
useFactory: (config: ConfigService) => ({
connection: {
host: config.get('REDIS_HOST', 'localhost'),
port: config.get('REDIS_PORT', 6379),
password: config.get('REDIS_PASSWORD', undefined),
maxRetriesPerRequest: null, // Required by BullMQ
},
}),
inject: [ConfigService],
}),
BullModule.registerQueue(
{ name: 'containers' },
{ name: 'ai-inference' },
{ name: 'cleanup' },
),
],
})
export class QueueModule {}
4.2 Producer: Enqueuing Jobs with Options¶
import { Injectable, Logger } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
interface InferenceJobData {
userId: string;
containerId: string;
prompt: string;
model: string;
}
@Injectable()
export class InferenceService {
private readonly logger = new Logger(InferenceService.name);
constructor(
@InjectQueue('ai-inference') private readonly queue: Queue,
) {}
async enqueueInference(data: InferenceJobData): Promise<string> {
const job = await this.queue.add('run-inference', data, {
// Retry configuration
attempts: 3,
backoff: {
type: 'exponential', // 2s, 4s, 8s
delay: 2_000,
},
// Job lifecycle
removeOnComplete: { age: 3_600, count: 1_000 }, // keep 1h or 1000 jobs
removeOnFail: { age: 86_400 }, // keep failures 24h
// Priority (lower number = higher priority)
priority: data.model === 'gpt-4' ? 1 : 5,
// NOTE: BullMQ has no per-job timeout option (legacy Bull did);
// enforce long-running-job timeouts inside the processor instead.
// Delay: hold the job for 500ms before it becomes available to workers
delay: 500,
});
this.logger.log(`Enqueued inference job ${job.id} for model ${data.model}`);
return job.id!;
}
async getJobStatus(jobId: string) {
const job = await this.queue.getJob(jobId);
if (!job) return null;
return {
id: job.id,
state: await job.getState(),
progress: job.progress,
failedReason: job.failedReason,
result: job.returnvalue,
attemptsMade: job.attemptsMade,
};
}
}
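As a sanity check on the backoff options above: BullMQ's built-in exponential strategy computes `delay * 2^(attemptsMade - 1)`, which is where the 2s/4s/8s progression in the comment comes from:

```typescript
// The retry delays produced by { type: 'exponential', delay: 2000 }.
// BullMQ's built-in exponential backoff doubles the base delay per retry.
function exponentialBackoff(baseDelayMs: number, attemptsMade: number): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs);
}

const retryDelays = [1, 2, 3].map((n) => exponentialBackoff(2_000, n));
// -> [2000, 4000, 8000]: the 2s/4s/8s progression.
// With attempts: 3 (one initial run + two retries) only the first two
// delays are actually consumed before the job fails permanently.
```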
4.3 Consumer: Processing Jobs with Concurrency & Rate Limiting¶
import { Processor, WorkerHost, OnWorkerEvent } from '@nestjs/bullmq';
import { Job, UnrecoverableError } from 'bullmq';
import { Logger } from '@nestjs/common';
// AiService, AlertService, and the InferenceJobData interface are the
// application's own providers (InferenceJobData is defined with the
// producer in 4.2).
@Processor('ai-inference', {
concurrency: 2, // 2 jobs in parallel per worker
limiter: { max: 10, duration: 60_000 }, // max 10 jobs/min (GPU protection)
})
export class InferenceProcessor extends WorkerHost {
private readonly logger = new Logger(InferenceProcessor.name);
constructor(
private readonly aiService: AiService,
private readonly alertService: AlertService, // used in onFailed() below
) {
super();
}
async process(job: Job<InferenceJobData>): Promise<string> {
this.logger.log(`Processing inference job ${job.id}, attempt ${job.attemptsMade + 1}`);
try {
await job.updateProgress(10);
// Validate model availability before burning GPU time
const modelReady = await this.aiService.isModelLoaded(job.data.model);
if (!modelReady) {
// UnrecoverableError: do NOT retry (model missing = permanent failure)
throw new UnrecoverableError(`Model ${job.data.model} not available`);
}
await job.updateProgress(30);
const result = await this.aiService.runInference(
job.data.prompt,
job.data.model,
);
await job.updateProgress(100);
this.logger.log(`Inference job ${job.id} completed`);
return result;
} catch (err) {
if (err instanceof UnrecoverableError) throw err;
// Transient errors (timeout, OOM) should retry
this.logger.warn(
`Inference job ${job.id} failed (attempt ${job.attemptsMade + 1}): ${err.message}`,
);
throw err; // BullMQ will retry based on job options
}
}
@OnWorkerEvent('completed')
onCompleted(job: Job) {
this.logger.log(`Job ${job.id} completed, result: ${job.returnvalue}`);
}
@OnWorkerEvent('failed')
onFailed(job: Job, err: Error) {
this.logger.error(`Job ${job.id} permanently failed: ${err.message}`);
// Alert on permanent failure (all retries exhausted)
if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
this.alertService.notify('inference-permanent-failure', {
jobId: job.id,
error: err.message,
data: job.data,
});
}
}
}
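The processor's error handling boils down to a small policy: permanent errors never retry, transient ones retry while the attempts budget lasts. That policy can be isolated as a pure function (a sketch -- the local PermanentError class stands in for BullMQ's UnrecoverableError):

```typescript
// Sketch of the processor's retry policy as a pure function.
// PermanentError stands in for BullMQ's UnrecoverableError (illustrative).
class PermanentError extends Error {}

type RetryDecision = 'retry' | 'fail-permanently';

function classifyFailure(
  err: Error,
  attemptsMade: number, // attempts already consumed, including this one
  maxAttempts: number,  // the job's `attempts` option
): RetryDecision {
  // Permanent failures (e.g. a model that is not installed) never retry
  if (err instanceof PermanentError) return 'fail-permanently';
  // Transient failures retry until the attempts budget is spent
  return attemptsMade < maxAttempts ? 'retry' : 'fail-permanently';
}
```

BullMQ applies this same decision internally: throwing UnrecoverableError short-circuits the `attempts` configuration, any other error re-enters the backoff schedule.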
graph LR
API["NestJS API"] -->|"queue.add('run-inference')"| Q["BullMQ Queue<br/>(Redis)"]
Q -->|"pull job"| W1["Worker 1<br/>concurrency: 2"]
Q -->|"pull job"| W2["Worker 2<br/>concurrency: 2"]
W1 --> GPU1["GPU 0"]
W2 --> GPU2["GPU 1"]
W1 -->|"job completed"| EVT["Redis PubSub"]
W2 -->|"job completed"| EVT
EVT --> API
API -->|"WebSocket push"| Browser["Browser"]
Coming from Akka
BullMQ's concurrency + limiter is the equivalent of Akka Streams'
mapAsync(parallelism) combined with throttle(elements, per).
UnrecoverableError maps to Akka's supervision strategy: you would
use Stop for permanent failures and Restart for transient ones.
BullMQ's backoff: exponential is similar to Akka's
BackoffSupervisor.props with exponential restart delay.
5. Socket.io Redis Adapter -- Horizontal WebSocket Scaling¶
In a single-node deployment, Socket.io rooms work because all clients connect to the same process. With multiple nodes, a client on Node 1 emits to a room, but clients on Node 2 never receive it. The Redis adapter solves this by broadcasting room events through Redis PubSub.
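A toy model of what the adapter does, with a shared in-process EventEmitter standing in for Redis PubSub (node, room, and client names here are made up for illustration):

```typescript
import { EventEmitter } from 'node:events';

// Toy model: a shared EventEmitter stands in for Redis PubSub.
// Each "node" subscribes to room broadcasts and delivers to its own clients.
interface RoomEvent {
  room: string;
  delivered: string[]; // collects "node:client" ids for demonstration
}

const redisBus = new EventEmitter();

function createNode(name: string) {
  const rooms = new Map<string, Set<string>>(); // room -> local client ids
  // Every node subscribes to the shared bus (the adapter's sub client)
  redisBus.on('room-event', (evt: RoomEvent) => {
    for (const client of rooms.get(evt.room) ?? []) {
      evt.delivered.push(`${name}:${client}`);
    }
  });
  return {
    join(client: string, room: string) {
      if (!rooms.has(room)) rooms.set(room, new Set());
      rooms.get(room)!.add(client);
    },
    // Emitting publishes through the bus, so every node's clients receive it
    emitToRoom(room: string, delivered: string[]) {
      redisBus.emit('room-event', { room, delivered });
    },
  };
}

const node1 = createNode('node1');
const node2 = createNode('node2');
node1.join('clientA', 'container:xyz');
node2.join('clientB', 'container:xyz'); // same room, different node

const delivered: string[] = [];
node1.emitToRoom('container:xyz', delivered);
// delivered now contains both node1:clientA and node2:clientB -- the
// emit on node1 reached node2's client via the shared bus
```

The real adapter does the same thing with two Redis connections (pub + sub) instead of a local emitter, which is what the code in 5.1 wires up.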
graph LR
C1["Client A"] -->|"connect"| N1["Node 1"]
C2["Client B"] -->|"connect"| N2["Node 2"]
C3["Client C"] -->|"connect"| N1
N1 -->|"room: container:xyz"| Redis["Redis Adapter PubSub"]
N2 -->|"room: container:xyz"| Redis
Redis -->|"fan-out to Node 2"| N2
N2 -->|"emit"| C2
5.1 Custom Redis IO Adapter¶
import { IoAdapter } from '@nestjs/platform-socket.io';
import { ServerOptions } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';
import { INestApplication, Logger } from '@nestjs/common';
export class RedisIoAdapter extends IoAdapter {
private adapterConstructor!: ReturnType<typeof createAdapter>; // assigned in connectToRedis()
private readonly logger = new Logger(RedisIoAdapter.name);
constructor(
app: INestApplication,
private readonly redisUrl: string,
) {
super(app);
}
async connectToRedis(): Promise<void> {
const pubClient = createClient({ url: this.redisUrl });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
this.adapterConstructor = createAdapter(pubClient, subClient);
this.logger.log('Redis IO adapter connected');
}
createIOServer(port: number, options?: ServerOptions) {
const server = super.createIOServer(port, {
...options,
transports: ['websocket'], // Skip HTTP long-polling
pingInterval: 25_000,
pingTimeout: 20_000,
maxHttpBufferSize: 1e6, // 1MB max payload
});
server.adapter(this.adapterConstructor);
return server;
}
}
5.2 Wiring in main.ts¶
async function bootstrap() {
const app = await NestFactory.create(AppModule);
const redisUrl = process.env.REDIS_URL ?? 'redis://localhost:6379';
const redisAdapter = new RedisIoAdapter(app, redisUrl);
await redisAdapter.connectToRedis();
app.useWebSocketAdapter(redisAdapter);
app.enableShutdownHooks();
await app.listen(3000);
}
5.3 Emitting to Rooms (Works Across Nodes)¶
// In VncGateway or any service with access to the Server instance
@WebSocketServer() server: Server;
// This emit is automatically broadcast to ALL nodes via Redis adapter.
// Clients in room 'container:abc' on any node receive the event.
this.server.to(`container:${containerId}`).emit('vnc:frame', frameData);
Coming from Akka
Akka Cluster's Distributed PubSub (DistributedPubSub.mediator) serves
the same role as Socket.io's Redis adapter. When you do
mediator ! Publish("topic", msg), Akka routes it to all subscribers
across the cluster. The Redis adapter does exactly this, but using Redis
PubSub as the transport instead of Akka's gossip protocol. The trade-off:
Redis is simpler to operate but adds a network hop; Akka's gossip is
peer-to-peer but requires cluster membership management.
6. Health Checks with @nestjs/terminus¶
Kubernetes (and any orchestrator) needs two probe endpoints:
- Liveness (/health/live): Is the process alive? If not, restart it.
- Readiness (/health/ready): Can the process serve traffic? If not, remove it from the load balancer.
6.1 Health Controller¶
import { Controller, Get } from '@nestjs/common';
import {
HealthCheck,
HealthCheckService,
HealthCheckError,
TypeOrmHealthIndicator,
MemoryHealthIndicator,
DiskHealthIndicator,
} from '@nestjs/terminus';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';
@Controller('health')
export class HealthController {
constructor(
private readonly health: HealthCheckService,
private readonly db: TypeOrmHealthIndicator,
private readonly memory: MemoryHealthIndicator,
private readonly disk: DiskHealthIndicator,
@InjectRedis() private readonly redis: Redis,
) {}
/**
* Liveness: is the process fundamentally healthy?
* Check: memory heap < 500MB, event loop not blocked.
*/
@Get('live')
@HealthCheck()
liveness() {
return this.health.check([
() => this.memory.checkHeap('memory_heap', 500 * 1024 * 1024),
]);
}
/**
* Readiness: can we serve traffic?
* Check: database reachable, Redis reachable, disk space OK.
*/
@Get('ready')
@HealthCheck()
readiness() {
return this.health.check([
() => this.db.pingCheck('database', { timeout: 3_000 }),
() => this.checkRedis(),
() => this.disk.checkStorage('disk', {
path: '/',
thresholdPercent: 0.9, // fail if disk > 90% full
}),
]);
}
private async checkRedis() {
try {
const pong = await this.redis.ping();
if (pong !== 'PONG') throw new Error('Redis ping failed');
return { redis: { status: 'up' } };
} catch (err) {
// Returning a plain object is reported as "up" -- terminus only marks
// a custom check as down when the indicator throws a HealthCheckError
// (exported by @nestjs/terminus).
throw new HealthCheckError('Redis check failed', {
redis: { status: 'down', message: err.message },
});
}
}
}
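For reference, the aggregated payload that HealthCheckService.check() returns has roughly this shape (field values here are illustrative, not captured from a real deployment):

```typescript
// Roughly the JSON a terminus health endpoint returns (illustrative values).
// `info` holds the healthy indicators, `error` holds failing ones (on a
// 503 response), and `details` is the union of both.
const exampleReadyResponse = {
  status: 'ok' as 'ok' | 'error' | 'shutting_down',
  info: {
    database: { status: 'up' },
    redis: { status: 'up' },
    disk: { status: 'up' },
  },
  error: {},
  details: {
    database: { status: 'up' },
    redis: { status: 'up' },
    disk: { status: 'up' },
  },
};
```

Kubernetes only looks at the HTTP status code (200 vs 503); the body is for humans and dashboards.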
6.2 Kubernetes Probe Configuration¶
# k8s deployment snippet
livenessProbe:
httpGet:
path: /health/live
port: 3000
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 2
6.3 Custom Health Indicator for BullMQ¶
import { Injectable } from '@nestjs/common';
import {
HealthIndicator,
HealthIndicatorResult,
HealthCheckError,
} from '@nestjs/terminus';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
@Injectable()
export class BullMQHealthIndicator extends HealthIndicator {
constructor(
@InjectQueue('containers') private readonly containerQueue: Queue,
@InjectQueue('ai-inference') private readonly inferenceQueue: Queue,
) {
super();
}
async isHealthy(key: string): Promise<HealthIndicatorResult> {
const checks = await Promise.all([
this.checkQueue('containers', this.containerQueue),
this.checkQueue('ai-inference', this.inferenceQueue),
]);
const isHealthy = checks.every((c) => c.healthy);
const result = this.getStatus(key, isHealthy, {
queues: checks,
});
if (!isHealthy) {
throw new HealthCheckError('BullMQ unhealthy', result);
}
return result;
}
private async checkQueue(name: string, queue: Queue) {
try {
const waiting = await queue.getWaitingCount();
const active = await queue.getActiveCount();
const failed = await queue.getFailedCount();
return {
name,
healthy: true,
waiting,
active,
failed,
// Alert if too many jobs are stuck waiting
backpressure: waiting > 100,
};
} catch {
return { name, healthy: false };
}
}
}
7. Graceful Shutdown with SIGTERM¶
When Kubernetes sends SIGTERM, the application must:
- Stop accepting new connections (load balancer removes it via failed readiness probe)
- Drain in-flight HTTP requests (finish responses already being processed)
- Pause BullMQ workers (stop pulling new jobs, finish active ones)
- Close WebSocket connections (notify clients to reconnect to another node)
- Close database and Redis connections
- Exit 0 before the terminationGracePeriodSeconds deadline (default 30s)
graph LR
K8s["K8s / Orchestrator"] -->|"SIGTERM"| App["NestJS Process"]
App -->|"1. Mark unhealthy"| Health["Readiness -> DOWN"]
App -->|"2. Pause queues"| BullMQ["BullMQ Workers"]
App -->|"3. Drain HTTP"| HTTP["Express Server"]
App -->|"4. Close WS"| WS["Socket.io"]
App -->|"5. Close connections"| DB["PostgreSQL + Redis"]
App -->|"6. exit(0)"| K8s
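Stripped of Nest specifics, the diagram is an ordered drain. A sketch that just records the sequence (step names are illustrative; the run callbacks would be the real work such as queue.pause(), server.close(), redis.quit()):

```typescript
// The shutdown diagram as an explicit ordered sequence of steps.
type Step = { name: string; run: () => void };

function runShutdownSequence(steps: Step[]): string[] {
  const log: string[] = [];
  for (const step of steps) {
    step.run();
    log.push(step.name); // record completion order
  }
  return log;
}

const order = runShutdownSequence([
  { name: 'mark-unready', run: () => {} },      // readiness probe -> DOWN
  { name: 'pause-queues', run: () => {} },      // BullMQ stops pulling jobs
  { name: 'drain-http', run: () => {} },        // in-flight responses finish
  { name: 'close-websockets', run: () => {} },  // clients reconnect elsewhere
  { name: 'close-db-and-redis', run: () => {} },
]);
```

The ordering matters: marking the node unready before draining prevents the load balancer from routing new requests into a process that is about to close its connections.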
7.1 Shutdown Service¶
import {
Injectable,
OnModuleDestroy,
OnApplicationShutdown,
Logger,
} from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { DataSource } from 'typeorm';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';
import { Server } from 'socket.io';
@Injectable()
export class GracefulShutdownService
implements OnModuleDestroy, OnApplicationShutdown
{
private readonly logger = new Logger(GracefulShutdownService.name);
private isShuttingDown = false;
constructor(
@InjectQueue('containers') private readonly containerQueue: Queue,
@InjectQueue('ai-inference') private readonly inferenceQueue: Queue,
private readonly dataSource: DataSource,
@InjectRedis() private readonly redis: Redis,
) {}
/**
* Called when the module is being destroyed (before application shutdown).
* Use this for pausing queues and draining active jobs.
*/
async onModuleDestroy(): Promise<void> {
this.isShuttingDown = true;
this.logger.warn('Shutdown initiated -- pausing BullMQ workers...');
// Pause workers: stop pulling new jobs but let active ones finish
await Promise.all([
this.pauseQueue(this.containerQueue, 'containers'),
this.pauseQueue(this.inferenceQueue, 'ai-inference'),
]);
this.logger.warn('All queues paused, active jobs draining...');
}
/**
* Called after all connections are closed.
* Clean up remaining resources.
*/
async onApplicationShutdown(signal?: string): Promise<void> {
this.logger.warn(`Application shutdown (signal: ${signal})`);
// Close database connections
if (this.dataSource.isInitialized) {
await this.dataSource.destroy();
this.logger.log('Database connections closed');
}
// Close Redis
await this.redis.quit();
this.logger.log('Redis connection closed');
this.logger.warn('Shutdown complete');
}
/**
* Health check integration: report as unhealthy during shutdown
* so the load balancer stops sending new requests.
*/
isHealthy(): boolean {
return !this.isShuttingDown;
}
private async pauseQueue(queue: Queue, name: string): Promise<void> {
try {
// Pause the queue (local workers stop pulling)
await queue.pause();
// Wait for currently active jobs to finish (with timeout)
const activeCount = await queue.getActiveCount();
if (activeCount > 0) {
this.logger.warn(
`Queue '${name}': waiting for ${activeCount} active jobs...`,
);
// In BullMQ, pausing the queue stops new job processing.
// Active jobs continue to completion.
// We rely on the K8s grace period (30s) as the hard deadline.
}
this.logger.log(`Queue '${name}' paused`);
} catch (err) {
this.logger.error(`Failed to pause queue '${name}': ${err.message}`);
}
}
}
7.2 main.ts Integration¶
import { NestFactory } from '@nestjs/core';
import { Logger, ValidationPipe } from '@nestjs/common';
// RedisIoAdapter is the custom adapter from section 5.1
async function bootstrap() {
const app = await NestFactory.create(AppModule);
// CRITICAL: enables onModuleDestroy and onApplicationShutdown hooks
app.enableShutdownHooks();
// WebSocket adapter
const redisAdapter = new RedisIoAdapter(app, process.env.REDIS_URL!);
await redisAdapter.connectToRedis();
app.useWebSocketAdapter(redisAdapter);
// Global pipes, filters, etc.
app.useGlobalPipes(new ValidationPipe({ whitelist: true, transform: true }));
const port = process.env.PORT ?? 3000;
await app.listen(port);
Logger.log(`Warmwind API listening on port ${port}`, 'Bootstrap');
}
bootstrap();
Coming from Akka
Akka's CoordinatedShutdown with its phase-based shutdown sequence
(before-service-unbind, service-unbind, before-cluster-shutdown, etc.)
maps directly to NestJS's lifecycle hooks. onModuleDestroy = Akka's
before-service-unbind phase. onApplicationShutdown = Akka's
before-actor-system-terminate phase. The key difference: Akka
CoordinatedShutdown supports custom phases with dependencies; NestJS
has a fixed two-phase model, so you orchestrate sub-steps manually
within each hook.
8. Putting It All Together -- Request Lifecycle Under Load¶
Trace a single "start container" request through the entire scaled system:
graph LR
Browser -->|"POST /containers/start"| LB["Load Balancer"]
LB -->|"least_conn"| N2["Node 2"]
N2 -->|"RedisSlidingWindowGuard"| RL["Redis: ratelimit:user123"]
RL -->|"under limit"| N2
N2 -->|"queue.add('start')"| BQ["BullMQ Queue (Redis)"]
BQ -->|"pull job"| W1["Worker on Node 1"]
W1 -->|"docker.createContainer"| Docker["Docker Engine"]
W1 -->|"publish 'container.started'"| PubSub["Redis PubSub"]
PubSub -->|"adapter fan-out"| N1["Node 1"]
PubSub -->|"adapter fan-out"| N3["Node 3"]
N3 -->|"room: container:abc"| WS["Browser WebSocket on Node 3"]
- Browser sends POST to load balancer
- Load balancer routes to Node 2 (least connections)
- Rate limiter (Redis sliding window) checks user's request count
- Handler validates input, enqueues a BullMQ job, returns { jobId } immediately
- BullMQ worker on Node 1 pulls the job, creates the Docker container
- Worker publishes a container.started event to Redis PubSub
- Redis adapter broadcasts to all Socket.io nodes
- Node 3 (where the user's WebSocket lives) emits status update to the browser
What's new (2025--2026)
BullMQ 5 refines sandboxed processors, which run jobs in separate
processes or worker threads with isolated memory. Socket.io v4.7+
supports WebTransport (HTTP/3 based, lower latency) as an alternative
to WebSocket. @nestjs/terminus v11 aligns with NestJS 11. Redis
Functions (Redis 7+) give server-side scripts a managed home, which
simplifies shipping atomic sliding-window rate limiters.
Glossary¶
Glossary
- Token Bucket
- Rate limiter that refills tokens at a fixed rate; requests consume tokens and are rejected when the bucket is empty. Simple but less accurate than sliding window for bursty traffic.
- Sliding Window
- Rate limiter counting requests in a rolling time window using Redis sorted sets. Provides smoother enforcement than fixed windows and prevents the "double burst" problem at window boundaries.
- BullMQ
- Redis-backed job queue library supporting delayed jobs, retries with exponential backoff, concurrency limits, rate limiting, priorities, and job progress tracking. Successor to Bull (now in maintenance mode).
- Worker (BullMQ)
- A process or thread that pulls jobs from a BullMQ queue and executes them. Multiple workers can pull from the same queue for horizontal scaling.
- UnrecoverableError
- BullMQ exception type that signals a permanent failure -- the job will not be retried regardless of the attempts configuration.
- Pub/Sub
- Publish-subscribe pattern where publishers emit events to named channels and all subscribers receive them. Redis PubSub provides cross-node event delivery with at-most-once semantics.
- Redis Adapter (Socket.io)
- Socket.io plugin that uses Redis PubSub to broadcast room events across multiple Node.js processes, enabling horizontal WebSocket scaling.
- Graceful Shutdown
- Stopping a service by ceasing new work (pause queues, fail readiness probe), completing in-flight operations, then closing connections and exiting cleanly.
- Liveness Probe
- Kubernetes health check that determines if a process is alive. Failure triggers a pod restart.
- Readiness Probe
- Kubernetes health check that determines if a process can serve traffic. Failure removes the pod from the load balancer's endpoint list.
- Health Indicator
- @nestjs/terminus component that checks a specific dependency (database, Redis, disk, queue) and reports its status for aggregation by the HealthCheckService.
- Backpressure
- Condition where a system component cannot keep up with incoming work, causing queues to grow. Detected by monitoring BullMQ waiting counts and managed by rate limiting or scaling workers.
- SIGTERM
- Unix signal sent by orchestrators (Kubernetes, Docker, systemd) to request a graceful process termination. The process has a configurable grace period (default 30s in K8s) before receiving SIGKILL.
- Least Connections
- Load balancing algorithm that routes new requests to the backend with the fewest active connections. Ideal for mixed workloads with variable request durations (short API calls + long WebSocket streams).