Scaling & Reliability -- Deep Dive¶
Warmwind must handle spiky workloads -- spinning up containers on demand, relaying VNC streams to hundreds of concurrent browser sessions, and running AI inference jobs that saturate GPU resources. This article covers every layer of the scaling stack: Redis sliding-window rate limiting with real code, BullMQ producer/consumer patterns with retries, Socket.io Redis adapter for horizontal WebSocket scaling, health checks with @nestjs/terminus, and graceful shutdown orchestration with SIGTERM handling.
1. Architecture Overview -- Warmwind Production Topology¶
graph LR
LB["Nginx / Traefik<br/>Load Balancer"] --> N1["NestJS Node 1"]
LB --> N2["NestJS Node 2"]
LB --> N3["NestJS Node 3"]
N1 & N2 & N3 --> RedisQ["Redis (BullMQ Queues)"]
N1 & N2 & N3 --> RedisPS["Redis (PubSub + Socket.io Adapter)"]
N1 & N2 & N3 --> PG[("PostgreSQL<br/>(Primary + Read Replica)")]
RedisQ --> W1["BullMQ Worker:<br/>Container Lifecycle"]
RedisQ --> W2["BullMQ Worker:<br/>AI Inference (GPU)"]
W1 --> Docker["Docker Engine"]
W2 --> GPU["GPU Runtime"]
W1 & W2 -->|"publish events"| RedisPS
| Component | Scaling Strategy |
|---|---|
| NestJS API nodes | Horizontal (stateless, behind LB) |
| WebSocket connections | Redis adapter fans out events across nodes |
| BullMQ workers | Horizontal (each worker pulls from the same Redis queue) |
| PostgreSQL | Primary for writes, read replica for queries |
| Redis | Sentinel or Cluster for HA, separate instances for queues vs PubSub |
Coming from Akka
In Akka Cluster, you scale by adding nodes to the cluster and using Cluster Sharding to distribute actors. In NestJS, there is no cluster protocol -- horizontal scaling is achieved by running multiple stateless Node.js processes behind a load balancer, with Redis as the shared coordination bus. BullMQ workers are the equivalent of Akka persistent actors with at-least-once delivery. Socket.io rooms with a Redis adapter replace Akka's distributed PubSub.
2. Load Balancing¶
| Algorithm | Best For | Warmwind Use Case |
|---|---|---|
| Round Robin | Uniform request cost | REST API endpoints |
| Least Connections | Variable duration requests | VNC WebSocket streams (long-lived) |
| IP Hash | Sticky sessions | Avoid -- stateless APIs should not need sticky sessions |
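To make the table concrete, the least-connections rule is simply "route to the backend with the fewest open connections". A minimal sketch of that selection (backend names and counts are made up for illustration, not Warmwind's real topology):

```typescript
interface Backend {
  name: string;
  activeConnections: number;
}

// Pick the backend currently holding the fewest open connections --
// the rule Nginx's least_conn implements.
function pickLeastConnections(backends: Backend[]): Backend {
  return backends.reduce((best, b) =>
    b.activeConnections < best.activeConnections ? b : best,
  );
}

const nodes: Backend[] = [
  { name: 'node1', activeConnections: 42 }, // holding many long-lived VNC streams
  { name: 'node2', activeConnections: 7 },
  { name: 'node3', activeConnections: 19 },
];
// pickLeastConnections(nodes) selects node2
```

Long-lived VNC WebSockets inflate a node's connection count, which is exactly why least-connections beats round robin for a mix of short API calls and long streams.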
2.1 Nginx Configuration for WebSocket + HTTP¶
upstream warmwind_api {
least_conn; # Best for mixed short HTTP + long WebSocket
server node1:3000;
server node2:3000;
server node3:3000;
}
server {
listen 443 ssl;
# HTTP API
location /api/ {
proxy_pass http://warmwind_api;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# WebSocket upgrade for Socket.io
location /socket.io/ {
proxy_pass http://warmwind_api;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 86400s; # Keep WebSocket alive for 24h
}
# GraphQL
location /graphql {
proxy_pass http://warmwind_api;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
3. Rate Limiting¶
3.1 Basic: @nestjs/throttler (Token Bucket)¶
Good for single-node or simple cases. Stores counters in memory by default.
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';
import { APP_GUARD } from '@nestjs/core';
@Module({
imports: [
ThrottlerModule.forRoot({
throttlers: [
{ name: 'short', ttl: 1_000, limit: 10 }, // 10 req/sec
{ name: 'medium', ttl: 60_000, limit: 100 }, // 100 req/min
{ name: 'long', ttl: 3_600_000, limit: 1_000 }, // 1000 req/hour
],
}),
],
providers: [{ provide: APP_GUARD, useClass: ThrottlerGuard }],
})
export class AppModule {}
// Per-route override for expensive operations
@Throttle({ short: { limit: 2, ttl: 1_000 } })
@Post('containers/start')
async startContainer(@Body() dto: StartContainerDto) {
return this.containerService.start(dto);
}
// Exempt health check from throttling
@SkipThrottle()
@Get('health')
healthCheck() { return { status: 'ok' }; }
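For intuition about what the `ttl`/`limit` pairs above enforce, here is a minimal in-memory token-bucket sketch. This is illustrative only -- it is not @nestjs/throttler's actual internals, just the classic algorithm the section title refers to:

```typescript
// Minimal token bucket (illustrative, not @nestjs/throttler internals).
// Tokens refill continuously at refillPerMs; each request spends one token.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerMs: number, // tokens added per millisecond
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  tryConsume(now: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 10 tokens refilling at 10/second -- roughly the 'short' tier above:
// a burst of 10 requests at t=0 all pass, the 11th is rejected.
const bucket = new TokenBucket(10, 10 / 1000, 0);
```

Note the shape this produces: a full burst can drain the bucket instantly, then refill meters out sustained throughput. That is the burst-friendly behavior the glossary contrasts with sliding windows.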
3.2 Production: Redis Sliding Window Rate Limiter¶
For multi-node deployments, in-memory rate limiting fails because each node counts independently. A Redis sorted-set sliding window provides distributed, accurate rate limiting.
How it works: Each request adds a timestamped entry to a Redis sorted set keyed by ratelimit:{userId}. Before processing, we remove expired entries (outside the window) and count remaining entries. If the count exceeds the limit, the request is rejected.
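The same prune-add-count sequence can be shown on a plain in-memory array before introducing Redis. This sketch is illustrative only -- the distributed guard that follows is the production version:

```typescript
// In-memory sliding window: the same prune -> add -> count sequence the
// Redis sorted set performs, on a plain sorted array (illustrative only).
function slidingWindowAllows(
  timestamps: number[], // prior request times for one user (mutated in place)
  now: number,
  windowMs: number,
  maxRequests: number,
): boolean {
  const windowStart = now - windowMs;
  // 1. Prune entries that fell out of the window (ZREMRANGEBYSCORE)
  while (timestamps.length > 0 && timestamps[0] <= windowStart) {
    timestamps.shift();
  }
  // 2. Record the current request (ZADD)
  timestamps.push(now);
  // 3. Count what is left in the window (ZCARD) and compare to the limit
  return timestamps.length <= maxRequests;
}

const history: number[] = [];
// 100 requests in the first 100ms all pass (limit 100/min)...
for (let t = 0; t < 100; t++) slidingWindowAllows(history, t, 60_000, 100);
// ...the 101st inside the same window is rejected, and a request a
// minute later passes again once the old entries have expired.
```

Replacing the array with a Redis sorted set is what makes the count shared across all nodes.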
import {
Injectable,
CanActivate,
ExecutionContext,
HttpException,
HttpStatus,
Logger,
} from '@nestjs/common';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';
interface RateLimitConfig {
windowMs: number; // Window size in milliseconds
maxRequests: number; // Maximum requests within the window
keyPrefix: string; // Redis key prefix
}
@Injectable()
export class RedisSlidingWindowGuard implements CanActivate {
private readonly logger = new Logger(RedisSlidingWindowGuard.name);
private readonly config: RateLimitConfig = {
windowMs: 60_000, // 1 minute
maxRequests: 100,
keyPrefix: 'ratelimit',
};
constructor(@InjectRedis() private readonly redis: Redis) {}
async canActivate(context: ExecutionContext): Promise<boolean> {
const request = context.switchToHttp().getRequest();
const userId: string = request.user?.id ?? request.ip;
const key = `${this.config.keyPrefix}:${userId}`;
const now = Date.now();
const windowStart = now - this.config.windowMs;
// Redis pipeline (single round trip; commands from other clients can
// interleave between ours -- use MULTI or a Lua script if strict
// atomicity under contention matters):
// 1. Remove entries older than the window
// 2. Add current request timestamp
// 3. Count entries in the window
// 4. Set key TTL to auto-expire
const pipeline = this.redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // prune expired
pipeline.zadd(key, now, `${now}:${Math.random()}`); // add current (unique member)
pipeline.zcard(key); // count
pipeline.pexpire(key, this.config.windowMs); // auto-cleanup
const results = await pipeline.exec();
if (!results) {
throw new HttpException(
'Rate limiter unavailable',
HttpStatus.SERVICE_UNAVAILABLE,
);
}
const currentCount = results[2]![1] as number;
// Set rate-limit headers for client visibility
const response = context.switchToHttp().getResponse();
response.setHeader('X-RateLimit-Limit', this.config.maxRequests);
response.setHeader('X-RateLimit-Remaining',
Math.max(0, this.config.maxRequests - currentCount),
);
response.setHeader('X-RateLimit-Reset',
Math.ceil((now + this.config.windowMs) / 1000),
);
if (currentCount > this.config.maxRequests) {
this.logger.warn(`Rate limit exceeded for ${userId}: ${currentCount}/${this.config.maxRequests}`);
throw new HttpException(
{
statusCode: HttpStatus.TOO_MANY_REQUESTS,
message: 'Rate limit exceeded',
retryAfterMs: this.config.windowMs,
},
HttpStatus.TOO_MANY_REQUESTS,
);
}
return true;
}
}
Coming from Akka
Akka Streams' throttle stage and Play Framework's rate-limiting
filters serve the same purpose but run in-process. The Redis sorted-set
approach is the distributed equivalent -- imagine the state behind
throttle (or a ZIO RateLimiter's bucket) stored in Redis and shared
across all stream materializations on every node.
4. BullMQ Producer/Consumer with Retries¶
4.1 Queue Registration¶
import { Module } from '@nestjs/common';
import { BullModule } from '@nestjs/bullmq';
import { ConfigService } from '@nestjs/config';
@Module({
imports: [
BullModule.forRootAsync({
useFactory: (config: ConfigService) => ({
connection: {
host: config.get('REDIS_HOST', 'localhost'),
port: config.get('REDIS_PORT', 6379),
password: config.get('REDIS_PASSWORD', undefined),
maxRetriesPerRequest: null, // Required by BullMQ
},
}),
inject: [ConfigService],
}),
BullModule.registerQueue(
{ name: 'containers' },
{ name: 'ai-inference' },
{ name: 'cleanup' },
),
],
})
export class QueueModule {}
4.2 Producer: Enqueuing Jobs with Options¶
import { Injectable, Logger } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
interface InferenceJobData {
userId: string;
containerId: string;
prompt: string;
model: string;
}
@Injectable()
export class InferenceService {
private readonly logger = new Logger(InferenceService.name);
constructor(
@InjectQueue('ai-inference') private readonly queue: Queue,
) {}
async enqueueInference(data: InferenceJobData): Promise<string> {
const job = await this.queue.add('run-inference', data, {
// Retry configuration
attempts: 3,
backoff: {
type: 'exponential', // 2s, 4s, 8s
delay: 2_000,
},
// Job lifecycle
removeOnComplete: { age: 3_600, count: 1_000 }, // keep 1h or 1000 jobs
removeOnFail: { age: 86_400 }, // keep failures 24h
// Priority (lower number = higher priority)
priority: data.model === 'gpt-4' ? 1 : 5,
// NOTE: BullMQ has no per-job timeout option (legacy Bull did);
// enforce long-running-job timeouts inside the processor instead.
// Delay: hold the job for 500ms before it becomes available to workers
delay: 500,
});
this.logger.log(`Enqueued inference job ${job.id} for model ${data.model}`);
return job.id!;
}
async getJobStatus(jobId: string) {
const job = await this.queue.getJob(jobId);
if (!job) return null;
return {
id: job.id,
state: await job.getState(),
progress: job.progress,
failedReason: job.failedReason,
result: job.returnvalue,
attemptsMade: job.attemptsMade,
};
}
}
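As a sanity check on the backoff options above: BullMQ's built-in exponential strategy computes `delay * 2^(attemptsMade - 1)`, which is where the 2s/4s/8s progression in the comment comes from:

```typescript
// The retry delays produced by { type: 'exponential', delay: 2000 }.
// BullMQ's built-in exponential backoff doubles the base delay per retry.
function exponentialBackoff(baseDelayMs: number, attemptsMade: number): number {
  return Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs);
}

const retryDelays = [1, 2, 3].map((n) => exponentialBackoff(2_000, n));
// -> [2000, 4000, 8000]: the 2s/4s/8s progression.
// With attempts: 3 (one initial run + two retries) only the first two
// delays are actually consumed before the job fails permanently.
```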
4.3 Consumer: Processing Jobs with Concurrency & Rate Limiting¶
import { Processor, WorkerHost, OnWorkerEvent } from '@nestjs/bullmq';
import { Job, UnrecoverableError } from 'bullmq';
import { Logger } from '@nestjs/common';
// AiService, AlertService, and the InferenceJobData interface are the
// application's own providers (InferenceJobData is defined with the
// producer in 4.2).
@Processor('ai-inference', {
concurrency: 2, // 2 jobs in parallel per worker
limiter: { max: 10, duration: 60_000 }, // max 10 jobs/min (GPU protection)
})
export class InferenceProcessor extends WorkerHost {
private readonly logger = new Logger(InferenceProcessor.name);
constructor(
private readonly aiService: AiService,
private readonly alertService: AlertService, // used in onFailed() below
) {
super();
}
async process(job: Job<InferenceJobData>): Promise<string> {
this.logger.log(`Processing inference job ${job.id}, attempt ${job.attemptsMade + 1}`);
try {
await job.updateProgress(10);
// Validate model availability before burning GPU time
const modelReady = await this.aiService.isModelLoaded(job.data.model);
if (!modelReady) {
// UnrecoverableError: do NOT retry (model missing = permanent failure)
throw new UnrecoverableError(`Model ${job.data.model} not available`);
}
await job.updateProgress(30);
const result = await this.aiService.runInference(
job.data.prompt,
job.data.model,
);
await job.updateProgress(100);
this.logger.log(`Inference job ${job.id} completed`);
return result;
} catch (err) {
if (err instanceof UnrecoverableError) throw err;
// Transient errors (timeout, OOM) should retry
this.logger.warn(
`Inference job ${job.id} failed (attempt ${job.attemptsMade + 1}): ${err.message}`,
);
throw err; // BullMQ will retry based on job options
}
}
@OnWorkerEvent('completed')
onCompleted(job: Job) {
this.logger.log(`Job ${job.id} completed, result: ${job.returnvalue}`);
}
@OnWorkerEvent('failed')
onFailed(job: Job, err: Error) {
this.logger.error(`Job ${job.id} permanently failed: ${err.message}`);
// Alert on permanent failure (all retries exhausted)
if (job.attemptsMade >= (job.opts.attempts ?? 1)) {
this.alertService.notify('inference-permanent-failure', {
jobId: job.id,
error: err.message,
data: job.data,
});
}
}
}
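The processor's error handling boils down to a small policy: permanent errors never retry, transient ones retry while the attempts budget lasts. That policy can be isolated as a pure function (a sketch -- the local PermanentError class stands in for BullMQ's UnrecoverableError):

```typescript
// Sketch of the processor's retry policy as a pure function.
// PermanentError stands in for BullMQ's UnrecoverableError (illustrative).
class PermanentError extends Error {}

type RetryDecision = 'retry' | 'fail-permanently';

function classifyFailure(
  err: Error,
  attemptsMade: number, // attempts already consumed, including this one
  maxAttempts: number,  // the job's `attempts` option
): RetryDecision {
  // Permanent failures (e.g. a model that is not installed) never retry
  if (err instanceof PermanentError) return 'fail-permanently';
  // Transient failures retry until the attempts budget is spent
  return attemptsMade < maxAttempts ? 'retry' : 'fail-permanently';
}
```

BullMQ applies this same decision internally: throwing UnrecoverableError short-circuits the `attempts` configuration, any other error re-enters the backoff schedule.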
graph LR
API["NestJS API"] -->|"queue.add('run-inference')"| Q["BullMQ Queue<br/>(Redis)"]
Q -->|"pull job"| W1["Worker 1<br/>concurrency: 2"]
Q -->|"pull job"| W2["Worker 2<br/>concurrency: 2"]
W1 --> GPU1["GPU 0"]
W2 --> GPU2["GPU 1"]
W1 -->|"job completed"| EVT["Redis PubSub"]
W2 -->|"job completed"| EVT
EVT --> API
API -->|"WebSocket push"| Browser["Browser"]
Coming from Akka
BullMQ's concurrency + limiter is the equivalent of Akka Streams'
mapAsync(parallelism) combined with throttle(elements, per).
UnrecoverableError maps to Akka's supervision strategy: you would
use Stop for permanent failures and Restart for transient ones.
BullMQ's backoff: exponential is similar to Akka's
BackoffSupervisor.props with exponential restart delay.
5. Socket.io Redis Adapter -- Horizontal WebSocket Scaling¶
In a single-node deployment, Socket.io rooms work because all clients connect to the same process. With multiple nodes, a client on Node 1 emits to a room, but clients on Node 2 never receive it. The Redis adapter solves this by broadcasting room events through Redis PubSub.
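A toy model of what the adapter does, with a shared in-process EventEmitter standing in for Redis PubSub (node, room, and client names here are made up for illustration):

```typescript
import { EventEmitter } from 'node:events';

// Toy model: a shared EventEmitter stands in for Redis PubSub.
// Each "node" subscribes to room broadcasts and delivers to its own clients.
interface RoomEvent {
  room: string;
  delivered: string[]; // collects "node:client" ids for demonstration
}

const redisBus = new EventEmitter();

function createNode(name: string) {
  const rooms = new Map<string, Set<string>>(); // room -> local client ids
  // Every node subscribes to the shared bus (the adapter's sub client)
  redisBus.on('room-event', (evt: RoomEvent) => {
    for (const client of rooms.get(evt.room) ?? []) {
      evt.delivered.push(`${name}:${client}`);
    }
  });
  return {
    join(client: string, room: string) {
      if (!rooms.has(room)) rooms.set(room, new Set());
      rooms.get(room)!.add(client);
    },
    // Emitting publishes through the bus, so every node's clients receive it
    emitToRoom(room: string, delivered: string[]) {
      redisBus.emit('room-event', { room, delivered });
    },
  };
}

const node1 = createNode('node1');
const node2 = createNode('node2');
node1.join('clientA', 'container:xyz');
node2.join('clientB', 'container:xyz'); // same room, different node

const delivered: string[] = [];
node1.emitToRoom('container:xyz', delivered);
// delivered now contains both node1:clientA and node2:clientB -- the
// emit on node1 reached node2's client via the shared bus
```

The real adapter does the same thing with two Redis connections (pub + sub) instead of a local emitter, which is what the code in 5.1 wires up.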
graph LR
C1["Client A"] -->|"connect"| N1["Node 1"]
C2["Client B"] -->|"connect"| N2["Node 2"]
C3["Client C"] -->|"connect"| N1
N1 -->|"room: container:xyz"| Redis["Redis Adapter PubSub"]
N2 -->|"room: container:xyz"| Redis
Redis -->|"fan-out to Node 2"| N2
N2 -->|"emit"| C2
5.1 Custom Redis IO Adapter¶
import { IoAdapter } from '@nestjs/platform-socket.io';
import { ServerOptions } from 'socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';
import { INestApplication, Logger } from '@nestjs/common';
export class RedisIoAdapter extends IoAdapter {
private adapterConstructor!: ReturnType<typeof createAdapter>; // assigned in connectToRedis()
private readonly logger = new Logger(RedisIoAdapter.name);
constructor(
app: INestApplication,
private readonly redisUrl: string,
) {
super(app);
}
async connectToRedis(): Promise<void> {
const pubClient = createClient({ url: this.redisUrl });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
this.adapterConstructor = createAdapter(pubClient, subClient);
this.logger.log('Redis IO adapter connected');
}
createIOServer(port: number, options?: ServerOptions) {
const server = super.createIOServer(port, {
...options,
transports: ['websocket'], // Skip HTTP long-polling
pingInterval: 25_000,
pingTimeout: 20_000,
maxHttpBufferSize: 1e6, // 1MB max payload
});
server.adapter(this.adapterConstructor);
return server;
}
}
5.2 Wiring in main.ts¶
async function bootstrap() {
const app = await NestFactory.create(AppModule);
const redisUrl = process.env.REDIS_URL ?? 'redis://localhost:6379';
const redisAdapter = new RedisIoAdapter(app, redisUrl);
await redisAdapter.connectToRedis();
app.useWebSocketAdapter(redisAdapter);
app.enableShutdownHooks();
await app.listen(3000);
}
5.3 Emitting to Rooms (Works Across Nodes)¶
// In VncGateway or any service with access to the Server instance
@WebSocketServer() server: Server;
// This emit is automatically broadcast to ALL nodes via Redis adapter.
// Clients in room 'container:abc' on any node receive the event.
this.server.to(`container:${containerId}`).emit('vnc:frame', frameData);
Coming from Akka
Akka Cluster's Distributed PubSub (DistributedPubSub.mediator) serves
the same role as Socket.io's Redis adapter. When you do
mediator ! Publish("topic", msg), Akka routes it to all subscribers
across the cluster. The Redis adapter does exactly this, but using Redis
PubSub as the transport instead of Akka's gossip protocol. The trade-off:
Redis is simpler to operate but adds a network hop; Akka's gossip is
peer-to-peer but requires cluster membership management.
6. Health Checks with @nestjs/terminus¶
Kubernetes (and any orchestrator) needs two probe endpoints:
- Liveness (/health/live): Is the process alive? If not, restart it.
- Readiness (/health/ready): Can the process serve traffic? If not, remove it from the load balancer.
6.1 Health Controller¶
import { Controller, Get } from '@nestjs/common';
import {
HealthCheck,
HealthCheckService,
HealthCheckError,
TypeOrmHealthIndicator,
MemoryHealthIndicator,
DiskHealthIndicator,
} from '@nestjs/terminus';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';
@Controller('health')
export class HealthController {
constructor(
private readonly health: HealthCheckService,
private readonly db: TypeOrmHealthIndicator,
private readonly memory: MemoryHealthIndicator,
private readonly disk: DiskHealthIndicator,
@InjectRedis() private readonly redis: Redis,
) {}
/**
* Liveness: is the process fundamentally healthy?
* Check: memory heap < 500MB, event loop not blocked.
*/
@Get('live')
@HealthCheck()
liveness() {
return this.health.check([
() => this.memory.checkHeap('memory_heap', 500 * 1024 * 1024),
]);
}
/**
* Readiness: can we serve traffic?
* Check: database reachable, Redis reachable, disk space OK.
*/
@Get('ready')
@HealthCheck()
readiness() {
return this.health.check([
() => this.db.pingCheck('database', { timeout: 3_000 }),
() => this.checkRedis(),
() => this.disk.checkStorage('disk', {
path: '/',
thresholdPercent: 0.9, // fail if disk > 90% full
}),
]);
}
private async checkRedis() {
try {
const pong = await this.redis.ping();
if (pong !== 'PONG') throw new Error('Redis ping failed');
return { redis: { status: 'up' } };
} catch (err) {
// Returning a plain object is reported as "up" -- terminus only marks
// a custom check as down when the indicator throws a HealthCheckError
// (exported by @nestjs/terminus).
throw new HealthCheckError('Redis check failed', {
redis: { status: 'down', message: err.message },
});
}
}
}
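For reference, the aggregated payload that HealthCheckService.check() returns has roughly this shape (field values here are illustrative, not captured from a real deployment):

```typescript
// Roughly the JSON a terminus health endpoint returns (illustrative values).
// `info` holds the healthy indicators, `error` holds failing ones (on a
// 503 response), and `details` is the union of both.
const exampleReadyResponse = {
  status: 'ok' as 'ok' | 'error' | 'shutting_down',
  info: {
    database: { status: 'up' },
    redis: { status: 'up' },
    disk: { status: 'up' },
  },
  error: {},
  details: {
    database: { status: 'up' },
    redis: { status: 'up' },
    disk: { status: 'up' },
  },
};
```

Kubernetes only looks at the HTTP status code (200 vs 503); the body is for humans and dashboards.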
6.2 Kubernetes Probe Configuration¶
# k8s deployment snippet
livenessProbe:
httpGet:
path: /health/live
port: 3000
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 2
6.3 Custom Health Indicator for BullMQ¶
import { Injectable } from '@nestjs/common';
import {
HealthIndicator,
HealthIndicatorResult,
HealthCheckError,
} from '@nestjs/terminus';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
@Injectable()
export class BullMQHealthIndicator extends HealthIndicator {
constructor(
@InjectQueue('containers') private readonly containerQueue: Queue,
@InjectQueue('ai-inference') private readonly inferenceQueue: Queue,
) {
super();
}
async isHealthy(key: string): Promise<HealthIndicatorResult> {
const checks = await Promise.all([
this.checkQueue('containers', this.containerQueue),
this.checkQueue('ai-inference', this.inferenceQueue),
]);
const isHealthy = checks.every((c) => c.healthy);
const result = this.getStatus(key, isHealthy, {
queues: checks,
});
if (!isHealthy) {
throw new HealthCheckError('BullMQ unhealthy', result);
}
return result;
}
private async checkQueue(name: string, queue: Queue) {
try {
const waiting = await queue.getWaitingCount();
const active = await queue.getActiveCount();
const failed = await queue.getFailedCount();
return {
name,
healthy: true,
waiting,
active,
failed,
// Alert if too many jobs are stuck waiting
backpressure: waiting > 100,
};
} catch {
return { name, healthy: false };
}
}
}
7. Graceful Shutdown with SIGTERM¶
When Kubernetes sends SIGTERM, the application must:
- Stop accepting new connections (load balancer removes it via failed readiness probe)
- Drain in-flight HTTP requests (finish responses already being processed)
- Pause BullMQ workers (stop pulling new jobs, finish active ones)
- Close WebSocket connections (notify clients to reconnect to another node)
- Close database and Redis connections
- Exit 0 before the terminationGracePeriodSeconds deadline (default 30s)
graph LR
K8s["K8s / Orchestrator"] -->|"SIGTERM"| App["NestJS Process"]
App -->|"1. Mark unhealthy"| Health["Readiness -> DOWN"]
App -->|"2. Pause queues"| BullMQ["BullMQ Workers"]
App -->|"3. Drain HTTP"| HTTP["Express Server"]
App -->|"4. Close WS"| WS["Socket.io"]
App -->|"5. Close connections"| DB["PostgreSQL + Redis"]
App -->|"6. exit(0)"| K8s
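Stripped of Nest specifics, the diagram is an ordered drain. A sketch that just records the sequence (step names are illustrative; the run callbacks would be the real work such as queue.pause(), server.close(), redis.quit()):

```typescript
// The shutdown diagram as an explicit ordered sequence of steps.
type Step = { name: string; run: () => void };

function runShutdownSequence(steps: Step[]): string[] {
  const log: string[] = [];
  for (const step of steps) {
    step.run();
    log.push(step.name); // record completion order
  }
  return log;
}

const order = runShutdownSequence([
  { name: 'mark-unready', run: () => {} },      // readiness probe -> DOWN
  { name: 'pause-queues', run: () => {} },      // BullMQ stops pulling jobs
  { name: 'drain-http', run: () => {} },        // in-flight responses finish
  { name: 'close-websockets', run: () => {} },  // clients reconnect elsewhere
  { name: 'close-db-and-redis', run: () => {} },
]);
```

The ordering matters: marking the node unready before draining prevents the load balancer from routing new requests into a process that is about to close its connections.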
7.1 Shutdown Service¶
import {
Injectable,
OnModuleDestroy,
OnApplicationShutdown,
Logger,
} from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { DataSource } from 'typeorm';
import { InjectRedis } from '@nestjs-modules/ioredis';
import Redis from 'ioredis';
import { Server } from 'socket.io';
@Injectable()
export class GracefulShutdownService
implements OnModuleDestroy, OnApplicationShutdown
{
private readonly logger = new Logger(GracefulShutdownService.name);
private isShuttingDown = false;
constructor(
@InjectQueue('containers') private readonly containerQueue: Queue,
@InjectQueue('ai-inference') private readonly inferenceQueue: Queue,
private readonly dataSource: DataSource,
@InjectRedis() private readonly redis: Redis,
) {}
/**
* Called when the module is being destroyed (before application shutdown).
* Use this for pausing queues and draining active jobs.
*/
async onModuleDestroy(): Promise<void> {
this.isShuttingDown = true;
this.logger.warn('Shutdown initiated -- pausing BullMQ workers...');
// Pause workers: stop pulling new jobs but let active ones finish
await Promise.all([
this.pauseQueue(this.containerQueue, 'containers'),
this.pauseQueue(this.inferenceQueue, 'ai-inference'),
]);
this.logger.warn('All queues paused, active jobs draining...');
}
/**
* Called after all connections are closed.
* Clean up remaining resources.
*/
async onApplicationShutdown(signal?: string): Promise<void> {
this.logger.warn(`Application shutdown (signal: ${signal})`);
// Close database connections
if (this.dataSource.isInitialized) {
await this.dataSource.destroy();
this.logger.log('Database connections closed');
}
// Close Redis
await this.redis.quit();
this.logger.log('Redis connection closed');
this.logger.warn('Shutdown complete');
}
/**
* Health check integration: report as unhealthy during shutdown
* so the load balancer stops sending new requests.
*/
isHealthy(): boolean {
return !this.isShuttingDown;
}
private async pauseQueue(queue: Queue, name: string): Promise<void> {
try {
// Pause the queue (local workers stop pulling)
await queue.pause();
// Wait for currently active jobs to finish (with timeout)
const activeCount = await queue.getActiveCount();
if (activeCount > 0) {
this.logger.warn(
`Queue '${name}': waiting for ${activeCount} active jobs...`,
);
// In BullMQ, pausing the queue stops new job processing.
// Active jobs continue to completion.
// We rely on the K8s grace period (30s) as the hard deadline.
}
this.logger.log(`Queue '${name}' paused`);
} catch (err) {
this.logger.error(`Failed to pause queue '${name}': ${err.message}`);
}
}
}
7.2 main.ts Integration¶
import { NestFactory } from '@nestjs/core';
import { Logger, ValidationPipe } from '@nestjs/common';
// RedisIoAdapter is the custom adapter from section 5.1
async function bootstrap() {
const app = await NestFactory.create(AppModule);
// CRITICAL: enables onModuleDestroy and onApplicationShutdown hooks
app.enableShutdownHooks();
// WebSocket adapter
const redisAdapter = new RedisIoAdapter(app, process.env.REDIS_URL!);
await redisAdapter.connectToRedis();
app.useWebSocketAdapter(redisAdapter);
// Global pipes, filters, etc.
app.useGlobalPipes(new ValidationPipe({ whitelist: true, transform: true }));
const port = process.env.PORT ?? 3000;
await app.listen(port);
Logger.log(`Warmwind API listening on port ${port}`, 'Bootstrap');
}
bootstrap();
Coming from Akka
Akka's CoordinatedShutdown with its phase-based shutdown sequence
(before-service-unbind, service-unbind, before-cluster-shutdown, etc.)
maps directly to NestJS's lifecycle hooks. onModuleDestroy = Akka's
before-service-unbind phase. onApplicationShutdown = Akka's
before-actor-system-terminate phase. The key difference: Akka
CoordinatedShutdown supports custom phases with dependencies; NestJS
has a fixed two-phase model, so you orchestrate sub-steps manually
within each hook.
8. Putting It All Together -- Request Lifecycle Under Load¶
Trace a single "start container" request through the entire scaled system:
graph LR
Browser -->|"POST /containers/start"| LB["Load Balancer"]
LB -->|"least_conn"| N2["Node 2"]
N2 -->|"RedisSlidingWindowGuard"| RL["Redis: ratelimit:user123"]
RL -->|"under limit"| N2
N2 -->|"queue.add('start')"| BQ["BullMQ Queue (Redis)"]
BQ -->|"pull job"| W1["Worker on Node 1"]
W1 -->|"docker.createContainer"| Docker["Docker Engine"]
W1 -->|"publish 'container.started'"| PubSub["Redis PubSub"]
PubSub -->|"adapter fan-out"| N1["Node 1"]
PubSub -->|"adapter fan-out"| N3["Node 3"]
N3 -->|"room: container:abc"| WS["Browser WebSocket on Node 3"]
- Browser sends POST to load balancer
- Load balancer routes to Node 2 (least connections)
- Rate limiter (Redis sliding window) checks user's request count
- Handler validates input, enqueues a BullMQ job, returns { jobId } immediately
- BullMQ worker on Node 1 pulls the job, creates the Docker container
- Worker publishes a container.started event to Redis PubSub
- Redis adapter broadcasts to all Socket.io nodes
- Node 3 (where the user's WebSocket lives) emits status update to the browser
What's new (2025--2026)
BullMQ 5 refines sandboxed processors, which run jobs in separate
processes or worker threads with isolated memory. Socket.io v4.7+
supports WebTransport (HTTP/3 based, lower latency) as an alternative
to WebSocket. @nestjs/terminus v11 aligns with NestJS 11. Redis
Functions (Redis 7+) give server-side scripts a managed home, which
simplifies shipping atomic sliding-window rate limiters.
Glossary¶
Glossary
- Token Bucket
- Rate limiter that refills tokens at a fixed rate; requests consume tokens and are rejected when the bucket is empty. Simple but less accurate than sliding window for bursty traffic.
- Sliding Window
- Rate limiter counting requests in a rolling time window using Redis sorted sets. Provides smoother enforcement than fixed windows and prevents the "double burst" problem at window boundaries.
- BullMQ
- Redis-backed job queue library supporting delayed jobs, retries with exponential backoff, concurrency limits, rate limiting, priorities, and job progress tracking. Successor to Bull (now in maintenance mode).
- Worker (BullMQ)
- A process or thread that pulls jobs from a BullMQ queue and executes them. Multiple workers can pull from the same queue for horizontal scaling.
- UnrecoverableError
- BullMQ exception type that signals a permanent failure -- the job will not be retried regardless of the attempts configuration.
- Pub/Sub
- Publish-subscribe pattern where publishers emit events to named channels and all subscribers receive them. Redis PubSub provides cross-node event delivery with at-most-once semantics.
- Redis Adapter (Socket.io)
- Socket.io plugin that uses Redis PubSub to broadcast room events across multiple Node.js processes, enabling horizontal WebSocket scaling.
- Graceful Shutdown
- Stopping a service by ceasing new work (pause queues, fail readiness probe), completing in-flight operations, then closing connections and exiting cleanly.
- Liveness Probe
- Kubernetes health check that determines if a process is alive. Failure triggers a pod restart.
- Readiness Probe
- Kubernetes health check that determines if a process can serve traffic. Failure removes the pod from the load balancer's endpoint list.
- Health Indicator
- @nestjs/terminus component that checks a specific dependency (database, Redis, disk, queue) and reports its status for aggregation by the HealthCheckService.
- Backpressure
- Condition where a system component cannot keep up with incoming work, causing queues to grow. Detected by monitoring BullMQ waiting counts and managed by rate limiting or scaling workers.
- SIGTERM
- Unix signal sent by orchestrators (Kubernetes, Docker, systemd) to request a graceful process termination. The process has a configurable grace period (default 30s in K8s) before receiving SIGKILL.
- Least Connections
- Load balancing algorithm that routes new requests to the backend with the fewest active connections. Ideal for mixed workloads with variable request durations (short API calls + long WebSocket streams).