Docker & Kubernetes for Warmwind's Architecture

Warmwind runs each AI agent session inside an isolated Kubernetes Pod built from a custom Linux distro image. This article covers the entire journey: Dockerfile best practices (multi-stage builds, layer caching, non-root execution, distroless bases), a Docker Compose stack for local development, the conceptual mapping from Docker primitives to Kubernetes objects, Pod lifecycle management for ephemeral agent sessions, Horizontal Pod Autoscaling on custom queue-depth metrics, namespace isolation for multi-tenancy, and health probes wired to NestJS Terminus. If you have built OCI bootstrapping frameworks in bash, every concept here will click -- your scripts are the build pipeline.


Glossary

OCI image -- An Open Container Initiative image; the standardized archive format that Docker, Podman, and Buildah all produce. Your bash bootstrapping frameworks emit these.
Layer -- A filesystem diff stored as a tarball inside an OCI image. Layers are content-addressable and shared across images.
Multi-stage build -- A Dockerfile pattern where earlier stages compile artifacts and later stages copy only the final binary, discarding build tooling.
Distroless -- Google-maintained minimal base images (gcr.io/distroless/*) that contain only the runtime and its dependencies -- no shell, no package manager.
Pod -- The smallest deployable unit in Kubernetes; one or more co-located containers sharing a network namespace and storage volumes.
Deployment -- A Kubernetes controller that declares a desired Pod count, performs rolling updates, and manages ReplicaSets.
Service -- A stable network endpoint (ClusterIP, NodePort, or LoadBalancer) that routes traffic to a set of Pods matched by label selectors.
ConfigMap / Secret -- Kubernetes objects for injecting configuration (plaintext or base64-encoded) into Pods as environment variables or mounted files.
HPA -- Horizontal Pod Autoscaler; a controller that adjusts the replica count of a Deployment based on observed metrics.
Namespace -- A virtual cluster boundary within a Kubernetes cluster, providing resource isolation and RBAC scoping.
Liveness probe -- A periodic health check; if it fails, the kubelet kills the container and restarts it.
Readiness probe -- A periodic check that gates traffic routing; a Pod that fails readiness is removed from Service endpoints.
Startup probe -- A probe for slow-starting containers; liveness and readiness probes are suspended until it succeeds, and the container is restarted only if it exhausts its failureThreshold.


1. Dockerfile Best Practices

1.1 Multi-Stage Build for NestJS

A production NestJS image should carry zero build tooling. Multi-stage builds achieve this by separating the npm install && npm run build stage from the final runtime stage.

# ── Stage 1: install + compile ──────────────────────────────
FROM node:22-bookworm-slim AS builder
WORKDIR /app

# Copy lockfile first -- layer cache survives code changes
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts

COPY tsconfig*.json nest-cli.json ./
COPY src/ src/

RUN npm run build          # produces dist/
RUN npm prune --omit=dev   # strip devDependencies (--production is deprecated)

# ── Stage 2: runtime ────────────────────────────────────────
FROM gcr.io/distroless/nodejs22-debian12:nonroot
WORKDIR /app

COPY --from=builder /app/dist          ./dist
COPY --from=builder /app/node_modules  ./node_modules
COPY --from=builder /app/package.json  ./

EXPOSE 3000
CMD ["dist/main.js"]
graph LR
    SRC["Source Code<br/>+ package-lock.json"] --> BUILDER["Stage 1: builder<br/>node:22-bookworm-slim"]
    BUILDER -->|"npm ci && build"| DIST["dist/ + pruned<br/>node_modules"]
    DIST --> RUNTIME["Stage 2: runtime<br/>distroless/nodejs22"]
    RUNTIME --> IMAGE["Final OCI Image<br/>~120 MB"]

Why distroless? The final image has no shell, no apt, no curl. An attacker who achieves RCE inside the container cannot install tools, escalate, or pivot. The only non-root user baked into the image is nonroot (UID 65532).

Your OCI frameworks map here

If you have written bash that calls buildah from scratch, buildah copy, and buildah commit to assemble minimal container images, that is exactly what the multi-stage Dockerfile above automates. The FROM ... AS builder / COPY --from=builder idiom is the declarative equivalent of your imperative OCI assembly scripts. The layer-caching semantics (copy lockfile first, then source) are the same optimization you would apply with buildah's --layers flag.

1.2 Layer Caching Strategy

Docker evaluates cache validity top-down. The moment a COPY invalidates a layer, every subsequent instruction rebuilds.

| Instruction (in order) | Cache behavior |
|---|---|
| COPY package-lock.json | Hit while the lock file is unchanged (the common case during dev) |
| RUN npm ci | Hit because its input layer is cached |
| COPY src/ | Miss when source changes -- this layer and everything after it rebuilds |
| RUN npm run build | Rebuilds (source changed) |

The lock-file-first pattern means a code change triggers only COPY src/ && build, not a full npm ci. On CI this saves 30-90 seconds per build.

1.3 Security Hardening Checklist

# 1. Pin exact digest, not just tag
FROM node:22-bookworm-slim@sha256:abc123... AS builder

# 2. Non-root execution
USER nonroot

# 3. Read-only filesystem (set at runtime)
# docker run --read-only --tmpfs /tmp myimage

# 4. No secrets in image layers
# Use BuildKit secret mounts instead:
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci

# 5. Healthcheck for standalone Docker usage
HEALTHCHECK --interval=30s --timeout=3s \
    CMD ["/nodejs/bin/node", "-e", "require('http').get('http://localhost:3000/health', r => r.statusCode === 200 ? process.exit(0) : process.exit(1))"]

2. Docker Compose for Local Development

2.1 The Warmwind Stack

Local development mirrors production: NestJS API, PostgreSQL, Redis (shared by BullMQ and Socket.io adapter), and a BullMQ worker process.

# docker-compose.yml
services:
  api:
    build:
      context: .
      target: builder        # use the builder stage for hot-reload
    command: ["npm", "run", "start:dev"]
    ports:
      - "3000:3000"          # HTTP + GraphQL
      - "3001:3001"          # WebSocket gateway
    volumes:
      - ./src:/app/src       # bind-mount for hot reload
    env_file: .env.local
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - warmwind

  worker:
    build:
      context: .
      target: builder
    command: ["npm", "run", "start:worker"]
    env_file: .env.local
    depends_on:
      - redis
      - postgres
    networks:
      - warmwind

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: warmwind
      POSTGRES_PASSWORD: localdev
      POSTGRES_DB: warmwind_dev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U warmwind"]
      interval: 5s
      timeout: 3s
      retries: 5
    networks:
      - warmwind

  redis:
    image: redis:7-alpine
    command: ["redis-server", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
    networks:
      - warmwind

volumes:
  pgdata:

networks:
  warmwind:
    driver: bridge
graph LR
    DEV["Developer<br/>localhost"] -->|":3000 HTTP<br/>:3001 WS"| API["api<br/>(NestJS)"]
    API --> PG[("postgres:16")]
    API --> RED["redis:7"]
    WORKER["worker<br/>(BullMQ)"] --> RED
    WORKER --> PG
    RED -->|"PubSub"| API

2.2 Environment File Pattern

# .env.local -- loaded by docker-compose env_file
DATABASE_URL=postgresql://warmwind:localdev@postgres:5432/warmwind_dev
REDIS_URL=redis://redis:6379
JWT_SECRET=local-dev-secret-do-not-use-in-production
BULL_QUEUE_PREFIX=warmwind
LOG_LEVEL=debug

Compose services are DNS names

Inside the warmwind Docker network, each service name (postgres, redis, api) resolves to the container's IP via Docker's embedded DNS. This is identical to Kubernetes Services -- the DNS-as-service-discovery pattern you already know from docker network inspect.
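A quick way to see this from the application side: the connection strings in .env.local use Compose service names as hostnames, and Node's WHATWG URL parser extracts them like any other host. A minimal sketch (the URLs mirror .env.local above):

```typescript
// Compose (and Kubernetes) service discovery is just DNS: the hostname in a
// connection string is the service name, resolved by the embedded DNS server.
const databaseUrl = 'postgresql://warmwind:localdev@postgres:5432/warmwind_dev';
const redisUrl = 'redis://redis:6379';

const db = new URL(databaseUrl);
const redis = new URL(redisUrl);

console.log(db.hostname);    // "postgres" -- the Compose service name
console.log(db.port);        // "5432"
console.log(redis.hostname); // "redis"
```

The same strings work unchanged in Kubernetes if the Services are named postgres and redis in the same namespace.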


3. Kubernetes Concepts Mapped to Docker

If you understand Docker Compose, you already understand 80% of Kubernetes conceptually. The table below maps each Compose concept to its Kubernetes equivalent.

| Docker / Compose | Kubernetes | Key Difference |
|---|---|---|
| Container | Container (inside a Pod) | Same OCI runtime concept |
| docker run | Pod | A Pod wraps 1+ containers sharing localhost and volumes |
| docker-compose service | Deployment | Deployment manages ReplicaSets; rolling updates built in |
| ports: mapping | Service (ClusterIP/NodePort/LoadBalancer) | Service routes to Pods via label selectors, not port mapping |
| env_file / environment: | ConfigMap + Secret | Mounted as files or injected as env vars; Secrets are base64-encoded and can be encrypted at rest |
| volumes: (named) | PersistentVolumeClaim (PVC) | PVC binds to a PersistentVolume backed by cloud storage (EBS, GCE PD, Ceph) |
| depends_on: | Init containers + readiness probes | K8s has no declarative dependency ordering; init containers and probes replace it |
| docker-compose up --scale api=3 | Deployment replicas: 3 | HPA can auto-adjust this number |
| Docker network | Namespace + NetworkPolicy | Namespace isolates DNS; NetworkPolicy controls L3/L4 traffic |
graph LR
    subgraph "Docker Compose"
        SVC["service: api<br/>ports: 3000"]
        VOL["volumes: pgdata"]
        ENV["env_file: .env"]
    end
    subgraph "Kubernetes"
        DEP["Deployment<br/>replicas: 3"]
        SRV["Service<br/>type: ClusterIP"]
        PVC["PersistentVolumeClaim"]
        CM["ConfigMap + Secret"]
    end
    SVC -->|"maps to"| DEP
    SVC -->|"ports map to"| SRV
    VOL -->|"maps to"| PVC
    ENV -->|"maps to"| CM

4. Agent Session Orchestration

4.1 Pod-per-Agent Architecture

Warmwind spins up one Kubernetes Pod for each AI agent session. The Pod contains the custom Linux distro image with a desktop environment, a VNC server, and a sidecar that tunnels VNC frames over WebSocket to the NestJS backend.

# agent-pod-template.yaml (used as a PodTemplate in a Job or custom controller)
apiVersion: v1
kind: Pod
metadata:
  name: agent-${SESSION_ID}
  namespace: agents
  labels:
    app: warmwind-agent
    session-id: "${SESSION_ID}"
    tenant: "${TENANT_ID}"
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
spec:
  serviceAccountName: agent-runner
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    # Main agent container: custom distro + desktop + VNC server
    - name: agent
      image: registry.warmwind.dev/agent-distro:v2.14.0
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      ports:
        - containerPort: 5900  # VNC
          name: vnc
      volumeMounts:
        - name: user-data
          mountPath: /home/agent/workspace
      env:
        - name: SESSION_ID
          value: "${SESSION_ID}"
        - name: AI_MODEL_ENDPOINT
          valueFrom:
            configMapKeyRef:
              name: agent-config
              key: model-endpoint

    # Sidecar: websockify proxy (VNC TCP -> WebSocket)
    - name: vnc-proxy
      image: registry.warmwind.dev/websockify:v1.3.0
      args: ["--web", "/usr/share/novnc", "6080", "localhost:5900"]
      ports:
        - containerPort: 6080
          name: ws-vnc
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "500m"
          memory: "256Mi"

  volumes:
    - name: user-data
      persistentVolumeClaim:
        claimName: agent-pvc-${SESSION_ID}

  terminationGracePeriodSeconds: 30
  restartPolicy: Never  # agent sessions are ephemeral
graph LR
    NestJS["NestJS Backend<br/>(K8s API client)"] -->|"kubectl create pod"| APOD["Agent Pod"]
    subgraph APOD["Agent Pod (agent-abc123)"]
        AGENT["Container: agent<br/>Custom Distro + VNC"] -->|"localhost:5900"| PROXY["Container: vnc-proxy<br/>websockify"]
    end
    PROXY -->|":6080 WebSocket"| NestJS
    APOD --> PVC["PVC: user workspace"]
    NestJS -->|"on timeout / user exit"| DELETE["kubectl delete pod"]

4.2 Lifecycle: Create, Run, Terminate

The NestJS backend acts as the orchestrator using the @kubernetes/client-node SDK:

import * as k8s from '@kubernetes/client-node';

@Injectable()
export class AgentOrchestrator {
  private readonly coreApi: k8s.CoreV1Api;

  constructor() {
    const kc = new k8s.KubeConfig();
    kc.loadFromCluster(); // uses the ServiceAccount mounted in the API pod
    this.coreApi = kc.makeApiClient(k8s.CoreV1Api);
  }

  async createAgentSession(
    sessionId: string,
    tenantId: string,
    resourceTier: 'small' | 'medium' | 'large',
  ): Promise<k8s.V1Pod> {
    const limits = {
      small:  { cpu: '1',  memory: '2Gi' },
      medium: { cpu: '2',  memory: '4Gi' },
      large:  { cpu: '4',  memory: '8Gi' },
    };

    const podManifest: k8s.V1Pod = {
      metadata: {
        name: `agent-${sessionId}`,
        namespace: 'agents',
        labels: {
          app: 'warmwind-agent',
          'session-id': sessionId,
          tenant: tenantId,
        },
      },
      spec: {
        containers: [
          {
            name: 'agent',
            image: 'registry.warmwind.dev/agent-distro:v2.14.0',
            resources: {
              requests: { cpu: '500m', memory: '1Gi' },
              limits: limits[resourceTier],
            },
            ports: [{ containerPort: 5900, name: 'vnc' }],
          },
          {
            name: 'vnc-proxy',
            image: 'registry.warmwind.dev/websockify:v1.3.0',
            args: ['6080', 'localhost:5900'],
            ports: [{ containerPort: 6080, name: 'ws-vnc' }],
            resources: {
              requests: { cpu: '100m', memory: '128Mi' },
              limits: { cpu: '500m', memory: '256Mi' },
            },
          },
        ],
        restartPolicy: 'Never',
        terminationGracePeriodSeconds: 30,
      },
    };

    // @kubernetes/client-node 1.x object-style API returns the created Pod directly
    return this.coreApi.createNamespacedPod({
      namespace: 'agents',
      body: podManifest,
    });
  }

  async terminateAgentSession(sessionId: string): Promise<void> {
    await this.coreApi.deleteNamespacedPod({
      name: `agent-${sessionId}`,
      namespace: 'agents',
      gracePeriodSeconds: 30,
    });
  }
}

From docker run to coreApi.createNamespacedPod

If you have written bash scripts that call docker run --cpus=2 --memory=4g --name agent-$SESSION_ID myimage, the Kubernetes SDK call above is the equivalent. Resource requests are the soft minimum (the scheduler uses them to pick a node), and limits are the hard ceiling (the kernel cgroup enforces them -- same memory.max and cpu.max cgroup knobs your OCI runtime sets).

4.3 Resource Limits in Depth

Kubernetes resource management translates directly to Linux cgroups v2:

| K8s Field | cgroup v2 Control | Effect |
|---|---|---|
| resources.limits.memory: 4Gi | memory.max = 4294967296 | OOM-killed if exceeded |
| resources.requests.memory: 1Gi | memory.min = 1073741824 | Guaranteed reservation (not reclaimable) |
| resources.limits.cpu: 2 | cpu.max = 200000 100000 | 200 ms of CPU per 100 ms period = 2 cores |
| resources.requests.cpu: 500m | cpu.weight (proportional) | Scheduling weight; guaranteed share under contention |

Note: the memory.min mapping applies only when the kubelet's MemoryQoS feature gate is enabled; by default, memory requests influence scheduling and OOM scoring rather than setting a cgroup reservation.
# Verify cgroup limits inside a running agent pod:
kubectl exec agent-abc123 -c agent -- cat /sys/fs/cgroup/memory.max
# 4294967296

kubectl exec agent-abc123 -c agent -- cat /sys/fs/cgroup/cpu.max
# 200000 100000
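The translation in the table can be made concrete with a small converter from Kubernetes resource strings to cgroup v2 values. This is a sketch under the table's assumptions; the function names are ours:

```typescript
// Convert a Kubernetes memory quantity ("4Gi", "512Mi") into the byte value
// the kubelet writes to memory.max.
function memoryToBytes(quantity: string): number {
  const units: Record<string, number> = { Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30 };
  const match = quantity.match(/^(\d+)(Ki|Mi|Gi)?$/);
  if (!match) throw new Error(`unsupported quantity: ${quantity}`);
  return Number(match[1]) * (match[2] ? units[match[2]] : 1);
}

// Convert a CPU limit ("2", "500m") into the cpu.max pair: quota in
// microseconds per 100000 µs period.
function cpuToCgroupMax(cpu: string, periodUs = 100_000): string {
  const millicores = cpu.endsWith('m') ? Number(cpu.slice(0, -1)) : Number(cpu) * 1000;
  const quotaUs = (millicores / 1000) * periodUs;
  return `${quotaUs} ${periodUs}`;
}

console.log(memoryToBytes('4Gi'));   // 4294967296 -> memory.max
console.log(cpuToCgroupMax('2'));    // "200000 100000" -> cpu.max
console.log(cpuToCgroupMax('500m')); // "50000 100000"
```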

5. Horizontal Pod Autoscaler

5.1 Scaling on Custom Metrics (Queue Depth)

The default HPA scales on CPU utilization, but Warmwind needs to scale on BullMQ queue depth -- the number of pending agent-creation requests.

# hpa-agent-workers.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: agent-worker-hpa
  namespace: warmwind
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: agent-worker
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30    # react fast to queue buildup
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60             # add up to 4 pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 120            # remove 25% every 2 min
  metrics:
    - type: Pods
      pods:
        metric:
          name: bullmq_waiting_jobs     # exposed by the NestJS /metrics endpoint
        target:
          type: AverageValue
          averageValue: "5"             # scale up if avg > 5 waiting jobs per worker
graph LR
    PROM["Prometheus"] -->|"scrape /metrics"| WORKER["agent-worker Pods"]
    PROM -->|"custom.metrics.k8s.io"| ADAPTER["Prometheus Adapter"]
    ADAPTER -->|"bullmq_waiting_jobs"| HPA["HPA Controller"]
    HPA -->|"scale replicas"| DEPLOY["Deployment:<br/>agent-worker"]
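Under the hood, the HPA controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue), clamped to the min/max bounds. A worked sketch of that formula against the averageValue: "5" target above (the function name is ours):

```typescript
// HPA scaling formula: ceil(currentReplicas * currentValue / targetValue),
// clamped to [minReplicas, maxReplicas] from the HPA spec.
function desiredReplicas(
  current: number,
  avgMetric: number, // average bullmq_waiting_jobs per worker pod
  target: number,    // averageValue from the HPA spec
  min = 2,
  max = 20,
): number {
  const desired = Math.ceil(current * (avgMetric / target));
  return Math.min(max, Math.max(min, desired));
}

// 4 workers each seeing ~12 waiting jobs against a target of 5:
console.log(desiredReplicas(4, 12, 5)); // 10 -- scale up
// Queue drained: clamped to minReplicas; the behavior.scaleDown policies
// then rate-limit how fast the controller actually removes pods.
console.log(desiredReplicas(10, 1, 5)); // 2
```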

5.2 Exposing the Custom Metric

The NestJS worker exposes a Prometheus gauge tracking BullMQ queue depth:

import { Injectable, OnModuleDestroy, OnModuleInit } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { Gauge, register } from 'prom-client';

@Injectable()
export class QueueMetrics implements OnModuleInit, OnModuleDestroy {
  private readonly waitingGauge: Gauge;
  private pollTimer?: NodeJS.Timeout;

  constructor(@InjectQueue('agent-sessions') private readonly queue: Queue) {
    this.waitingGauge = new Gauge({
      name: 'bullmq_waiting_jobs',
      help: 'Number of jobs waiting in the agent-sessions queue',
      registers: [register],
    });
  }

  onModuleInit() {
    // Poll queue depth every 10 seconds
    this.pollTimer = setInterval(async () => {
      const waiting = await this.queue.getWaitingCount();
      this.waitingGauge.set(waiting);
    }, 10_000);
  }

  onModuleDestroy() {
    // Stop polling so the timer cannot delay graceful shutdown
    if (this.pollTimer) clearInterval(this.pollTimer);
  }
}
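For the HPA to see bullmq_waiting_jobs, a metrics adapter must bridge Prometheus to the custom.metrics.k8s.io API (the "Prometheus Adapter" in the diagram above). A rule along these lines would expose the gauge -- a sketch only; the label names and query are assumptions that depend on your scrape configuration:

```yaml
# prometheus-adapter rule (sketch): map the bullmq_waiting_jobs series onto
# namespace/pod resources so the HPA can query it as a Pods metric.
rules:
  - seriesQuery: 'bullmq_waiting_jobs{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: { resource: "namespace" }
        pod: { resource: "pod" }
    name:
      matches: "bullmq_waiting_jobs"
      as: "bullmq_waiting_jobs"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```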

6. Namespace Isolation for Multi-Tenancy

Warmwind serves multiple customers. Each tenant's agent Pods run in a dedicated namespace with resource quotas and network policies.

# namespace + resource quota
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-acme
  labels:
    tenant: acme
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-quota
  namespace: tenant-acme
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"          # max 50 concurrent agent sessions
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-tenant
  namespace: tenant-acme
spec:
  podSelector: {}       # applies to all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: warmwind   # only the API namespace
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: tenant-acme  # same namespace
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: warmwind      # API namespace
    - to:   # allow DNS
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
graph LR
    subgraph "Namespace: warmwind"
        API["NestJS API"]
    end
    subgraph "Namespace: tenant-acme"
        A1["Agent Pod 1"]
        A2["Agent Pod 2"]
    end
    subgraph "Namespace: tenant-globex"
        G1["Agent Pod 1"]
    end
    API -->|"allowed"| A1
    API -->|"allowed"| G1
    A1 -.->|"DENIED"| G1
    A2 -.->|"DENIED"| G1

Namespaces are not security boundaries by themselves

Namespaces provide logical isolation and RBAC scoping, but they share the same kernel. If you need hard multi-tenancy (hostile workloads), you need either gVisor/Kata containers (different kernel per pod) or separate clusters. For Warmwind's use case -- trusted agent images authored in-house -- namespace isolation plus NetworkPolicy is sufficient.


7. Health Probes with NestJS Terminus

7.1 The Three Probe Types

| Probe | Question It Answers | Failure Action | Warmwind Example |
|---|---|---|---|
| Startup | Has the app finished initializing? | Retry until failureThreshold is exhausted, then restart | TypeORM migrations, BullMQ connection |
| Liveness | Is the process alive and not deadlocked? | Kill + restart the container | Event loop blocked > 5s |
| Readiness | Can the app serve traffic right now? | Remove from Service endpoints | PostgreSQL connection lost |

7.2 NestJS Implementation

import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  TypeOrmHealthIndicator,
  MemoryHealthIndicator,
  MicroserviceHealthIndicator,
} from '@nestjs/terminus';
import { Transport } from '@nestjs/microservices';

@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly db: TypeOrmHealthIndicator,
    private readonly memory: MemoryHealthIndicator,
    private readonly micro: MicroserviceHealthIndicator,
  ) {}

  // Startup probe: called once at boot
  // GET /health/startup
  @Get('startup')
  @HealthCheck()
  startup() {
    return this.health.check([
      () => this.db.pingCheck('database', { timeout: 3000 }),
      () =>
        this.micro.pingCheck('redis', {
          transport: Transport.REDIS,
          options: { host: 'redis', port: 6379 },
        }),
    ]);
  }

  // Liveness probe: is the process healthy?
  // GET /health/live
  @Get('live')
  @HealthCheck()
  live() {
    return this.health.check([
      // OOM guard: fail if RSS > 3.5 GB (limit is 4 GB)
      () => this.memory.checkRSS('memory_rss', 3.5 * 1024 * 1024 * 1024),
    ]);
  }

  // Readiness probe: can we serve traffic?
  // GET /health/ready
  @Get('ready')
  @HealthCheck()
  ready() {
    return this.health.check([
      () => this.db.pingCheck('database', { timeout: 1500 }),
      () =>
        this.micro.pingCheck('redis', {
          transport: Transport.REDIS,
          options: { host: 'redis', port: 6379 },
        }),
    ]);
  }
}
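The probe table above cites a blocked event loop as a liveness signal, but live() only guards memory. Terminus does not ship an event-loop indicator, so one has to be built; a minimal dependency-free sketch (class name and threshold are ours) that a custom health indicator could wrap:

```typescript
// Event-loop lag monitor: schedule a timer every second and record how late
// it actually fires. A liveness endpoint can fail when lag exceeds a
// threshold (5s in the table above).
class EventLoopLagMonitor {
  private lagMs = 0;

  start(intervalMs = 1000): void {
    let expected = Date.now() + intervalMs;
    const timer = setInterval(() => {
      this.lagMs = Math.max(0, Date.now() - expected);
      expected = Date.now() + intervalMs;
    }, intervalMs);
    timer.unref(); // don't keep the process alive just for monitoring
  }

  isHealthy(thresholdMs = 5000): boolean {
    return this.lagMs < thresholdMs;
  }
}

const monitor = new EventLoopLagMonitor();
monitor.start();
console.log(monitor.isHealthy()); // true while the loop is responsive
```

A custom Terminus indicator would call isHealthy() inside live() and throw when it returns false, triggering a container restart.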

7.3 Kubernetes Probe Configuration

# In the Deployment spec for the NestJS API:
containers:
  - name: api
    image: registry.warmwind.dev/api:v3.7.0
    ports:
      - containerPort: 3000
    startupProbe:
      httpGet:
        path: /health/startup
        port: 3000
      failureThreshold: 30     # 30 * 10s = 5 min max startup time
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health/live
        port: 3000
      initialDelaySeconds: 0   # startup probe handles the delay
      periodSeconds: 15
      failureThreshold: 3      # 3 * 15s = 45s before restart
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 3000
      periodSeconds: 10
      failureThreshold: 2      # removed from Service after 20s
      timeoutSeconds: 3

Health probes replace depends_on

In Docker Compose, you model service dependencies with depends_on + condition: service_healthy. Kubernetes has no equivalent declarative dependency ordering. Instead, the readiness probe keeps traffic away until the app can serve it, the startup probe gives slow initializers time before liveness checks begin, and init containers can wait for upstream services (e.g., a busybox init container that loops nc -z postgres 5432). This is fundamentally the same approach your bash health-check scripts use -- poll until ready, then proceed.
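That busybox init-container wait might look like this in a Pod spec -- a sketch; the image tag and sleep interval are assumptions:

```yaml
# Init container (sketch): block the main containers from starting until
# the postgres Service accepts TCP connections on 5432.
initContainers:
  - name: wait-for-postgres
    image: busybox:1.36
    command:
      - sh
      - -c
      - "until nc -z postgres 5432; do echo 'waiting for postgres'; sleep 2; done"
```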


8. Graceful Shutdown

When Kubernetes deletes a Pod (rolling update, scale-down, or manual delete), it sends SIGTERM and waits terminationGracePeriodSeconds before SIGKILL. The NestJS app must drain connections and finish in-flight requests.

// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Enable graceful shutdown hooks (NestJS lifecycle)
  app.enableShutdownHooks();

  await app.listen(3000);
}
bootstrap();
// graceful-shutdown.service.ts
import { Injectable, OnApplicationShutdown } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { Logger } from '@nestjs/common';

@Injectable()
export class GracefulShutdownService implements OnApplicationShutdown {
  private readonly logger = new Logger(GracefulShutdownService.name);

  constructor(
    @InjectQueue('agent-sessions') private readonly queue: Queue,
  ) {}

  async onApplicationShutdown(signal?: string): Promise<void> {
    this.logger.warn(`Received ${signal} -- starting graceful shutdown`);

    // 1. Stop accepting new BullMQ jobs
    await this.queue.pause();
    this.logger.log('BullMQ queue paused');

    // 2. Wait for in-flight jobs (max 25s, leaving 5s buffer)
    const deadline = Date.now() + 25_000;
    while (Date.now() < deadline) {
      const active = await this.queue.getActiveCount();
      if (active === 0) break;
      this.logger.log(`Waiting for ${active} active jobs...`);
      await new Promise((r) => setTimeout(r, 1000));
    }

    this.logger.log('Graceful shutdown complete');
  }
}
graph LR
    K8S["Kubernetes<br/>delete pod"] -->|"1. SIGTERM"| NEST["NestJS Process"]
    NEST -->|"2. pause queue"| BULL["BullMQ"]
    NEST -->|"3. drain HTTP"| HTTP["HTTP Server"]
    NEST -->|"4. close WS"| WS["WebSocket Gateway"]
    NEST -->|"5. close DB pool"| PG["TypeORM"]
    NEST -->|"6. exit 0"| EXIT["Process Exit"]
    K8S -->|"7. SIGKILL<br/>(after 30s)"| KILL["Force Kill"]

9. Production Deployment Manifest

Putting it all together -- the complete Deployment for the NestJS API:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: warmwind-api
  namespace: warmwind
  labels:
    app: warmwind-api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0     # zero-downtime deploys
  selector:
    matchLabels:
      app: warmwind-api
  template:
    metadata:
      labels:
        app: warmwind-api
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3000"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: warmwind-api
      terminationGracePeriodSeconds: 30
      containers:
        - name: api
          image: registry.warmwind.dev/api:v3.7.0
          ports:
            - containerPort: 3000
              name: http
            - containerPort: 3001
              name: ws
          envFrom:
            - configMapRef:
                name: api-config
            - secretRef:
                name: api-secrets
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
          startupProbe:
            httpGet:
              path: /health/startup
              port: http
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            periodSeconds: 15
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            periodSeconds: 10
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: warmwind-api
  namespace: warmwind
spec:
  type: ClusterIP
  selector:
    app: warmwind-api
  ports:
    - name: http
      port: 80
      targetPort: http
    - name: ws
      port: 3001
      targetPort: ws

10. Key Takeaways for the Interview

| Topic | What to Demonstrate |
|---|---|
| Dockerfile | Multi-stage builds, layer ordering, distroless, non-root, BuildKit secrets |
| Compose | Local dev parity with production; health conditions; bind mounts for DX |
| Docker-to-K8s mapping | Pod = container group, Deployment = service scaling, Service = stable DNS |
| Agent Pods | Pod-per-session, sidecar pattern, resource limits tied to cgroup v2 |
| HPA | Custom metrics via Prometheus Adapter, asymmetric scale-up/down policies |
| Namespaces | Multi-tenancy with ResourceQuota + NetworkPolicy |
| Health probes | Startup/liveness/readiness split, NestJS Terminus integration |
| Graceful shutdown | SIGTERM handling, queue draining, connection closing sequence |