Docker & Kubernetes for Warmwind's Architecture
Warmwind runs each AI agent session inside an isolated Kubernetes Pod built from a custom Linux distro image. This article covers the entire journey: Dockerfile best practices (multi-stage builds, layer caching, non-root execution, distroless bases), a Docker Compose stack for local development, the conceptual mapping from Docker primitives to Kubernetes objects, Pod lifecycle management for ephemeral agent sessions, Horizontal Pod Autoscaling on custom queue-depth metrics, namespace isolation for multi-tenancy, and health probes wired to NestJS Terminus. If you have built OCI bootstrapping frameworks in bash, every concept here will click -- your scripts are the build pipeline.
Glossary
OCI image -- An Open Container Initiative image; the standardized archive format that Docker, Podman, and Buildah all produce. Your bash bootstrapping frameworks emit these.
Layer -- A filesystem diff stored as a tarball inside an OCI image. Layers are content-addressable and shared across images.
Multi-stage build -- A Dockerfile pattern where earlier stages compile artifacts and later stages copy only the final binary, discarding build tooling.
Distroless -- Google-maintained minimal base images (gcr.io/distroless/*) that contain only the runtime and its dependencies -- no shell, no package manager.
Pod -- The smallest deployable unit in Kubernetes; one or more co-located containers sharing a network namespace and storage volumes.
Deployment -- A Kubernetes controller that declares a desired Pod count, performs rolling updates, and manages ReplicaSets.
Service -- A stable network endpoint (ClusterIP, NodePort, or LoadBalancer) that routes traffic to a set of Pods matched by label selectors.
ConfigMap / Secret -- Kubernetes objects for injecting configuration (plaintext or base64-encoded) into Pods as environment variables or mounted files.
HPA -- Horizontal Pod Autoscaler; a controller that adjusts the replica count of a Deployment based on observed metrics.
Namespace -- A virtual cluster boundary within a Kubernetes cluster, providing resource isolation and RBAC scoping.
Liveness probe -- A periodic health check; if it fails, the kubelet kills the container and restarts it.
Readiness probe -- A periodic check that gates traffic routing; a Pod that fails readiness is removed from Service endpoints.
Startup probe -- A probe for slow-starting containers; liveness and readiness checks are suspended until it succeeds for the first time.
1. Dockerfile Best Practices
1.1 Multi-Stage Build for NestJS
A production NestJS image should carry zero build tooling. Multi-stage builds achieve this by separating the install-and-compile stage (npm ci && npm run build) from the final runtime stage.
# ── Stage 1: install + compile ──────────────────────────────
FROM node:22-bookworm-slim AS builder
WORKDIR /app
# Copy lockfile first -- layer cache survives code changes
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts
COPY tsconfig*.json nest-cli.json ./
COPY src/ src/
RUN npm run build # produces dist/
RUN npm prune --omit=dev   # strip devDependencies (--production is deprecated)
# ── Stage 2: runtime ────────────────────────────────────────
FROM gcr.io/distroless/nodejs22-debian12:nonroot
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
EXPOSE 3000
CMD ["dist/main.js"]
graph LR
SRC["Source Code<br/>+ package-lock.json"] --> BUILDER["Stage 1: builder<br/>node:22-bookworm-slim"]
BUILDER -->|"npm ci && build"| DIST["dist/ + pruned<br/>node_modules"]
DIST --> RUNTIME["Stage 2: runtime<br/>distroless/nodejs22"]
RUNTIME --> IMAGE["Final OCI Image<br/>~120 MB"]
Why distroless? The final image has no shell, no apt, no curl. An attacker who achieves RCE inside the container cannot install tools, escalate, or pivot. The image also has no /etc/passwd entries beyond nonroot (UID 65532).
Your OCI frameworks map here
If you have written bash that calls buildah from scratch, buildah copy, and
buildah commit to assemble minimal container images, that is exactly what the
multi-stage Dockerfile above automates. The FROM ... AS builder / COPY --from=builder
idiom is the declarative equivalent of your imperative OCI assembly scripts.
The layer-caching semantics (copy lockfile first, then source) are the same
optimization you would apply with buildah's --layers flag.
1.2 Layer Caching Strategy
Docker evaluates cache validity top-down. The moment a COPY invalidates a layer, every subsequent instruction rebuilds.
| Instruction order | Cache behavior |
|---|---|
| `COPY package-lock.json` | Hit while the lock file is unchanged (the most common case during dev) |
| `RUN npm ci` | Hit, because its input layer is cached |
| `COPY src/` | Miss when source changed -- this layer and everything below rebuilds |
| `RUN npm run build` | Miss (source changed) |
The lock-file-first pattern means a code change triggers only COPY src/ && build, not a full npm ci. On CI this saves 30-90 seconds per build.
1.3 Security Hardening Checklist
# 1. Pin exact digest, not just tag
FROM node:22-bookworm-slim@sha256:abc123... AS builder
# 2. Non-root execution
USER nonroot
# 3. Read-only filesystem (set at runtime)
# docker run --read-only --tmpfs /tmp myimage
# 4. No secrets in image layers
# Use BuildKit secret mounts instead:
RUN --mount=type=secret,id=npm_token \
NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
# 5. Healthcheck for standalone Docker usage
HEALTHCHECK --interval=30s --timeout=3s \
CMD ["/nodejs/bin/node", "-e", "require('http').get('http://localhost:3000/health', r => r.statusCode === 200 ? process.exit(0) : process.exit(1))"]
2. Docker Compose for Local Development
2.1 The Warmwind Stack
Local development mirrors production: NestJS API, PostgreSQL, Redis (shared by BullMQ and Socket.io adapter), and a BullMQ worker process.
# docker-compose.yml
services:
api:
build:
context: .
target: builder # use the builder stage for hot-reload
command: ["npm", "run", "start:dev"]
ports:
- "3000:3000" # HTTP + GraphQL
- "3001:3001" # WebSocket gateway
volumes:
- ./src:/app/src # bind-mount for hot reload
env_file: .env.local
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_healthy
networks:
- warmwind
worker:
build:
context: .
target: builder
command: ["npm", "run", "start:worker"]
env_file: .env.local
depends_on:
- redis
- postgres
networks:
- warmwind
postgres:
image: postgres:16-alpine
environment:
POSTGRES_USER: warmwind
POSTGRES_PASSWORD: localdev
POSTGRES_DB: warmwind_dev
ports:
- "5432:5432"
volumes:
- pgdata:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U warmwind"]
interval: 5s
timeout: 3s
retries: 5
networks:
- warmwind
redis:
image: redis:7-alpine
command: ["redis-server", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
ports:
- "6379:6379"
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 5s
timeout: 3s
retries: 5
networks:
- warmwind
volumes:
pgdata:
networks:
warmwind:
driver: bridge
graph LR
DEV["Developer<br/>localhost"] -->|":3000 HTTP<br/>:3001 WS"| API["api<br/>(NestJS)"]
API --> PG[("postgres:16")]
API --> RED["redis:7"]
WORKER["worker<br/>(BullMQ)"] --> RED
WORKER --> PG
RED -->|"PubSub"| API
2.2 Environment File Pattern
# .env.local -- loaded by docker-compose env_file
DATABASE_URL=postgresql://warmwind:localdev@postgres:5432/warmwind_dev
REDIS_URL=redis://redis:6379
JWT_SECRET=local-dev-secret-do-not-use-in-production
BULL_QUEUE_PREFIX=warmwind
LOG_LEVEL=debug
Compose services are DNS names
Inside the warmwind Docker network, each service name (postgres, redis,
api) resolves to the container's IP via Docker's embedded DNS. This is
identical to Kubernetes Services -- the DNS-as-service-discovery pattern you
already know from docker network inspect.
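On the application side, this means a connection URL's hostname is simply the service name. A minimal sketch, assuming the .env.local values above (Node's WHATWG URL parser handles both schemes):

```typescript
// The hostname in DATABASE_URL / REDIS_URL is just the Compose service
// name; Docker's embedded DNS (and later, a Kubernetes Service) resolves
// it to the container IP at connect time.
const redisUrl = new URL(process.env.REDIS_URL ?? 'redis://redis:6379');
const dbUrl = new URL(
  process.env.DATABASE_URL ??
    'postgresql://warmwind:localdev@postgres:5432/warmwind_dev',
);

console.log(redisUrl.hostname, redisUrl.port); // "redis" "6379"
console.log(dbUrl.hostname, dbUrl.pathname);   // "postgres" "/warmwind_dev"
```

Swapping the hostnames for Kubernetes Service DNS names (e.g. `redis.warmwind.svc.cluster.local`) is the only change needed when moving to the cluster.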
3. Kubernetes Concepts Mapped to Docker
If you understand Docker Compose, you already understand 80% of Kubernetes conceptually. The table below maps each Compose concept to its Kubernetes equivalent.
| Docker / Compose | Kubernetes | Key Difference |
|---|---|---|
| Container | Container (inside a Pod) | Same OCI runtime concept |
| `docker run` | Pod | A Pod wraps 1+ containers sharing localhost and volumes |
| `docker-compose` service | Deployment | Deployment manages ReplicaSets; rolling updates built-in |
| `ports:` mapping | Service (ClusterIP/NodePort/LoadBalancer) | Service routes to Pods via label selectors, not port mapping |
| `env_file` / `environment:` | ConfigMap + Secret | Mounted as files or injected as env vars; Secrets are base64-encoded and can be encrypted at rest |
| `volumes:` (named) | PersistentVolumeClaim (PVC) | PVC binds to a PersistentVolume backed by cloud storage (EBS, GCE PD, Ceph) |
| `depends_on:` | Init containers + readiness probes | K8s has no declarative dependency ordering; init containers and probes replace it |
| `docker-compose up --scale api=3` | Deployment `replicas: 3` | HPA can auto-adjust this number |
| Docker network | Namespace + NetworkPolicy | Namespace isolates DNS; NetworkPolicy controls L3/L4 traffic |
graph LR
subgraph "Docker Compose"
SVC["service: api<br/>ports: 3000"]
VOL["volumes: pgdata"]
ENV["env_file: .env"]
end
subgraph "Kubernetes"
DEP["Deployment<br/>replicas: 3"]
SRV["Service<br/>type: ClusterIP"]
PVC["PersistentVolumeClaim"]
CM["ConfigMap + Secret"]
end
SVC -->|"maps to"| DEP
SVC -->|"ports map to"| SRV
VOL -->|"maps to"| PVC
ENV -->|"maps to"| CM
4. Agent Session Orchestration
4.1 Pod-per-Agent Architecture
Warmwind spins up one Kubernetes Pod for each AI agent session. The Pod contains the custom Linux distro image with a desktop environment, a VNC server, and a sidecar that tunnels VNC frames over WebSocket to the NestJS backend.
# agent-pod-template.yaml (used as a PodTemplate in a Job or custom controller)
apiVersion: v1
kind: Pod
metadata:
name: agent-${SESSION_ID}
namespace: agents
labels:
app: warmwind-agent
session-id: "${SESSION_ID}"
tenant: "${TENANT_ID}"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "9090"
spec:
serviceAccountName: agent-runner
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
# Main agent container: custom distro + desktop + VNC server
- name: agent
image: registry.warmwind.dev/agent-distro:v2.14.0
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2"
memory: "4Gi"
ports:
- containerPort: 5900 # VNC
name: vnc
volumeMounts:
- name: user-data
mountPath: /home/agent/workspace
env:
- name: SESSION_ID
value: "${SESSION_ID}"
- name: AI_MODEL_ENDPOINT
valueFrom:
configMapKeyRef:
name: agent-config
key: model-endpoint
# Sidecar: websockify proxy (VNC TCP -> WebSocket)
- name: vnc-proxy
image: registry.warmwind.dev/websockify:v1.3.0
args: ["--web", "/usr/share/novnc", "6080", "localhost:5900"]
ports:
- containerPort: 6080
name: ws-vnc
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
volumes:
- name: user-data
persistentVolumeClaim:
claimName: agent-pvc-${SESSION_ID}
terminationGracePeriodSeconds: 30
restartPolicy: Never # agent sessions are ephemeral
graph LR
NestJS["NestJS Backend<br/>(K8s API client)"] -->|"kubectl create pod"| APOD["Agent Pod"]
subgraph APOD["Agent Pod (agent-abc123)"]
AGENT["Container: agent<br/>Custom Distro + VNC"] -->|"localhost:5900"| PROXY["Container: vnc-proxy<br/>websockify"]
end
PROXY -->|":6080 WebSocket"| NestJS
APOD --> PVC["PVC: user workspace"]
NestJS -->|"on timeout / user exit"| DELETE["kubectl delete pod"]
4.2 Lifecycle: Create, Run, Terminate
The NestJS backend acts as the orchestrator using the @kubernetes/client-node SDK:
import * as k8s from '@kubernetes/client-node';
@Injectable()
export class AgentOrchestrator {
private readonly coreApi: k8s.CoreV1Api;
constructor() {
const kc = new k8s.KubeConfig();
kc.loadFromCluster(); // uses the ServiceAccount mounted in the API pod
this.coreApi = kc.makeApiClient(k8s.CoreV1Api);
}
async createAgentSession(
sessionId: string,
tenantId: string,
resourceTier: 'small' | 'medium' | 'large',
): Promise<k8s.V1Pod> {
const limits = {
small: { cpu: '1', memory: '2Gi' },
medium: { cpu: '2', memory: '4Gi' },
large: { cpu: '4', memory: '8Gi' },
};
const podManifest: k8s.V1Pod = {
metadata: {
name: `agent-${sessionId}`,
namespace: 'agents',
labels: {
app: 'warmwind-agent',
'session-id': sessionId,
tenant: tenantId,
},
},
spec: {
containers: [
{
name: 'agent',
image: 'registry.warmwind.dev/agent-distro:v2.14.0',
resources: {
requests: { cpu: '500m', memory: '1Gi' },
limits: limits[resourceTier],
},
ports: [{ containerPort: 5900, name: 'vnc' }],
},
{
name: 'vnc-proxy',
image: 'registry.warmwind.dev/websockify:v1.3.0',
args: ['6080', 'localhost:5900'],
ports: [{ containerPort: 6080, name: 'ws-vnc' }],
resources: {
requests: { cpu: '100m', memory: '128Mi' },
limits: { cpu: '500m', memory: '256Mi' },
},
},
],
restartPolicy: 'Never',
terminationGracePeriodSeconds: 30,
},
};
// @kubernetes/client-node 1.x object-style calls return the resource
// directly (no { response, body } wrapper as in the 0.x API)
return this.coreApi.createNamespacedPod({
namespace: 'agents',
body: podManifest,
});
}
async terminateAgentSession(sessionId: string): Promise<void> {
await this.coreApi.deleteNamespacedPod({
name: `agent-${sessionId}`,
namespace: 'agents',
gracePeriodSeconds: 30,
});
}
}
From docker run to coreApi.createNamespacedPod
If you have written bash scripts that call docker run --cpus=2 --memory=4g
--name agent-$SESSION_ID myimage, the Kubernetes SDK call above is the
equivalent. Resource requests are the soft minimum (the scheduler uses them
to pick a node), and limits are the hard ceiling (the kernel cgroup enforces
them -- same memory.max and cpu.max cgroup knobs your OCI runtime sets).
4.3 Resource Limits in Depth
Kubernetes resource management translates directly to Linux cgroups v2:
| K8s Field | cgroup v2 Control | Effect |
|---|---|---|
| `resources.limits.memory: 4Gi` | `memory.max = 4294967296` | OOM-killed if exceeded |
| `resources.requests.memory: 1Gi` | `memory.min = 1073741824` (only with the MemoryQoS feature gate; otherwise requests inform scheduling and OOM scoring) | Guaranteed reservation (not reclaimable) |
| `resources.limits.cpu: 2` | `cpu.max = 200000 100000` | 200ms of CPU per 100ms period = 2 cores |
| `resources.requests.cpu: 500m` | `cpu.weight` (proportional) | Scheduling weight; guaranteed share under contention |
# Verify cgroup limits inside a running agent pod:
kubectl exec agent-abc123 -c agent -- cat /sys/fs/cgroup/memory.max
# 4294967296
kubectl exec agent-abc123 -c agent -- cat /sys/fs/cgroup/cpu.max
# 200000 100000
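The arithmetic behind those cgroup values is mechanical. A hypothetical helper (not part of Warmwind's codebase) that converts the K8s quantity strings into the values the kubelet writes -- covering only the m-suffix for CPU and Ki/Mi/Gi for memory:

```typescript
// Convert Kubernetes resource quantities to cgroup v2 values (sketch).
// cpu.max is "<quota_us> <period_us>"; the default period is 100ms.
function cpuMax(cpu: string, periodUs = 100_000): string {
  // "2" -> 2 cores, "500m" -> 0.5 cores; quota = cores * period
  const cores = cpu.endsWith('m') ? parseInt(cpu, 10) / 1000 : parseFloat(cpu);
  return `${Math.round(cores * periodUs)} ${periodUs}`;
}

// memory.max is plain bytes; Ki/Mi/Gi are binary (powers of 1024) units.
function memoryBytes(mem: string): number {
  const units: Record<string, number> = { Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30 };
  const m = mem.match(/^(\d+)(Ki|Mi|Gi)?$/);
  if (!m) throw new Error(`unsupported quantity: ${mem}`);
  return parseInt(m[1], 10) * (m[2] ? units[m[2]] : 1);
}

console.log(cpuMax('2'));        // "200000 100000" -- matches cpu.max above
console.log(memoryBytes('4Gi')); // 4294967296      -- matches memory.max above
```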
5. Horizontal Pod Autoscaler
5.1 Scaling on Custom Metrics (Queue Depth)
The default HPA scales on CPU utilization, but Warmwind needs to scale on BullMQ queue depth -- the number of pending agent-creation requests.
# hpa-agent-workers.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: agent-worker-hpa
namespace: warmwind
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: agent-worker
minReplicas: 2
maxReplicas: 20
behavior:
scaleUp:
stabilizationWindowSeconds: 30 # react fast to queue buildup
policies:
- type: Pods
value: 4
periodSeconds: 60 # add up to 4 pods per minute
scaleDown:
stabilizationWindowSeconds: 300 # wait 5 min before scaling down
policies:
- type: Percent
value: 25
periodSeconds: 120 # remove 25% every 2 min
metrics:
- type: Pods
pods:
metric:
name: bullmq_waiting_jobs # exposed by the NestJS /metrics endpoint
target:
type: AverageValue
averageValue: "5" # scale up if avg > 5 waiting jobs per worker
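The HPA controller's core formula, per the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas x currentMetricValue / targetValue), clamped to the min/max bounds; the behavior policies then rate-limit how fast the controller may move toward that number. A sketch of the calculation for the manifest above:

```typescript
// The HPA control loop's core formula:
//   desired = ceil(currentReplicas * currentMetricValue / targetValue)
// clamped to [minReplicas, maxReplicas]. Behavior policies (scaleUp /
// scaleDown above) then limit the rate of change, not the target itself.
function desiredReplicas(
  current: number,
  avgWaitingJobs: number, // observed bullmq_waiting_jobs per pod
  target: number,         // averageValue: "5" in the manifest
  min: number,
  max: number,
): number {
  const raw = Math.ceil(current * (avgWaitingJobs / target));
  return Math.min(max, Math.max(min, raw));
}

// 4 workers each seeing 15 waiting jobs, target 5 -> scale toward 12
console.log(desiredReplicas(4, 15, 5, 2, 20)); // 12
```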
graph LR
PROM["Prometheus"] -->|"scrape /metrics"| WORKER["agent-worker Pods"]
PROM -->|"custom.metrics.k8s.io"| ADAPTER["Prometheus Adapter"]
ADAPTER -->|"bullmq_waiting_jobs"| HPA["HPA Controller"]
HPA -->|"scale replicas"| DEPLOY["Deployment:<br/>agent-worker"]
5.2 Exposing the Custom Metric
The NestJS worker exposes a Prometheus gauge tracking BullMQ queue depth:
import { Injectable, OnModuleInit } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { Gauge, register } from 'prom-client';
@Injectable()
export class QueueMetrics implements OnModuleInit {
private readonly waitingGauge: Gauge;
constructor(@InjectQueue('agent-sessions') private readonly queue: Queue) {
this.waitingGauge = new Gauge({
name: 'bullmq_waiting_jobs',
help: 'Number of jobs waiting in the agent-sessions queue',
registers: [register],
});
}
async onModuleInit() {
// Poll queue depth every 10 seconds
setInterval(async () => {
const waiting = await this.queue.getWaitingCount();
this.waitingGauge.set(waiting);
}, 10_000);
}
}
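What the Prometheus scraper (and, downstream, the Prometheus Adapter) actually consumes from /metrics is plain text. A hand-rolled sketch of the gauge exposition format -- in production, prom-client's register.metrics() generates this for you:

```typescript
// Minimal sketch of the Prometheus text exposition format produced
// for a gauge (illustration only; prom-client does this in production).
function gaugeExposition(
  name: string,
  help: string,
  value: number,
  labels: Record<string, string> = {},
): string {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(',');
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} gauge`,
    `${name}${labelStr ? `{${labelStr}}` : ''} ${value}`,
  ].join('\n');
}

console.log(gaugeExposition('bullmq_waiting_jobs', 'Jobs waiting in the agent-sessions queue', 7));
// # HELP bullmq_waiting_jobs Jobs waiting in the agent-sessions queue
// # TYPE bullmq_waiting_jobs gauge
// bullmq_waiting_jobs 7
```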
6. Namespace Isolation for Multi-Tenancy
Warmwind serves multiple customers. Each tenant's agent Pods run in a dedicated namespace with resource quotas and network policies.
# namespace + resource quota
apiVersion: v1
kind: Namespace
metadata:
name: tenant-acme
labels:
tenant: acme
---
apiVersion: v1
kind: ResourceQuota
metadata:
name: agent-quota
namespace: tenant-acme
spec:
hard:
requests.cpu: "20"
requests.memory: 40Gi
limits.cpu: "40"
limits.memory: 80Gi
pods: "50" # max 50 concurrent agent sessions
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-cross-tenant
namespace: tenant-acme
spec:
podSelector: {} # applies to all pods in this namespace
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: warmwind # only the API namespace
egress:
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: tenant-acme # same namespace
- to:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: warmwind # API namespace
- to: # allow DNS
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
graph LR
subgraph "Namespace: warmwind"
API["NestJS API"]
end
subgraph "Namespace: tenant-acme"
A1["Agent Pod 1"]
A2["Agent Pod 2"]
end
subgraph "Namespace: tenant-globex"
G1["Agent Pod 1"]
end
API -->|"allowed"| A1
API -->|"allowed"| G1
A1 -.->|"DENIED"| G1
A2 -.->|"DENIED"| G1
Namespaces are not security boundaries by themselves
Namespaces provide logical isolation and RBAC scoping, but they share the same kernel. If you need hard multi-tenancy (hostile workloads), you need either gVisor/Kata containers (different kernel per pod) or separate clusters. For Warmwind's use case -- trusted agent images authored in-house -- namespace isolation plus NetworkPolicy is sufficient.
7. Health Probes with NestJS Terminus
7.1 The Three Probe Types
| Probe | Question It Answers | Failure Action | Warmwind Example |
|---|---|---|---|
| Startup | Has the app finished initializing? | Keep waiting (no restart yet) | TypeORM migrations, BullMQ connection |
| Liveness | Is the process alive and not deadlocked? | Kill + restart the container | Event loop blocked > 5s |
| Readiness | Can the app serve traffic right now? | Remove from Service endpoints | PostgreSQL connection lost |
7.2 NestJS Implementation
import { Controller, Get } from '@nestjs/common';
import {
HealthCheck,
HealthCheckService,
TypeOrmHealthIndicator,
MemoryHealthIndicator,
MicroserviceHealthIndicator,
} from '@nestjs/terminus';
import { Transport } from '@nestjs/microservices';
@Controller('health')
export class HealthController {
constructor(
private readonly health: HealthCheckService,
private readonly db: TypeOrmHealthIndicator,
private readonly memory: MemoryHealthIndicator,
private readonly micro: MicroserviceHealthIndicator,
) {}
// Startup probe: called once at boot
// GET /health/startup
@Get('startup')
@HealthCheck()
startup() {
return this.health.check([
() => this.db.pingCheck('database', { timeout: 3000 }),
() =>
this.micro.pingCheck('redis', {
transport: Transport.REDIS,
options: { host: 'redis', port: 6379 },
}),
]);
}
// Liveness probe: is the process healthy?
// GET /health/live
@Get('live')
@HealthCheck()
live() {
return this.health.check([
// OOM guard: fail if RSS approaches the container limit
// (the API Deployment in Section 9 sets limits.memory: 2Gi)
() => this.memory.checkRSS('memory_rss', 1.75 * 1024 * 1024 * 1024),
]);
}
// Readiness probe: can we serve traffic?
// GET /health/ready
@Get('ready')
@HealthCheck()
ready() {
return this.health.check([
() => this.db.pingCheck('database', { timeout: 1500 }),
() =>
this.micro.pingCheck('redis', {
transport: Transport.REDIS,
options: { host: 'redis', port: 6379 },
}),
]);
}
}
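The liveness example from the table ("event loop blocked > 5s") is measurable with Node's built-in perf_hooks. A custom Terminus HealthIndicator would wrap something like the following sketch (shown here without the Terminus dependency):

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks';

// Continuously sample event-loop delay; a custom health indicator can
// fail /health/live when the mean lag crosses a threshold.
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

function eventLoopLagMs(): number {
  // histogram values are nanoseconds; with no samples yet, report 0
  return histogram.count === 0 ? 0 : histogram.mean / 1e6;
}

function isLive(thresholdMs = 5000): boolean {
  return eventLoopLagMs() < thresholdMs;
}

setTimeout(() => {
  console.log(`lag=${eventLoopLagMs().toFixed(2)}ms live=${isLive()}`);
  histogram.disable();
}, 100);
```

Inside a Terminus indicator, isLive() would decide between returning a healthy status object and throwing a HealthCheckError.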
7.3 Kubernetes Probe Configuration
# In the Deployment spec for the NestJS API:
containers:
- name: api
image: registry.warmwind.dev/api:v3.7.0
ports:
- containerPort: 3000
startupProbe:
httpGet:
path: /health/startup
port: 3000
failureThreshold: 30 # 30 * 10s = 5 min max startup time
periodSeconds: 10
livenessProbe:
httpGet:
path: /health/live
port: 3000
initialDelaySeconds: 0 # startup probe handles the delay
periodSeconds: 15
failureThreshold: 3 # 3 * 15s = 45s before restart
timeoutSeconds: 5
readinessProbe:
httpGet:
path: /health/ready
port: 3000
periodSeconds: 10
failureThreshold: 2 # removed from Service after 20s
timeoutSeconds: 3
Health probes replace depends_on
In Docker Compose, you model service dependencies with depends_on +
condition: service_healthy. Kubernetes has no equivalent declarative
dependency ordering. Instead, the readiness probe keeps a Pod out of
Service endpoints until its dependencies respond, and init containers can
wait for upstream services (e.g., a busybox init container that loops
nc -z postgres 5432). This is fundamentally the same approach your bash
health-check scripts use -- poll until ready, then proceed.
8. Graceful Shutdown
When Kubernetes deletes a Pod (rolling update, scale-down, or manual delete), it sends SIGTERM and waits terminationGracePeriodSeconds before SIGKILL. The NestJS app must drain connections and finish in-flight requests.
// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
async function bootstrap() {
const app = await NestFactory.create(AppModule);
// Enable graceful shutdown hooks (NestJS lifecycle)
app.enableShutdownHooks();
await app.listen(3000);
}
bootstrap();
// graceful-shutdown.service.ts
import { Injectable, OnApplicationShutdown } from '@nestjs/common';
import { InjectQueue } from '@nestjs/bullmq';
import { Queue } from 'bullmq';
import { Logger } from '@nestjs/common';
@Injectable()
export class GracefulShutdownService implements OnApplicationShutdown {
private readonly logger = new Logger(GracefulShutdownService.name);
constructor(
@InjectQueue('agent-sessions') private readonly queue: Queue,
) {}
async onApplicationShutdown(signal?: string): Promise<void> {
this.logger.warn(`Received ${signal} -- starting graceful shutdown`);
// 1. Stop accepting new BullMQ jobs
await this.queue.pause();
this.logger.log('BullMQ queue paused');
// 2. Wait for in-flight jobs (max 25s, leaving 5s buffer)
const deadline = Date.now() + 25_000;
while (Date.now() < deadline) {
const active = await this.queue.getActiveCount();
if (active === 0) break;
this.logger.log(`Waiting for ${active} active jobs...`);
await new Promise((r) => setTimeout(r, 1000));
}
this.logger.log('Graceful shutdown complete');
}
}
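The polling loop in onApplicationShutdown generalizes to a small drain helper, sketched below with the active-count check injected so it is not tied to BullMQ. (Note that BullMQ's own Worker#close() performs an equivalent wait for active jobs natively, which is usually preferable in the worker process.)

```typescript
// Generic drain loop: poll an "active work" counter until it reaches zero
// or the deadline passes. Returns true only if fully drained.
async function drain(
  getActive: () => Promise<number>,
  timeoutMs: number,
  pollMs = 1000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if ((await getActive()) === 0) return true;
    await new Promise((r) => setTimeout(r, pollMs));
  }
  return (await getActive()) === 0; // one last check at the deadline
}
```

In the service above this becomes `await drain(() => this.queue.getActiveCount(), 25_000)`, keeping the 5s buffer before Kubernetes sends SIGKILL.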
graph LR
K8S["Kubernetes<br/>delete pod"] -->|"1. SIGTERM"| NEST["NestJS Process"]
NEST -->|"2. pause queue"| BULL["BullMQ"]
NEST -->|"3. drain HTTP"| HTTP["HTTP Server"]
NEST -->|"4. close WS"| WS["WebSocket Gateway"]
NEST -->|"5. close DB pool"| PG["TypeORM"]
NEST -->|"6. exit 0"| EXIT["Process Exit"]
K8S -->|"7. SIGKILL<br/>(after 30s)"| KILL["Force Kill"]
9. Production Deployment Manifest
Putting it all together -- the complete Deployment for the NestJS API:
apiVersion: apps/v1
kind: Deployment
metadata:
name: warmwind-api
namespace: warmwind
labels:
app: warmwind-api
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0 # zero-downtime deploys
selector:
matchLabels:
app: warmwind-api
template:
metadata:
labels:
app: warmwind-api
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "3000"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: warmwind-api
terminationGracePeriodSeconds: 30
containers:
- name: api
image: registry.warmwind.dev/api:v3.7.0
ports:
- containerPort: 3000
name: http
- containerPort: 3001
name: ws
envFrom:
- configMapRef:
name: api-config
- secretRef:
name: api-secrets
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "2"
memory: "2Gi"
startupProbe:
httpGet:
path: /health/startup
port: http
failureThreshold: 30
periodSeconds: 10
livenessProbe:
httpGet:
path: /health/live
port: http
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: http
periodSeconds: 10
failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
name: warmwind-api
namespace: warmwind
spec:
type: ClusterIP
selector:
app: warmwind-api
ports:
- name: http
port: 80
targetPort: http
- name: ws
port: 3001
targetPort: ws
10. Key Takeaways for the Interview
| Topic | What to Demonstrate |
|---|---|
| Dockerfile | Multi-stage builds, layer ordering, distroless, non-root, BuildKit secrets |
| Compose | Local dev parity with production; health conditions; bind mounts for DX |
| Docker-to-K8s mapping | Pod = container group, Deployment = service scaling, Service = stable DNS |
| Agent Pods | Pod-per-session, sidecar pattern, resource limits tied to cgroup v2 |
| HPA | Custom metrics via Prometheus Adapter, asymmetric scale-up/down policies |
| Namespaces | Multi-tenancy with ResourceQuota + NetworkPolicy |
| Health probes | Startup/liveness/readiness split, NestJS Terminus integration |
| Graceful shutdown | SIGTERM handling, queue draining, connection closing sequence |
References
- Dockerfile best practices -- Official Docker documentation
- Kubernetes Pod lifecycle -- kubernetes.io
- Horizontal Pod Autoscaler walkthrough -- kubernetes.io
- NestJS Terminus health checks -- docs.nestjs.com
- Distroless container images -- GoogleContainerTools
- Kubernetes Network Policies -- kubernetes.io
- Resource Management for Pods -- kubernetes.io