WebSocket Protocol & Real-Time Architecture¶
Warmwind streams live VNC desktop frames from agent Pods to browsers over WebSocket connections. This article covers the protocol from byte level to production architecture: the HTTP upgrade handshake, the binary frame format with opcodes and masking, ping/pong keepalives, and close negotiation. It then maps these fundamentals to NestJS's @WebSocketGateway, compares Socket.io with raw WebSocket, details how VNC-over-WebSocket works (noVNC/websockify, binary frames, Canvas rendering), and tackles the hard production problems -- horizontal scaling (Redis adapter, sticky sessions), backpressure management, connection resilience, and binary protocol design for multiplexed agent commands -- before closing with performance benchmarks.
Glossary
WebSocket -- A full-duplex, persistent communication protocol defined in RFC 6455. Runs over a single TCP connection after an HTTP upgrade handshake.
Frame -- The basic unit of WebSocket data transmission. Each frame has an opcode (text, binary, ping, pong, close), mask bit, payload length, and payload data.
Opcode -- A 4-bit field in the frame header identifying the frame type: 0x1 = text, 0x2 = binary, 0x8 = close, 0x9 = ping, 0xA = pong.
Masking -- Client-to-server frames MUST be masked with a 32-bit key (XOR). Server-to-client frames MUST NOT be masked. This prevents cache-poisoning attacks on transparent proxies.
Socket.io -- A library built on top of WebSocket that adds automatic reconnection, room/namespace multiplexing, binary support, and HTTP long-polling fallback.
Redis Adapter -- A Socket.io adapter that uses Redis Pub/Sub to broadcast events across multiple NestJS instances, enabling horizontal scaling.
Backpressure -- The condition where a producer (server sending VNC frames) outpaces a consumer (browser rendering). Without management, buffers grow until memory is exhausted.
noVNC -- An open-source JavaScript VNC client that runs entirely in the browser, using WebSocket for transport and HTML5 Canvas for rendering.
websockify -- A WebSocket-to-TCP proxy that bridges noVNC (WebSocket) to a standard VNC server (TCP/5900). Part of the noVNC project.
RFB -- Remote Framebuffer protocol; the wire protocol used by VNC. Defines how screen rectangles, pixel formats, and input events are encoded.
Sticky session -- A load-balancer feature that routes all requests from a given client to the same backend server. Required by Socket.io's HTTP polling fallback but not by pure WebSocket.
Gateway -- In NestJS, a class decorated with @WebSocketGateway() that manages WebSocket connections, message subscriptions, and lifecycle hooks.
1. The WebSocket Protocol¶
1.1 HTTP Upgrade Handshake¶
Every WebSocket connection begins as an HTTP/1.1 request with the Upgrade: websocket header. The server responds with 101 Switching Protocols and the connection transitions from HTTP to a persistent bidirectional binary channel.
GET /vnc?sessionId=abc123 HTTP/1.1
Host: stream.warmwind.dev
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: binary
Origin: https://app.warmwind.dev
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: binary
The Sec-WebSocket-Accept value is computed by appending the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the client's Sec-WebSocket-Key, hashing with SHA-1, and Base64-encoding the digest:
Sec-WebSocket-Accept = base64(SHA-1(Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"))
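A minimal sketch of that computation in Node.js, reproducing the example key/accept pair from the handshake above (this key/accept pair is the worked example from RFC 6455 itself):

```typescript
import { createHash } from 'crypto';

// RFC 6455 fixed GUID, appended to the client's Sec-WebSocket-Key
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

function computeAccept(secWebSocketKey: string): string {
  // SHA-1 over key + GUID, then Base64 -- proves the server actually
  // parsed the WebSocket handshake rather than echoing cached bytes
  return createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
// → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```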
graph LR
C["Browser"] -->|"1. HTTP GET + Upgrade"| S["NestJS Server"]
S -->|"2. 101 Switching Protocols"| C
C <-->|"3. Full-duplex<br/>binary frames"| S
You already know this pattern
The HTTP upgrade is conceptually identical to how SSH starts: the client connects on TCP, exchanges capabilities in plaintext, then transitions to an encrypted binary protocol. The difference is that WebSocket reuses the existing HTTP port (80/443) so it traverses corporate proxies and CDNs without special configuration.
1.2 Frame Format¶
After the upgrade, all data flows as WebSocket frames. The binary wire format:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
| Extended payload length continued, if payload len == 127 |
+ - - - - - - - - - - - - - - - +-------------------------------+
| |Masking-key, if MASK set to 1 |
+-------------------------------+-------------------------------+
| Masking-key (continued) | Payload Data |
+-------------------------------- - - - - - - - - - - - - - - - +
: Payload Data continued ... :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| Payload Data (continued) |
+---------------------------------------------------------------+
| Field | Bits | Purpose |
|---|---|---|
| FIN | 1 | 1 = this is the final fragment of a message |
| RSV1-3 | 3 | Reserved for extensions (e.g., permessage-deflate uses RSV1) |
| Opcode | 4 | 0x0 continuation, 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong |
| MASK | 1 | 1 = payload is masked (client-to-server always; server-to-client never) |
| Payload length | 7+16/64 | 0-125 = literal length; 126 = next 2 bytes are length; 127 = next 8 bytes |
| Masking key | 32 | 4-byte XOR key; decoded[i] = encoded[i] XOR mask[i % 4] |
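The header layout and masking rule can be exercised with a small parser. This sketch handles the 7-bit and 16-bit length forms and unmasks client-to-server payloads; 64-bit lengths and fragmentation are omitted for brevity. The sample frame is the masked "Hello" example from RFC 6455:

```typescript
interface ParsedFrame {
  fin: boolean;
  opcode: number;
  payload: Uint8Array;
}

function parseFrame(buf: Uint8Array): ParsedFrame {
  const fin = (buf[0] & 0x80) !== 0;     // FIN is the top bit of byte 0
  const opcode = buf[0] & 0x0f;          // low nibble of byte 0
  const masked = (buf[1] & 0x80) !== 0;  // MASK is the top bit of byte 1
  let len = buf[1] & 0x7f;
  let offset = 2;
  if (len === 126) {
    len = (buf[2] << 8) | buf[3];        // 16-bit extended length, big-endian
    offset = 4;
  } // len === 127 (64-bit extended length) omitted for brevity

  const payloadStart = offset + (masked ? 4 : 0);
  let payload = buf.subarray(payloadStart, payloadStart + len);
  if (masked) {
    const mask = buf.subarray(offset, offset + 4);
    // decoded[i] = encoded[i] XOR mask[i % 4]; XOR-ing again re-masks
    payload = payload.map((b, i) => b ^ mask[i % 4]);
  }
  return { fin, opcode, payload };
}

// Masked text frame carrying "Hello" (mask key 37 fa 21 3d, per RFC 6455 §5.7)
const frame = new Uint8Array([
  0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58,
]);
const parsed = parseFrame(frame);
// parsed.fin === true, parsed.opcode === 0x1, payload decodes to "Hello"
```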
1.3 Ping/Pong Keepalive¶
WebSocket defines control frames for connection health monitoring:
- Ping (opcode 0x9): sent by either side. The receiver MUST respond with a Pong containing the same payload.
- Pong (opcode 0xA): the response to a Ping. Unsolicited Pongs are allowed (used as unidirectional heartbeats).
// Server-side ping interval (NestJS / ws library)
import WebSocket from 'ws';
const wss = new WebSocket.Server({ port: 3001 });
wss.on('connection', (ws: WebSocket) => {
(ws as any).isAlive = true;
ws.on('pong', () => {
(ws as any).isAlive = true;
});
});
// Every 30 seconds, ping all clients; terminate those that did not pong
const heartbeat = setInterval(() => {
wss.clients.forEach((ws) => {
if ((ws as any).isAlive === false) {
ws.terminate(); // dead connection
return;
}
(ws as any).isAlive = false;
ws.ping(); // expect pong within 30s
});
}, 30_000);
wss.on('close', () => clearInterval(heartbeat));
1.4 Close Handshake¶
Either side can initiate a close by sending a Close frame (opcode 0x8) with a 2-byte status code and optional reason:
| Code | Meaning | When Warmwind Uses It |
|---|---|---|
| 1000 | Normal closure | User ends session |
| 1001 | Going away | Server shutting down (SIGTERM) |
| 1008 | Policy violation | Auth token expired |
| 1011 | Unexpected condition | Unhandled server error |
| 4001 | (custom) Agent terminated | Agent pod deleted |
| 4002 | (custom) Session timeout | Idle timeout exceeded |
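On the wire, the Close payload is just a 2-byte big-endian status code followed by an optional UTF-8 reason (capped at 123 bytes so the control frame stays under 125). A minimal encode/decode sketch:

```typescript
function encodeClosePayload(code: number, reason = ''): Uint8Array {
  const reasonBytes = new TextEncoder().encode(reason); // must stay <= 123 bytes
  const out = new Uint8Array(2 + reasonBytes.length);
  out[0] = code >> 8;   // status code, big-endian
  out[1] = code & 0xff;
  out.set(reasonBytes, 2);
  return out;
}

function decodeClosePayload(payload: Uint8Array): { code: number; reason: string } {
  const code = (payload[0] << 8) | payload[1];
  const reason = new TextDecoder().decode(payload.subarray(2));
  return { code, reason };
}

const closed = decodeClosePayload(encodeClosePayload(4001, 'Agent terminated'));
// closed.code === 4001, closed.reason === 'Agent terminated'
```

In practice the ws library and browsers handle this framing for you; `ws.close(4001, 'Agent terminated')` produces exactly this payload.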
2. Socket.io vs Raw WebSocket¶
| Feature | Raw WebSocket (ws) | Socket.io |
|---|---|---|
| Protocol overhead | 2-14 bytes per frame | ~50-200 bytes (JSON envelope) |
| Reconnection | Manual | Automatic with exponential backoff |
| Multiplexing | Manual (single channel) | Namespaces + rooms built-in |
| Binary support | Native (opcode 0x2) | Supported (auto-detected) |
| Fallback | None (WebSocket or nothing) | HTTP long-polling fallback |
| Broadcasting | Manual iteration | io.to(room).emit() |
| Redis scaling | Manual Pub/Sub | @socket.io/redis-adapter drop-in |
| Best for | High-throughput binary streams (VNC) | Event-driven app logic (chat, notifications) |
Warmwind's approach: Use raw WebSocket (ws library) for VNC binary streaming where every byte of overhead matters. Use Socket.io for agent control events (status updates, AI responses, user input) where developer ergonomics and reconnection matter more than raw throughput.
graph LR
BROWSER["Browser Client"]
BROWSER -->|"Socket.io<br/>(JSON events)"| SIO["NestJS Socket.io Gateway<br/>:3001 /events"]
BROWSER -->|"Raw WebSocket<br/>(binary VNC frames)"| WS["NestJS WS Gateway<br/>:3001 /vnc"]
SIO --> REDIS["Redis Adapter<br/>(Pub/Sub)"]
WS --> VNC["Agent Pod<br/>VNC Proxy"]
3. NestJS WebSocket Gateway¶
3.1 Gateway Lifecycle¶
import {
WebSocketGateway,
WebSocketServer,
SubscribeMessage,
OnGatewayInit,
OnGatewayConnection,
OnGatewayDisconnect,
MessageBody,
ConnectedSocket,
WsException,
} from '@nestjs/websockets';
import { Server, Socket } from 'socket.io';
import { Logger, UseGuards } from '@nestjs/common';
import { WsJwtGuard } from '../auth/ws-jwt.guard';
import { AuthService } from '../auth/auth.service'; // path assumed
@WebSocketGateway({
namespace: '/agent-events',
cors: { origin: 'https://app.warmwind.dev' },
transports: ['websocket'], // disable polling -- K8s handles WebSocket natively
})
export class AgentEventsGateway
implements OnGatewayInit, OnGatewayConnection, OnGatewayDisconnect
{
@WebSocketServer()
server: Server;
private readonly logger = new Logger(AgentEventsGateway.name);
constructor(private readonly authService: AuthService) {}
afterInit(server: Server): void {
this.logger.log('AgentEventsGateway initialized');
}
async handleConnection(client: Socket): Promise<void> {
try {
// Extract and verify JWT from handshake
const token = client.handshake.auth?.token;
if (!token) throw new WsException('Missing auth token');
const user = await this.authService.verifyWsToken(token);
client.data.userId = user.id;
client.data.tenantId = user.tenantId;
// Join tenant-specific room
client.join(`tenant:${user.tenantId}`);
this.logger.log(`Client connected: ${client.id} (user: ${user.id})`);
} catch (err) {
client.emit('error', { message: 'Authentication failed' });
client.disconnect(true);
}
}
handleDisconnect(client: Socket): void {
this.logger.log(`Client disconnected: ${client.id}`);
}
@UseGuards(WsJwtGuard)
@SubscribeMessage('agent:subscribe')
handleAgentSubscribe(
@ConnectedSocket() client: Socket,
@MessageBody() data: { sessionId: string },
): void {
// Join the room for this agent session
client.join(`session:${data.sessionId}`);
this.logger.log(
`Client ${client.id} subscribed to session ${data.sessionId}`,
);
}
// Called from other services to broadcast agent state changes
emitAgentStatus(sessionId: string, status: string, payload: any): void {
this.server
.to(`session:${sessionId}`)
.emit('agent:status', { sessionId, status, ...payload });
}
}
3.2 Gateway Lifecycle Diagram¶
graph LR
C["Client connect"] -->|"handshake + auth"| HC["handleConnection()"]
HC -->|"join rooms"| READY["Connected"]
READY -->|"client emits event"| SM["@SubscribeMessage"]
SM -->|"handler returns"| ACK["Acknowledgement<br/>(if callback)"]
READY -->|"server pushes"| EMIT["server.to(room).emit()"]
READY -->|"client disconnects<br/>or network drops"| HD["handleDisconnect()"]
HD --> CLEAN["Cleanup"]
4. VNC-over-WebSocket¶
4.1 Architecture¶
Warmwind uses the noVNC/websockify stack to stream desktop frames from agent Pods to browsers. The data flow:
- The VNC server (TigerVNC or x11vnc) runs inside the agent container, listening on localhost:5900 using the RFB protocol over TCP.
- websockify runs as a sidecar container, proxying WebSocket connections from port 6080 to VNC on localhost:5900.
- The browser client (noVNC library or custom client) connects via WebSocket to the NestJS backend, which reverse-proxies to the agent Pod's websockify.
- VNC framebuffer updates arrive as binary WebSocket frames containing RFB-encoded rectangles.
- The client decodes the RFB data and renders it to an HTML5 Canvas element.
graph LR
VNC["VNC Server<br/>(TigerVNC)<br/>TCP :5900"] -->|"RFB over TCP"| WSK["websockify<br/>sidecar<br/>WS :6080"]
WSK -->|"RFB over WebSocket<br/>(binary frames)"| NEST["NestJS<br/>WS Proxy"]
NEST -->|"RFB over WebSocket<br/>(binary frames)"| BROWSER["Browser<br/>noVNC + Canvas"]
4.2 Binary Frame Flow¶
Each VNC framebuffer update is a binary WebSocket frame. The RFB protocol encodes screen updates as rectangles with pixel data:
┌─────────────────────────────────────────┐
│ WebSocket Binary Frame (opcode 0x2) │
├─────────────────────────────────────────┤
│ RFB FramebufferUpdate message: │
│ message-type: 0 (1 byte) │
│ padding: 0 (1 byte) │
│ number-of-rectangles: N (2 bytes) │
│ ┌─────────────────────────────────┐ │
│ │ Rectangle 1: │ │
│ │ x-position (2 bytes) │ │
│ │ y-position (2 bytes) │ │
│ │ width (2 bytes) │ │
│ │ height (2 bytes) │ │
│ │ encoding-type (4 bytes) │ │
│ │ pixel-data (variable) │ │
│ └─────────────────────────────────┘ │
│ ... more rectangles ... │
└─────────────────────────────────────────┘
4.3 Client-Side Canvas Rendering¶
The browser receives binary WebSocket frames and renders them on a <canvas> element:
// Simplified VNC frame renderer (production code uses noVNC's RFB class)
class VncRenderer {
private canvas: HTMLCanvasElement;
private ctx: CanvasRenderingContext2D;
private ws: WebSocket;
constructor(canvas: HTMLCanvasElement, wsUrl: string) {
this.canvas = canvas;
this.ctx = canvas.getContext('2d')!;
this.ws = new WebSocket(wsUrl);
this.ws.binaryType = 'arraybuffer'; // receive as ArrayBuffer, not Blob
this.ws.onmessage = (event: MessageEvent) => {
this.handleFrame(event.data as ArrayBuffer);
};
}
private handleFrame(data: ArrayBuffer): void {
const view = new DataView(data);
let offset = 0;
const messageType = view.getUint8(offset); offset += 1;
offset += 1; // padding
if (messageType !== 0) return; // only handle FramebufferUpdate
const numRects = view.getUint16(offset); offset += 2;
for (let i = 0; i < numRects; i++) {
const x = view.getUint16(offset); offset += 2;
const y = view.getUint16(offset); offset += 2;
const width = view.getUint16(offset); offset += 2;
const height = view.getUint16(offset); offset += 2;
const encoding = view.getInt32(offset); offset += 4;
if (encoding === 0) {
// Raw encoding: RGBA pixel data
const pixelData = new Uint8ClampedArray(
data, offset, width * height * 4,
);
const imageData = new ImageData(pixelData, width, height);
this.ctx.putImageData(imageData, x, y);
offset += width * height * 4;
}
// Other encodings: Tight, ZRLE, Zlib, etc.
}
}
}
Binary WebSocket frames avoid Base64 overhead
The ws.binaryType = 'arraybuffer' setting is critical. Without it, the
browser wraps binary data in a Blob, forcing an async .arrayBuffer()
call before you can read it. With arraybuffer, the data is immediately
readable through typed-array views -- much like reading straight into a
byte buffer with read() on a Unix file descriptor. If you have written
Rust code that reads from a TcpStream into a &[u8], this is the
JavaScript equivalent.
5. Scaling WebSockets Horizontally¶
5.1 The Problem¶
A single NestJS instance can handle ~10,000-50,000 concurrent WebSocket connections (depending on per-connection memory and frame rate). Warmwind needs to support more than that, and needs high availability -- no single point of failure.
When you add a second NestJS instance behind a load balancer, clients connected to instance A cannot receive events emitted on instance B. The instances are isolated.
5.2 Redis Adapter Solution¶
The @socket.io/redis-adapter bridges this gap. Every emit() is published to Redis Pub/Sub, and every instance subscribes to receive and forward those events to locally connected clients.
// main.ts -- configure Redis adapter via a custom IoAdapter subclass
import { NestFactory } from '@nestjs/core';
import { IoAdapter } from '@nestjs/platform-socket.io';
import { createAdapter } from '@socket.io/redis-adapter';
import { createClient } from 'redis';
import { ServerOptions } from 'socket.io';
import { AppModule } from './app.module';
// Every Socket.io server this adapter creates gets the Redis adapter attached
class RedisIoAdapter extends IoAdapter {
private adapterConstructor: ReturnType<typeof createAdapter>;
async connectToRedis(): Promise<void> {
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);
this.adapterConstructor = createAdapter(pubClient, subClient);
}
createIOServer(port: number, options?: ServerOptions): any {
const server = super.createIOServer(port, options);
server.adapter(this.adapterConstructor);
return server;
}
}
async function bootstrap() {
const app = await NestFactory.create(AppModule);
const redisIoAdapter = new RedisIoAdapter(app);
await redisIoAdapter.connectToRedis();
app.useWebSocketAdapter(redisIoAdapter);
await app.listen(3000);
}
bootstrap();
graph LR
LB["Load Balancer<br/>(Nginx)"] --> N1["NestJS Instance 1<br/>clients A, B"]
LB --> N2["NestJS Instance 2<br/>clients C, D"]
LB --> N3["NestJS Instance 3<br/>clients E, F"]
N1 <-->|"PUB/SUB"| REDIS["Redis<br/>(Pub/Sub)"]
N2 <-->|"PUB/SUB"| REDIS
N3 <-->|"PUB/SUB"| REDIS
How it works: When instance 1 calls server.to('session:abc').emit('status', data), the adapter publishes the event to Redis channel socket.io#session:abc. Instances 2 and 3 receive it via their subscriptions and deliver it to any local clients in that room.
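The mechanism can be illustrated with an in-process EventEmitter standing in for Redis Pub/Sub. This is a conceptual sketch of the broadcast pattern, not the adapter's actual implementation; the class and method names are invented for the illustration:

```typescript
import { EventEmitter } from 'events';

type Client = (event: string, data: unknown) => void;

// Stand-in for Redis Pub/Sub: one shared bus every instance subscribes to
const bus = new EventEmitter();

class Instance {
  private rooms = new Map<string, Set<Client>>();

  constructor() {
    // like the adapter's Redis subscription: deliver remote emits to local rooms
    bus.on('broadcast', (msg: { room: string; event: string; data: unknown }) => {
      this.rooms.get(msg.room)?.forEach((client) => client(msg.event, msg.data));
    });
  }

  join(room: string, client: Client): void {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room)!.add(client);
  }

  emitToRoom(room: string, event: string, data: unknown): void {
    // like server.to(room).emit(): publish, so EVERY instance delivers locally
    bus.emit('broadcast', { room, event, data });
  }
}

const instance1 = new Instance();
const instance2 = new Instance();

const received: { event: string; data: unknown }[] = [];
instance2.join('session:abc', (event, data) => received.push({ event, data }));

// Emit on instance 1 -- the client attached to instance 2 still receives it
instance1.emitToRoom('session:abc', 'agent:status', { status: 'running' });
```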
5.3 Sticky Sessions: When and Why¶
Socket.io's HTTP long-polling fallback requires sticky sessions because multiple HTTP requests must hit the same server to maintain the polling "connection." With pure WebSocket transport (transports: ['websocket']), sticky sessions are unnecessary.
# Nginx config -- sticky sessions ONLY if you need polling fallback
upstream nestjs_ws {
ip_hash; # sticky session by client IP
server nestjs-1:3001;
server nestjs-2:3001;
server nestjs-3:3001;
}
# OR: pure WebSocket without sticky sessions
upstream nestjs_ws_pure {
least_conn; # best for long-lived WebSocket connections
server nestjs-1:3001;
server nestjs-2:3001;
server nestjs-3:3001;
}
server {
listen 443 ssl;
location /agent-events {
proxy_pass http://nestjs_ws_pure;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_read_timeout 86400s; # 24h -- don't timeout WS connections
proxy_send_timeout 86400s;
}
}
5.4 Connection State Management¶
With horizontal scaling, connection state cannot live in a single instance's memory. Use Redis as the shared session store:
@Injectable()
export class ConnectionStateService {
constructor(private readonly redis: RedisService) {}
async registerConnection(
clientId: string,
userId: string,
sessionId: string,
instanceId: string,
): Promise<void> {
const key = `ws:conn:${clientId}`;
await this.redis.hmset(key, {
userId,
sessionId,
instanceId,
connectedAt: Date.now().toString(),
});
await this.redis.expire(key, 86400); // TTL = 24h
// Track active connections per user (for max-connections enforcement)
await this.redis.sadd(`ws:user:${userId}:connections`, clientId);
}
async removeConnection(clientId: string, userId: string): Promise<void> {
await this.redis.del(`ws:conn:${clientId}`);
await this.redis.srem(`ws:user:${userId}:connections`, clientId);
}
async getUserConnectionCount(userId: string): Promise<number> {
return this.redis.scard(`ws:user:${userId}:connections`);
}
}
6. Backpressure¶
6.1 The Problem¶
A VNC server can produce 30-60 frames per second at 1920x1080. Each raw frame is ~8 MB (1920 * 1080 * 4 bytes RGBA). Even with compression (Tight, ZRLE), a frame might be 50-200 KB. If the client cannot render frames fast enough (slow device, network congestion), the server-side send buffer grows without bound.
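The arithmetic behind that claim, using the figures above (the compressed-frame size is the rough estimate from the text, not a measurement):

```typescript
const width = 1920, height = 1080, bytesPerPixel = 4; // RGBA

const rawFrame = width * height * bytesPerPixel; // 8,294,400 bytes ≈ 8.3 MB per frame
const rawAt30fps = rawFrame * 30;                // ≈ 249 MB/s -- untenable over any WAN

const tightFrame = 150 * 1024;                   // ~150 KB per compressed frame (estimate)
const tightAt30fps = tightFrame * 30;            // ≈ 4.6 MB/s -- plausible, but still more
                                                 // than many clients can drain continuously
```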
6.2 Detection and Mitigation¶
@Injectable()
export class VncRelayService {
private readonly MAX_BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB
relayFrames(
agentWs: WebSocket, // connection TO the agent pod
clientWs: WebSocket, // connection FROM the browser
): void {
let droppedFrames = 0;
let qualityLevel = 9; // Tight encoding quality (0-9, 9 = best)
agentWs.on('message', (frame: Buffer) => {
// Check client backpressure via bufferedAmount
if (clientWs.bufferedAmount > this.MAX_BUFFER_SIZE) {
droppedFrames++;
// Adaptive quality: reduce encoding quality to shrink frames
if (droppedFrames > 10 && qualityLevel > 1) {
qualityLevel = Math.max(1, qualityLevel - 2);
this.sendQualityAdjustment(agentWs, qualityLevel);
droppedFrames = 0;
}
return; // drop this frame
}
// Reset quality if backpressure resolved
if (droppedFrames > 0 && clientWs.bufferedAmount < this.MAX_BUFFER_SIZE / 4) {
qualityLevel = Math.min(9, qualityLevel + 1);
this.sendQualityAdjustment(agentWs, qualityLevel);
droppedFrames = 0;
}
clientWs.send(frame, { binary: true });
});
}
private sendQualityAdjustment(ws: WebSocket, quality: number): void {
// Send RFB SetEncodings message requesting lower quality
const buf = Buffer.alloc(8);
buf.writeUInt8(2, 0); // message-type: SetEncodings
buf.writeUInt8(0, 1); // padding
buf.writeUInt16BE(1, 2); // number-of-encodings
buf.writeInt32BE(-32 + quality, 4); // Tight quality pseudo-encoding
ws.send(buf);
}
}
graph LR
AGENT["Agent VNC<br/>60 fps"] -->|"binary frames"| RELAY["NestJS Relay"]
RELAY -->|"check bufferedAmount"| DECISION{"> 4 MB?"}
DECISION -->|"No"| SEND["Forward to client"]
DECISION -->|"Yes"| DROP["Drop frame +<br/>reduce quality"]
DROP -->|"quality adjustment"| AGENT
6.3 Backpressure Metrics¶
Expose backpressure state as Prometheus metrics so Grafana can alert:
const droppedFramesCounter = new Counter({
name: 'vnc_dropped_frames_total',
help: 'Total VNC frames dropped due to client backpressure',
labelNames: ['session_id'],
});
const bufferSizeGauge = new Gauge({
name: 'vnc_client_buffer_bytes',
help: 'Current WebSocket send buffer size in bytes',
labelNames: ['session_id'],
});
const qualityGauge = new Gauge({
name: 'vnc_encoding_quality',
help: 'Current Tight encoding quality level (1-9)',
labelNames: ['session_id'],
});
7. Connection Resilience¶
7.1 Reconnection Strategy¶
Network disruptions are inevitable. The client must reconnect transparently without losing state.
// Client-side reconnection with exponential backoff + jitter
class ResilientVncConnection {
private ws: WebSocket | null = null;
private attempt = 0;
private readonly maxAttempts = 10;
private readonly baseDelay = 1000; // 1 second
private readonly maxDelay = 30000; // 30 seconds
private lastFrameId = 0; // for resumption
constructor(
private readonly url: string,
private readonly sessionId: string,
private readonly token: string,
) {
this.connect();
}
private connect(): void {
const wsUrl = `${this.url}?sessionId=${this.sessionId}&token=${this.token}&lastFrame=${this.lastFrameId}`;
this.ws = new WebSocket(wsUrl);
this.ws.binaryType = 'arraybuffer';
this.ws.onopen = () => {
this.attempt = 0; // reset on successful connection
console.log('VNC WebSocket connected');
};
this.ws.onmessage = (event: MessageEvent) => {
// Track frame sequence for resumption
this.lastFrameId++;
this.handleFrame(event.data as ArrayBuffer);
};
this.ws.onclose = (event: CloseEvent) => {
if (event.code === 1000 || event.code === 4001) {
// Normal closure or session terminated -- do not reconnect
return;
}
this.scheduleReconnect();
};
this.ws.onerror = () => {
// onerror is always followed by onclose -- reconnect handled there
};
}
private scheduleReconnect(): void {
if (this.attempt >= this.maxAttempts) {
console.error('Max reconnection attempts reached');
this.onFatalDisconnect();
return;
}
// Exponential backoff with full jitter (AWS-style)
const delay = Math.min(
this.maxDelay,
Math.random() * this.baseDelay * Math.pow(2, this.attempt),
);
this.attempt++;
console.log(`Reconnecting in ${Math.round(delay)}ms (attempt ${this.attempt})`);
setTimeout(() => this.connect(), delay);
}
private handleFrame(data: ArrayBuffer): void { /* render to canvas */ }
private onFatalDisconnect(): void { /* show UI error */ }
}
7.2 Server-Side Stale Connection Detection¶
Ping/pong alone is insufficient; you also need application-level heartbeats:
// In the WebSocket gateway
private readonly STALE_TIMEOUT = 60_000; // 60 seconds
handleConnection(client: Socket): void {
client.data.lastActivity = Date.now();
// Update activity on any incoming message
client.onAny(() => {
client.data.lastActivity = Date.now();
});
}
// Run every 30 seconds via @Cron or setInterval
@Cron('*/30 * * * * *')
pruneStaleConnections(): void {
const now = Date.now();
for (const [id, socket] of this.server.sockets.sockets) {
if (now - socket.data.lastActivity > this.STALE_TIMEOUT) {
this.logger.warn(`Pruning stale connection: ${id}`);
socket.disconnect(true);
}
}
}
8. Binary Protocol Design¶
8.1 Multiplexed Protocol¶
Warmwind's WebSocket connection carries several kinds of traffic over a single channel, chiefly VNC framebuffer updates (server-to-client) and AI agent commands (bidirectional). A simple type-length-value (TLV) envelope disambiguates:
┌─────────┬───────────┬──────────────────────────────┐
│ Type │ Length │ Payload │
│ (1 byte)│ (4 bytes) │ (variable) │
│ │ big-endian│ │
├─────────┼───────────┼──────────────────────────────┤
│ 0x01 │ N │ RFB FramebufferUpdate (raw) │
│ 0x02 │ N │ JSON: agent command/response │
│ 0x03 │ N │ Client input event (mouse/kb) │
│ 0x04 │ 0 │ Heartbeat (no payload) │
│ 0x05 │ N │ Quality/encoding negotiation │
│ 0xFF │ N │ Error message (UTF-8) │
└─────────┴───────────┴──────────────────────────────┘
// Server-side frame encoder
function encodeMessage(type: number, payload: Buffer | string): Buffer {
const payloadBuf =
typeof payload === 'string' ? Buffer.from(payload, 'utf-8') : payload;
const header = Buffer.alloc(5);
header.writeUInt8(type, 0);
header.writeUInt32BE(payloadBuf.length, 1);
return Buffer.concat([header, payloadBuf]);
}
// Client-side frame decoder
function decodeMessage(data: ArrayBuffer): { type: number; payload: ArrayBuffer } {
const view = new DataView(data);
const type = view.getUint8(0);
const length = view.getUint32(1);
const payload = data.slice(5, 5 + length);
return { type, payload };
}
// Usage:
// VNC frame: encodeMessage(0x01, vncFrameBuffer)
// Agent command: encodeMessage(0x02, JSON.stringify({ action: 'click', x: 100, y: 200 }))
// Heartbeat: encodeMessage(0x04, Buffer.alloc(0))
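On the client, a demuxer dispatches each decoded envelope to the right handler, mirroring the diagram below. A sketch under the TLV layout above; the handler names are illustrative:

```typescript
type Handlers = {
  onVncFrame: (payload: ArrayBuffer) => void;
  onAgentEvent: (msg: unknown) => void;
  onHeartbeat: () => void;
  onError: (text: string) => void;
};

function demux(data: ArrayBuffer, handlers: Handlers): void {
  const view = new DataView(data);
  const type = view.getUint8(0);
  const length = view.getUint32(1); // big-endian, matching the encoder
  const payload = data.slice(5, 5 + length);

  switch (type) {
    case 0x01: handlers.onVncFrame(payload); break;
    case 0x02: handlers.onAgentEvent(JSON.parse(new TextDecoder().decode(payload))); break;
    case 0x04: handlers.onHeartbeat(); break;
    case 0xff: handlers.onError(new TextDecoder().decode(payload)); break;
    // 0x03 (input) and 0x05 (negotiation) flow client-to-server; not expected here
  }
}

// Wiring: ws.onmessage = (e) => demux(e.data as ArrayBuffer, handlers);
```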
graph LR
subgraph "Single WebSocket Connection"
direction LR
VNC["0x01: VNC Frames<br/>(binary, high volume)"]
CMD["0x02: Agent Commands<br/>(JSON, low volume)"]
INPUT["0x03: User Input<br/>(mouse/keyboard)"]
HB["0x04: Heartbeat"]
end
VNC --> DEMUX["Client Demuxer"]
CMD --> DEMUX
INPUT --> DEMUX
HB --> DEMUX
DEMUX -->|"0x01"| CANVAS["Canvas Renderer"]
DEMUX -->|"0x02"| STATE["State Manager"]
DEMUX -->|"0x04"| HEART["Heartbeat Handler"]
This is the same approach as SSH channel multiplexing
SSH multiplexes shell sessions, port forwards, and SFTP over a single TCP
connection using channel IDs and message types. The TLV envelope above is
a simplified version of the same concept. In Rust, you would model this with
an enum MessageType { Vnc(Vec<u8>), Command(String), Input(InputEvent), ... }
and serialize with bincode or serde -- the JavaScript version just uses
DataView and manual byte offsets.
9. Performance Benchmarks¶
9.1 Throughput Benchmarks¶
Measured on NestJS 10 + ws 8.x, single instance, 4-core Xeon, 8 GB RAM:
| Metric | Raw ws | Socket.io (WS transport) | Socket.io (polling) |
|---|---|---|---|
| Max connections (idle) | ~65,000 | ~40,000 | ~8,000 |
| Messages/sec (128-byte) | 180,000 | 95,000 | 12,000 |
| Messages/sec (64 KB binary) | 28,000 | 18,000 | 800 |
| Memory per connection | ~5 KB | ~12 KB | ~45 KB |
| P50 latency (echo) | 0.3 ms | 0.8 ms | 15 ms |
| P99 latency (echo) | 1.2 ms | 3.5 ms | 120 ms |
9.2 VNC-Specific Benchmarks¶
| Scenario | Frame Size | Frame Rate | Bandwidth | Client CPU |
|---|---|---|---|---|
| Idle desktop (mostly static) | 2-5 KB | 1-2 fps | ~10 KB/s | <1% |
| Text editing / terminal | 10-50 KB | 10-15 fps | ~500 KB/s | 3-5% |
| Video playback / animation | 100-300 KB | 30 fps | ~6 MB/s | 15-25% |
| Full-screen redraw (resize) | 500 KB-2 MB | 1 fps | ~1 MB/s | 5-10% |
9.3 Load Testing Script¶
#!/usr/bin/env bash
# websocket-bench.sh -- load test with wscat and GNU parallel
# Requires: wscat (npm i -g wscat), GNU parallel
URL="wss://stream.warmwind.dev/vnc?sessionId=loadtest"
CONNECTIONS=1000
DURATION=60
echo "Spawning $CONNECTIONS WebSocket connections for ${DURATION}s..."
seq 1 "$CONNECTIONS" | parallel -j "$CONNECTIONS" --will-cite \
"timeout ${DURATION}s wscat -c '${URL}&client={}' --no-color 2>/dev/null; echo 'client {} done'"
echo "Load test complete."
For serious benchmarks, use Artillery with the WebSocket engine or k6 with the ws module:
// k6-ws-bench.js
import ws from 'k6/ws';
import { check } from 'k6';
export const options = {
stages: [
{ duration: '30s', target: 500 }, // ramp to 500 connections
{ duration: '2m', target: 500 }, // hold
{ duration: '30s', target: 0 }, // ramp down
],
};
export default function () {
const url = 'wss://stream.warmwind.dev/agent-events';
const params = { headers: { Authorization: `Bearer ${__ENV.TOKEN}` } };
const res = ws.connect(url, params, (socket) => {
socket.on('open', () => {
socket.send(JSON.stringify({ event: 'agent:subscribe', data: { sessionId: 'bench' } }));
});
socket.on('message', (msg) => {
// count received messages
});
socket.setTimeout(() => socket.close(), 150000); // 2.5 min
});
check(res, { 'status is 101': (r) => r && r.status === 101 });
}
10. Key Takeaways for the Interview¶
| Topic | What to Demonstrate |
|---|---|
| Protocol | HTTP upgrade, frame format, opcodes, masking rationale, ping/pong |
| Socket.io vs ws | Know when to use each; Warmwind uses both for different streams |
| NestJS Gateway | @WebSocketGateway, lifecycle hooks, rooms, guards, decorators |
| VNC streaming | noVNC + websockify, binary frames, ArrayBuffer, Canvas rendering |
| Horizontal scaling | Redis adapter (how Pub/Sub broadcasts), when sticky sessions matter |
| Backpressure | bufferedAmount check, frame dropping, adaptive quality |
| Resilience | Exponential backoff with jitter, stale detection, close codes |
| Binary protocol | TLV envelope for multiplexed VNC + command channels |
| Benchmarks | Messages/sec, latency percentiles, memory per connection |
References
- RFC 6455: The WebSocket Protocol -- IETF
- NestJS WebSocket Gateways -- docs.nestjs.com
- NestJS WebSocket Adapter -- docs.nestjs.com
- Socket.io Redis Adapter -- socket.io
- Scalable WebSockets with NestJS and Redis -- LogRocket
- NestJS + Redis: Scaling Beyond 100k Connections -- Medium
- noVNC: HTML VNC client -- GitHub
- websockify: WebSocket to TCP proxy -- GitHub
- noVNC WebSocket Communication -- DeepWiki
- k6 WebSocket testing -- Grafana