Wayland Protocol Extensions Deep Dive¶
Warmwind's stack lives and dies by a handful of Wayland protocol extensions: screencopy for frame capture, virtual input for agent control, output management for headless display configuration, explicit sync for NVIDIA correctness, and security context for per-agent isolation. This article walks through the exact protocol flows, XML interface signatures, and compositor implementation details for each.
Protocol Landscape¶
graph LR
SC["ext-image-copy-capture"] --> Comp["Compositor"]
VK["virtual-keyboard"] --> Comp
VP["virtual-pointer"] --> Comp
OM["output-management"] --> Comp
Sync["drm-syncobj"] --> Comp
Sec["security-context"] --> Comp
Comp --> GPU["GPU / Headless"]
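Most protocols below are either wlr-* (wlroots-originated, unstable) or ext-* /
wp-* (standardized in wayland-protocols staging/stable). The exception is
virtual-keyboard-unstable-v1, which uses the zwp_ prefix but ships with wlroots
rather than wayland-protocols. Warmwind targets wlroots compositors, so all of
them are available.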
1. Frame Capture: wlr-screencopy vs ext-image-copy-capture¶
wlr-screencopy-unstable-v1 (Legacy)¶
The original frame capture protocol. A client binds
zwlr_screencopy_manager_v1 and calls capture_output to get a
zwlr_screencopy_frame_v1 object.
Protocol flow:
Client Compositor
| |
|-- capture_output(cursor, output) --> |
| |
|<-- buffer(format, width, height) --- |
|<-- buffer(format2, ...) ------------ | (one event per supported format)
|<-- buffer_done -------------------- |
| |
|-- copy(wl_buffer) ----------------> | (client attaches a buffer)
| |
|<-- damage(x, y, w, h) ------------ | (v2+, only after copy_with_damage)
|<-- flags(y_invert) --------------- |
|<-- ready(tv_sec, tv_nsec) -------- | (frame captured)
Key XML interface (abbreviated from the Wayland Explorer page):
<interface name="zwlr_screencopy_manager_v1" version="3">
<request name="capture_output">
<arg name="frame" type="new_id" interface="zwlr_screencopy_frame_v1"/>
<arg name="overlay_cursor" type="int"/>
<arg name="output" type="object" interface="wl_output"/>
</request>
</interface>
<interface name="zwlr_screencopy_frame_v1" version="3">
<event name="buffer">
<arg name="format" type="uint"/>
<arg name="width" type="uint"/>
<arg name="height" type="uint"/>
<arg name="stride" type="uint"/>
</event>
<request name="copy">
<arg name="buffer" type="object" interface="wl_buffer"/>
</request>
<event name="damage">
<arg name="x" type="uint"/>
<arg name="y" type="uint"/>
<arg name="width" type="uint"/>
<arg name="height" type="uint"/>
</event>
<event name="ready">
<arg name="tv_sec_hi" type="uint"/>
<arg name="tv_sec_lo" type="uint"/>
<arg name="tv_nsec" type="uint"/>
</event>
</interface>
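Limitations: One-shot capture (a new frame object per capture, no persistent session), no built-in cursor capture plane separation, and damage reporting only from version 2 onward, and only when the client uses copy_with_damage instead of copy.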
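The `ready` event above carries its timestamp as three unsigned 32-bit fields, because the Wayland wire format has no 64-bit integer type. A minimal sketch of reassembling them on the client side follows; the function name is hypothetical and not part of any library.

```python
def ready_timestamp_ns(tv_sec_hi: int, tv_sec_lo: int, tv_nsec: int) -> int:
    """Combine the three uint32 fields of `ready` into one nanosecond value."""
    for value in (tv_sec_hi, tv_sec_lo, tv_nsec):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("protocol fields are unsigned 32-bit integers")
    # The seconds counter is 64 bits wide, split into high and low halves.
    seconds = (tv_sec_hi << 32) | tv_sec_lo
    return seconds * 1_000_000_000 + tv_nsec
```

A capture loop could use the result to compute frame-to-frame latency without carrying separate second and nanosecond fields around.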
ext-image-copy-capture-v1 (Standardized Successor)¶
The replacement protocol, merged into wayland-protocols staging. Used alongside
ext-image-capture-source-v1, which defines what to capture (an output or a
foreign toplevel).
Protocol flow:
Client Compositor
| |
|-- create_session(source) --------------> |
|<-- buffer_size(width, height) ---------- |
|<-- shm_format / dmabuf_format --------- | (supported formats)
|<-- done -------------------------------- |
| |
|-- create_frame() ----------------------> | (returns frame object)
|-- frame.attach_buffer(wl_buffer) ------> |
|-- frame.damage_buffer(x, y, w, h) -----> | (hint: region client needs)
|-- frame.capture() ---------------------> |
| |
|<-- frame.transform(transform) --------- |
|<-- frame.damage(x, y, w, h) ----------- | (actual compositor damage)
|<-- frame.presentation_time(sec, nsec) -- |
|<-- frame.ready() ---------------------- |
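Between the session's `done` event and the first `attach_buffer`, the client has to allocate a buffer that matches the advertised constraints. The sketch below shows one way to plan a shared-memory buffer from `buffer_size` and the advertised `shm_format` values. The helper name is hypothetical, and it assumes only the two 32-bit `wl_shm` formats (`argb8888` = 0, `xrgb8888` = 1) are acceptable to the client.

```python
WL_SHM_FORMAT_ARGB8888 = 0
WL_SHM_FORMAT_XRGB8888 = 1

# Bytes per pixel for the formats this sketch is willing to use.
BYTES_PER_PIXEL = {WL_SHM_FORMAT_ARGB8888: 4, WL_SHM_FORMAT_XRGB8888: 4}


def plan_shm_buffer(width: int, height: int, advertised_formats: list) -> dict:
    """Pick the first advertised format we understand and size the buffer."""
    for fmt in advertised_formats:
        bpp = BYTES_PER_PIXEL.get(fmt)
        if bpp is not None:
            stride = width * bpp  # tightly packed rows, no padding
            return {"format": fmt, "stride": stride, "size": stride * height}
    raise ValueError("compositor advertised no format this client supports")
```

The returned `size` is what the backing file for the `wl_shm_pool` would need to hold for a single frame.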
Key differences from wlr-screencopy:
| Feature | wlr-screencopy | ext-image-copy-capture |
|---|---|---|
| Session persistence | No (one-shot) | Yes (create_session) |
| Client damage hints | No | Yes (damage_buffer) |
| Cursor capture | Overlay flag only | Separate cursor session (create_pointer_cursor_session) |
| Toplevel capture | No | Yes (via capture-source) |
| Standardization | wlr-unstable | wayland-protocols staging |
XML signature for the session (from Wayland Explorer):
<interface name="ext_image_copy_capture_manager_v1" version="1">
<request name="create_session">
<arg name="session" type="new_id" interface="ext_image_copy_capture_session_v1"/>
<arg name="source" type="object" interface="ext_image_capture_source_v1"/>
<arg name="options" type="uint" enum="options"/>
</request>
</interface>
<interface name="ext_image_copy_capture_frame_v1" version="1">
<request name="attach_buffer">
<arg name="buffer" type="object" interface="wl_buffer"/>
</request>
<request name="damage_buffer">
<arg name="x" type="int"/>
<arg name="y" type="int"/>
<arg name="width" type="int"/>
<arg name="height" type="int"/>
</request>
<request name="capture"/>
<event name="ready"/>
</interface>
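Warmwind implication: The persistent session keeps the negotiated buffer constraints across frames, so size and format are not renegotiated for every capture (a frame object is still created per capture). With damage_buffer, the client reports which regions of the attached buffer are stale since that buffer was last captured into; the compositor then only needs to copy the union of that region and its own damage, which reduces GPU readback for the VNC encoder.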
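On the encoder side, the `frame.damage` rectangles received before `ready` can be reduced to a single region to re-encode. The following is a rough sketch, not WayVNC's actual logic: it clips each rectangle to the buffer and returns a bounding box. All names are illustrative.

```python
from typing import Iterable, Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height


def clip(rect: Rect, width: int, height: int) -> Optional[Rect]:
    """Clip a rectangle to the buffer bounds; None if nothing remains."""
    x, y, w, h = rect
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, width), min(y + h, height)
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def bounding_damage(rects: Iterable[Rect], width: int, height: int) -> Optional[Rect]:
    """Bounding box of all damage rectangles, or None if nothing changed."""
    clipped = [c for c in (clip(r, width, height) for r in rects) if c]
    if not clipped:
        return None
    x0 = min(r[0] for r in clipped)
    y0 = min(r[1] for r in clipped)
    x1 = max(r[0] + r[2] for r in clipped)
    y1 = max(r[1] + r[3] for r in clipped)
    return (x0, y0, x1 - x0, y1 - y0)
```

A bounding box over-approximates scattered damage; an encoder that cares about bandwidth would more likely keep the individual rectangles.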
2. Input Injection: Virtual Keyboard and Pointer¶
WayVNC uses these to forward remote input events into the compositor.
virtual-keyboard-unstable-v1¶
Creates a virtual keyboard device associated with a seat. The client provides a keymap (XKB format) and sends key events directly. Although the protocol XML ships with wlroots, its interfaces use the zwp_ prefix rather than zwlr_.
<interface name="zwp_virtual_keyboard_manager_v1" version="1">
<request name="create_virtual_keyboard">
<arg name="seat" type="object" interface="wl_seat"/>
<arg name="id" type="new_id" interface="zwp_virtual_keyboard_v1"/>
</request>
</interface>
<interface name="zwp_virtual_keyboard_v1" version="1">
<request name="keymap">
<arg name="format" type="uint"/>
<arg name="fd" type="fd"/>
<arg name="size" type="uint"/>
</request>
<request name="key">
<arg name="time" type="uint"/>
<arg name="key" type="uint"/>
<arg name="state" type="uint"/>
</request>
<request name="modifiers">
<arg name="mods_depressed" type="uint"/>
<arg name="mods_latched" type="uint"/>
<arg name="mods_locked" type="uint"/>
<arg name="group" type="uint"/>
</request>
</interface>
Keymap handling: The client sends an XKB keymap via fd before any key events. The compositor uses this keymap to interpret keycodes. WayVNC translates RFB key symbols to XKB keycodes, then sends them through this interface. Mismatched keymaps cause phantom key events -- a common debugging headache.
Protocol reference: Wayland Explorer -- virtual-keyboard
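The `keymap` request takes a file descriptor plus a size rather than the keymap text itself. Below is a minimal sketch of preparing those two arguments, assuming the keymap string has already been produced elsewhere (for example by libxkbcommon) and that the size includes a trailing NUL byte, as is conventional for `wl_keyboard` keymaps. The helper names are hypothetical.

```python
import os
import tempfile

WL_KEYBOARD_KEYMAP_FORMAT_XKB_V1 = 1  # value for the `format` argument


def make_keymap_fd(keymap_text: str):
    """Write the keymap to an anonymous file and return (fd, size)."""
    data = keymap_text.encode("utf-8") + b"\0"  # assumption: NUL-terminated
    with tempfile.TemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        tmp.seek(0)
        fd = os.dup(tmp.fileno())  # the duplicate outlives the temp file object
    return fd, len(data)


def xkb_to_wire_keycode(xkb_keycode: int) -> int:
    """Key events carry evdev codes, which are XKB keycodes minus 8."""
    return xkb_keycode - 8
```

The offset of 8 is a frequent source of the mismatches described above: sending an XKB keycode where an evdev code is expected shifts every key.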
wlr-virtual-pointer-unstable-v1¶
Emulates a physical pointer device with motion, button, and axis events.
<interface name="zwlr_virtual_pointer_manager_v1" version="2">
<request name="create_virtual_pointer">
<arg name="seat" type="object" interface="wl_seat" allow-null="true"/>
</request>
<!-- v2 adds create_virtual_pointer_with_output -->
</interface>
<interface name="zwlr_virtual_pointer_v1" version="2">
<request name="motion">
<arg name="time" type="uint"/>
<arg name="dx" type="fixed"/>
<arg name="dy" type="fixed"/>
</request>
<request name="motion_absolute">
<arg name="time" type="uint"/>
<arg name="x" type="uint"/>
<arg name="y" type="uint"/>
<arg name="x_extent" type="uint"/>
<arg name="y_extent" type="uint"/>
</request>
<request name="button">
<arg name="time" type="uint"/>
<arg name="button" type="uint"/>
<arg name="state" type="uint"/>
</request>
<request name="axis">
<arg name="time" type="uint"/>
<arg name="axis" type="uint"/>
<arg name="value" type="fixed"/>
</request>
<request name="frame"/>
</interface>
Event serialization: All pointer events between two frame requests are
treated as a single atomic input frame. WayVNC batches motion + button into one
frame to prevent the compositor from processing partial input state.
Protocol reference: Wayland Explorer -- wlr-virtual-pointer
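The batching described above can be sketched as a list of requests that a client would send in order. This is an illustration under stated assumptions, not WayVNC's code: requests are modelled as plain tuples, `BTN_LEFT` (0x110) comes from the Linux input event codes, and the `fixed` helper uses Wayland's 24.8 fixed-point encoding.

```python
BTN_LEFT = 0x110          # from linux/input-event-codes.h
PRESSED, RELEASED = 1, 0  # wl_pointer.button_state values


def to_wl_fixed(value: float) -> int:
    """Encode a float as Wayland 24.8 fixed point (used by motion and axis)."""
    return int(round(value * 256))


def click_batch(time_ms, px, py, fb_width, fb_height, button=BTN_LEFT):
    """Requests for 'move to (px, py) and click', grouped by frame."""
    x = min(max(px, 0), fb_width - 1)   # clamp to the framebuffer
    y = min(max(py, 0), fb_height - 1)
    t = time_ms & 0xFFFFFFFF            # `time` is a uint32 on the wire
    return [
        # Motion and press share one frame, so the compositor never
        # sees the press at the old pointer position.
        ("motion_absolute", t, x, y, fb_width, fb_height),
        ("button", t, button, PRESSED),
        ("frame",),
        ("button", t, button, RELEASED),
        ("frame",),
    ]
```

Passing the framebuffer size as `x_extent` and `y_extent` lets the compositor scale the position to the output without the client knowing the output geometry.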
graph LR
RFB["RFB Key/Pointer"] --> WayVNC["WayVNC"]
WayVNC --> VK["virtual-keyboard"]
WayVNC --> VP["virtual-pointer"]
VK --> Seat["wl_seat"]
VP --> Seat
Seat --> Focus["Focused Surface"]
3. Output Management: wlr-output-management-unstable-v1¶
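Programmatic display configuration -- resolution, scale, position, transform. This is how Warmwind configures the headless output for each agent session. The protocol only reconfigures heads the compositor already advertises; creating the headless output itself happens on the compositor side.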
Protocol flow:
Compositor Client (e.g. wlr-randr, Warmwind)
| |
|-- head(head_obj) ----------------------> | (one per output)
|-- head.name("HEADLESS-1") -------------> |
|-- head.mode(mode_obj) -----------------> | (available modes)
|-- head.mode.size(1920, 1080) ----------> |
|-- head.mode.refresh(60000) ------------> | (mHz)
|-- head.current_mode(mode_obj) ---------> |
|-- head.enabled(1) ---------------------> |
|-- done(serial) ------------------------> | (snapshot complete)
| |
|<-- create_configuration(serial) -------- |
|<-- config.enable_head(head, config_head) |
|<-- config_head.set_mode(mode) ---------- |
|<-- config_head.set_scale(2.0) ---------- |
|<-- config_head.set_transform(90) ------- |
|<-- config.apply() --------------------- |
| |
|-- succeeded / failed / cancelled ------> |
Key XML signatures (from Wayland Explorer):
<interface name="zwlr_output_manager_v1" version="4">
<event name="head">
<arg name="head" type="new_id" interface="zwlr_output_head_v1"/>
</event>
<event name="done">
<arg name="serial" type="uint"/>
</event>
<request name="create_configuration">
<arg name="id" type="new_id" interface="zwlr_output_configuration_v1"/>
<arg name="serial" type="uint"/>
</request>
</interface>
<interface name="zwlr_output_configuration_head_v1" version="4">
<request name="set_mode">
<arg name="mode" type="object" interface="zwlr_output_mode_v1"/>
</request>
<request name="set_custom_mode">
<arg name="width" type="int"/>
<arg name="height" type="int"/>
<arg name="refresh" type="int"/>
</request>
<request name="set_scale">
<arg name="scale" type="fixed"/>
</request>
<request name="set_transform">
<arg name="transform" type="int"/>
</request>
</interface>
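The argument types in these requests are easy to get wrong: `refresh` is in mHz, `scale` is a 24.8 fixed-point number, and `transform` is a `wl_output.transform` enum value rather than an angle in degrees. A small sketch of the conversions, with hypothetical helper names:

```python
# wl_output.transform enum values from the core Wayland protocol.
WL_OUTPUT_TRANSFORM = {
    "normal": 0, "90": 1, "180": 2, "270": 3,
    "flipped": 4, "flipped-90": 5, "flipped-180": 6, "flipped-270": 7,
}


def custom_mode_args(width: int, height: int, refresh_hz: float):
    """Arguments for set_custom_mode; refresh is sent in mHz."""
    return (width, height, int(round(refresh_hz * 1000)))


def scale_arg(scale: float) -> int:
    """Argument for set_scale, encoded as 24.8 fixed point."""
    return int(round(scale * 256))


def transform_arg(name: str) -> int:
    """Argument for set_transform; a 90-degree rotation is enum value 1."""
    return WL_OUTPUT_TRANSFORM[name]
```

So the `set_transform(90)` shorthand in the flow diagram corresponds to the integer 1 on the wire.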
Warmwind usage with wlr-randr:
# List all outputs with their modes, scale, and transform
wlr-randr
# Set a custom mode on a headless output
wlr-randr --output HEADLESS-1 --custom-mode 1920x1080@60Hz
# Scale for HiDPI AI vision capture
wlr-randr --output HEADLESS-1 --scale 2
# Rotate (transform) for portrait-mode testing
wlr-randr --output HEADLESS-1 --transform 90
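When these commands are issued from a launcher rather than typed by hand, it helps to build the argument list programmatically. The sketch below only constructs the argv using the flags shown above and leaves execution (for example via `subprocess.run`) to the caller. The function name is hypothetical.

```python
def wlr_randr_argv(output, *, custom_mode=None, scale=None, transform=None):
    """Build a wlr-randr command line for one output."""
    argv = ["wlr-randr", "--output", output]
    if custom_mode is not None:
        width, height, refresh_hz = custom_mode
        argv += ["--custom-mode", f"{width}x{height}@{refresh_hz}Hz"]
    if scale is not None:
        argv += ["--scale", f"{scale:g}"]  # 2 -> "2", 1.5 -> "1.5"
    if transform is not None:
        argv += ["--transform", str(transform)]
    return argv
```

Returning a list instead of a shell string avoids quoting problems if output names ever contain unusual characters.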
4. Explicit Sync: wp-linux-drm-syncobj-v1¶
Why Explicit Sync Matters¶
Implicit sync (the legacy model) relies on the kernel driver to track GPU fences internally and attach them to buffers. This works for Mesa drivers but not for NVIDIA's proprietary driver, which does not support implicit sync. The result: tearing, corruption, and race conditions on NVIDIA under Wayland.
wp-linux-drm-syncobj-v1 introduces timeline-based synchronization using
DRM synchronization objects (syncobjs). Each syncobj has a monotonically
increasing timeline of points.
The Timeline Point Model¶
graph LR
Client["Client GPU work"] -- "acquire point N" --> Comp["Compositor"]
Comp -- "release point N+1" --> Client
<interface name="wp_linux_drm_syncobj_manager_v1" version="1">
<request name="get_surface">
<arg name="id" type="new_id" interface="wp_linux_drm_syncobj_surface_v1"/>
<arg name="surface" type="object" interface="wl_surface"/>
</request>
<request name="import_timeline">
<arg name="id" type="new_id" interface="wp_linux_drm_syncobj_timeline_v1"/>
<arg name="fd" type="fd"/>
</request>
</interface>
<interface name="wp_linux_drm_syncobj_surface_v1" version="1">
<request name="set_acquire_point">
<arg name="timeline" type="object" interface="wp_linux_drm_syncobj_timeline_v1"/>
<arg name="point_hi" type="uint"/>
<arg name="point_lo" type="uint"/>
</request>
<request name="set_release_point">
<arg name="timeline" type="object" interface="wp_linux_drm_syncobj_timeline_v1"/>
<arg name="point_hi" type="uint"/>
<arg name="point_lo" type="uint"/>
</request>
</interface>
How it works:
- Client imports a DRM syncobj timeline via fd.
- Before wl_surface.commit, client calls set_acquire_point(timeline, N) -- "compositor, wait until point N is signalled before reading this buffer."
- Client also calls set_release_point(timeline, N+1) -- "compositor, signal point N+1 when you are done with the buffer."
- Compositor waits on acquire, reads/composites, signals release.
- Client waits on release before reusing the buffer.
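Timeline points are 64-bit values, but the requests above carry them as `point_hi` and `point_lo` because the wire format only has 32-bit integers. The sketch below shows the split and a simple way to pick the next acquire/release pair. It assumes a single timeline with consecutive points, mirroring the N / N+1 example; real clients may use separate timelines. All function names are hypothetical.

```python
def split_point(point: int):
    """Split a 64-bit timeline point into (point_hi, point_lo)."""
    if not 0 <= point < (1 << 64):
        raise ValueError("timeline points are unsigned 64-bit integers")
    return (point >> 32) & 0xFFFFFFFF, point & 0xFFFFFFFF


def join_point(point_hi: int, point_lo: int) -> int:
    """Inverse of split_point, as the compositor would reassemble it."""
    return (point_hi << 32) | point_lo


def next_commit_points(last_release: int):
    """Pick the acquire and release points for the next commit."""
    acquire = last_release + 1
    release = acquire + 1  # must be greater than acquire on a shared timeline
    return split_point(acquire), split_point(release)
```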
Impact on VNC frame capture: With explicit sync, the screencopy protocol can properly synchronize with GPU rendering. Without it, the captured frame may contain partially-rendered content (especially on NVIDIA).
Protocol reference: Wayland Explorer -- linux-drm-syncobj
Sway 1.11 added linux-drm-syncobj-v1 support via wlroots 0.19, and
Chromium has also merged support for it on the client side.
5. Security Context: wp-security-context-v1¶
The Isolation Problem¶
On X11, any client can snoop on any other client's keystrokes and screen
content. Wayland eliminates this by design -- but the compositor still has no
way to know which clients are sandboxed and which are not. Enter
wp-security-context-v1.
How It Works¶
A sandbox engine (Flatpak, Bubblewrap, or Warmwind's agent launcher) creates a new Wayland socket using this protocol. Clients connecting through that socket are marked with security metadata.
<interface name="wp_security_context_manager_v1" version="1">
<request name="create_listener">
<arg name="id" type="new_id" interface="wp_security_context_v1"/>
<arg name="listen_fd" type="fd"/>
<arg name="close_fd" type="fd"/>
</request>
</interface>
<interface name="wp_security_context_v1" version="1">
<request name="set_sandbox_engine">
<arg name="name" type="string"/>
</request>
<request name="set_app_id">
<arg name="app_id" type="string"/>
</request>
<request name="set_instance_id">
<arg name="instance_id" type="string"/>
</request>
<request name="commit"/>
</interface>
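Flow:
- Launcher creates a Unix socket that is already bound and listening (listen_fd), plus a second fd (close_fd) whose hangup tells the compositor to stop accepting connections.
- Calls create_listener(listen_fd, close_fd) on the compositor.
- Sets sandbox_engine("com.warmwind.agent"), app_id("agent-session-42").
- Calls commit() -- the compositor now accepts connections on this socket.
- Agent process reaches the compositor only through this socket: either its path is exposed as WAYLAND_DISPLAY, or the launcher connects and passes the connected fd as WAYLAND_SOCKET.
- Compositor restricts the agent according to its own policy (the protocol itself only carries metadata): no screencopy of other outputs, no clipboard access across sessions, no input injection outside its own seat.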
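The first step of the flow can be sketched with the standard library. This is a sketch under assumptions rather than Warmwind's launcher: it creates the listening socket and uses a pipe for `close_fd`, on the assumption that the compositor receives the read end and treats a hangup (all write ends closed) as the signal to stop accepting. The helper names are hypothetical, and the Wayland requests themselves are not shown.

```python
import os
import select
import socket


def make_security_context_fds(socket_path: str):
    """Return (listen_sock, close_read, close_write) for create_listener."""
    listen_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listen_sock.bind(socket_path)  # must be bound ...
    listen_sock.listen(16)         # ... and listening before it is handed over
    # Assumption: the read end goes to the compositor as close_fd and the
    # launcher keeps the write end open for the sandbox lifetime.
    close_read, close_write = os.pipe()
    return listen_sock, close_read, close_write


def hung_up(fd: int, timeout_ms: int = 0) -> bool:
    """What the receiving side checks: has the other end been closed?"""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return any(events & select.POLLHUP for _, events in poller.poll(timeout_ms))
```

Tying the context lifetime to a file descriptor means the socket stops accepting clients automatically if the launcher crashes.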
Warmwind per-agent isolation: Each AI agent session gets its own security
context. The compositor policy engine checks the sandbox_engine and app_id
before granting protocol access. An agent on output HEADLESS-1 cannot capture
HEADLESS-2.
Protocol reference: Wayland Explorer -- security-context
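The policy check described above could look roughly like the following. This is a purely illustrative model, not wlroots or Sway code: the set of privileged interfaces, the `grants` table, and all names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of globals that a sandboxed client must be granted explicitly.
PRIVILEGED_GLOBALS = frozenset({
    "zwlr_screencopy_manager_v1",
    "ext_image_copy_capture_manager_v1",
    "zwlr_virtual_pointer_manager_v1",
    "zwlr_output_manager_v1",
    "wp_security_context_manager_v1",
})


@dataclass(frozen=True)
class SecurityContext:
    sandbox_engine: str
    app_id: str
    instance_id: str = ""


def global_visible(interface: str, ctx: Optional[SecurityContext], grants: dict) -> bool:
    """Decide whether a client may see and bind a given global."""
    if ctx is None:
        return True   # client did not connect through a security context
    if interface not in PRIVILEGED_GLOBALS:
        return True   # ordinary globals such as wl_compositor
    allowed = grants.get((ctx.sandbox_engine, ctx.app_id), frozenset())
    return interface in allowed
```

Filtering at the global level means a restricted agent never learns that a privileged interface exists, which is simpler than rejecting individual requests later.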
graph LR
Launcher["Agent Launcher"] -- "create_listener(fd)" --> Comp["Sway"]
Launcher -- "socket fd" --> Agent["AI Agent"]
Agent -- "restricted Wayland" --> Comp
Comp -- "policy check" --> Proto["Protocol Access"]
Protocol Implementation Status (March 2026)¶
| Protocol | wlroots | Sway | KDE | COSMIC | Niri |
|---|---|---|---|---|---|
| ext-image-copy-capture-v1 | 0.18+ | 1.10+ | In progress | Yes | Planned |
| virtual-keyboard-unstable-v1 | Yes | Yes | N/A | N/A | N/A |
| wlr-virtual-pointer | Yes | Yes | N/A | N/A | N/A |
| wlr-output-management | Yes | Yes | Yes | Yes | Yes |
| linux-drm-syncobj-v1 | 0.19+ | 1.11+ | 6.1+ | Yes | Yes |
| security-context-v1 | 0.18+ | 1.10+ | 6.1+ | Yes | Yes |
What's new (2025--2026)
- wlroots 0.20 RC (Feb 2026) adds ext-workspace-v1, color-management-v1 v2, and xdg-toplevel-tag-v1.
- Flatpak 1.16 shipped with wp-security-context-v1 support, making sandboxed Wayland clients properly identifiable to compositors.
- Chromium merged linux-drm-syncobj-v1 for proper NVIDIA explicit sync on Wayland.
Glossary
- wlr-screencopy
- Original wlroots protocol for one-shot compositor output capture into a shared buffer.
- ext-image-copy-capture-v1
- Standardized successor to wlr-screencopy with persistent sessions, client damage hints, and cursor source separation.
- DRM syncobj
- Kernel object representing a synchronization timeline. Used for explicit GPU fence signalling between client and compositor.
- Timeline point
- A monotonically increasing integer on a DRM syncobj timeline. Acquire points gate compositor access; release points gate client reuse.
- Security context
- Wayland protocol extension allowing sandbox engines to attach metadata (engine name, app ID, instance ID) to client connections.
- wlr-randr
- CLI tool using wlr-output-management to query and configure outputs on wlroots compositors. Equivalent of xrandr for Wayland.
- XKB keymap
- X Keyboard Extension keymap format. Defines the mapping from physical keycodes to logical key symbols. Sent via fd to virtual-keyboard.
- Frame request
- The frame request on zwlr_virtual_pointer_v1 that batches all preceding pointer events into a single atomic input frame.