Wayland Protocol Extensions Deep Dive

Warmwind's stack lives and dies by a handful of Wayland protocol extensions: screencopy for frame capture, virtual input for agent control, output management for headless display configuration, explicit sync for NVIDIA correctness, and security context for per-agent isolation. This article walks through the exact protocol flows, XML interface signatures, and compositor implementation details for each.

Protocol Landscape

graph LR
    SC["ext-image-copy-capture"] --> Comp["Compositor"]
    VK["virtual-keyboard"] --> Comp
    VP["virtual-pointer"] --> Comp
    OM["output-management"] --> Comp
    Sync["drm-syncobj"] --> Comp
    Sec["security-context"] --> Comp
    Comp --> GPU["GPU / Headless"]

All protocols below are either wlr-* (wlroots-originated, unstable) or ext-* / wp-* (standardized in wayland-protocols staging/stable). Warmwind targets wlroots compositors, so both families are available.


1. Frame Capture: wlr-screencopy vs ext-image-copy-capture

wlr-screencopy-unstable-v1 (Legacy)

The original frame capture protocol. A client binds zwlr_screencopy_manager_v1 and calls capture_output to get a zwlr_screencopy_frame_v1 object.

Protocol flow:

Client                              Compositor
  |                                      |
  |-- capture_output(cursor, output) --> |
  |                                      |
  |<-- buffer(format, width, height) --- |
  |<-- buffer(format2, ...) ------------ |   (one event per supported format)
  |<-- buffer_done -------------------- |
  |                                      |
  |-- copy(wl_buffer) ----------------> |   (client attaches a buffer)
  |                                      |
  |<-- damage(x, y, w, h) ------------ |   (v2+, after copy_with_damage: changed region)
  |<-- flags(y_invert) --------------- |
  |<-- ready(tv_sec, tv_nsec) -------- |   (frame captured)

Key XML interface (abbreviated from the Wayland Explorer page):

<interface name="zwlr_screencopy_manager_v1" version="3">
  <request name="capture_output">
    <arg name="frame" type="new_id" interface="zwlr_screencopy_frame_v1"/>
    <arg name="overlay_cursor" type="int"/>
    <arg name="output" type="object" interface="wl_output"/>
  </request>
</interface>

<interface name="zwlr_screencopy_frame_v1" version="3">
  <event name="buffer">
    <arg name="format" type="uint"/>
    <arg name="width" type="uint"/>
    <arg name="height" type="uint"/>
    <arg name="stride" type="uint"/>
  </event>
  <request name="copy">
    <arg name="buffer" type="object" interface="wl_buffer"/>
  </request>
  <event name="damage">
    <arg name="x" type="uint"/>
    <arg name="y" type="uint"/>
    <arg name="width" type="uint"/>
    <arg name="height" type="uint"/>
  </event>
  <event name="ready">
    <arg name="tv_sec_hi" type="uint"/>
    <arg name="tv_sec_lo" type="uint"/>
    <arg name="tv_nsec" type="uint"/>
  </event>
</interface>

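Limitations: One-shot capture (no persistent session, so buffer parameters are re-advertised for every frame), no separate cursor capture (only the overlay_cursor flag), and damage reporting only since version 2, and only when the client uses copy_with_damage instead of copy.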
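The two pieces of arithmetic a screencopy client needs from the events above can be sketched as follows. This is a minimal illustration, not WayVNC's code: BufferInfo, shm_size, and ready_timestamp_ns are hypothetical names, and the sketch assumes the stride reported by the buffer event already includes any row padding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferInfo:
    """Parameters carried by one zwlr_screencopy_frame_v1.buffer event."""
    format: int   # wl_shm format code
    width: int    # pixels
    height: int   # pixels
    stride: int   # bytes per row, as advertised by the compositor


def shm_size(info: BufferInfo) -> int:
    """Bytes the client must allocate for a wl_shm buffer matching the event."""
    return info.stride * info.height


def ready_timestamp_ns(tv_sec_hi: int, tv_sec_lo: int, tv_nsec: int) -> int:
    """Rebuild the 64-bit seconds value that ready splits into two uints."""
    seconds = (tv_sec_hi << 32) | tv_sec_lo
    return seconds * 1_000_000_000 + tv_nsec


info = BufferInfo(format=1, width=1920, height=1080, stride=1920 * 4)
print(shm_size(info))                      # 8294400
print(ready_timestamp_ns(0, 12, 500_000))  # 12000500000
```

The buffer passed to copy must match the advertised format, width, height, and stride, so it is simplest to size it directly from the event.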

ext-image-copy-capture-v1 (Standardized Successor)

The replacement protocol, merged into wayland-protocols staging. Used alongside ext-image-capture-source-v1, which defines what to capture (an output or a foreign toplevel).

Protocol flow:

Client                                   Compositor
  |                                           |
  |-- create_session(source) --------------> |
  |<-- buffer_size(width, height) ---------- |
  |<-- shm_format / dmabuf_format(fmt) ----- |   (supported formats)
  |<-- done -------------------------------- |
  |                                           |
  |-- create_frame() ----------------------> |   (returns frame object)
  |-- frame.attach_buffer(wl_buffer) ------> |
  |-- frame.damage_buffer(x, y, w, h) -----> |   (hint: stale region of this buffer)
  |-- frame.capture() ---------------------> |
  |                                           |
  |<-- frame.transform(transform) --------- |
  |<-- frame.damage(x, y, w, h) ----------- |   (actual compositor damage)
  |<-- frame.presentation_time(sec, nsec) -- |
  |<-- frame.ready() ---------------------- |

Key differences from wlr-screencopy:

| Feature | wlr-screencopy | ext-image-copy-capture |
| --- | --- | --- |
| Session persistence | No (one-shot) | Yes (create_session) |
| Client damage hints | No | Yes (damage_buffer) |
| Cursor capture | Overlay flag only | Separate cursor source |
| Toplevel capture | No | Yes (via capture-source) |
| Standardization | wlr-unstable | wayland-protocols staging |

XML signature for the session (from Wayland Explorer):

<interface name="ext_image_copy_capture_manager_v1" version="1">
  <request name="create_session">
    <arg name="session" type="new_id" interface="ext_image_copy_capture_session_v1"/>
    <arg name="source" type="object" interface="ext_image_capture_source_v1"/>
    <arg name="options" type="uint" enum="options"/>
  </request>
</interface>

<interface name="ext_image_copy_capture_frame_v1" version="1">
  <request name="attach_buffer">
    <arg name="buffer" type="object" interface="wl_buffer"/>
  </request>
  <request name="damage_buffer">
    <arg name="x" type="int"/>
    <arg name="y" type="int"/>
    <arg name="width" type="int"/>
    <arg name="height" type="int"/>
  </request>
  <request name="capture"/>
  <event name="ready"/>
</interface>

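Warmwind implication: The persistent session negotiates buffer size and formats once instead of for every frame. The client damage hint lets the VNC encoder tell the compositor which parts of the attached buffer are stale; the compositor then only has to update the union of that region and its own damage, reducing GPU readback.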
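When the client rotates through several buffers, each buffer needs its own record of what changed since it was last captured. The sketch below shows one way to keep that record. DamageTracker and its methods are hypothetical names for illustration; rectangles are kept as a plain list rather than merged into a region.

```python
from dataclasses import dataclass, field

Rect = tuple[int, int, int, int]  # x, y, width, height


@dataclass
class PooledBuffer:
    name: str
    pending: list[Rect] = field(default_factory=list)  # regions now stale
    ever_captured: bool = False


class DamageTracker:
    """Tracks, per buffer, the damage accumulated since its last capture."""

    def __init__(self, names: list[str], width: int, height: int) -> None:
        self.full: Rect = (0, 0, width, height)
        self.buffers = {name: PooledBuffer(name) for name in names}

    def hints_for(self, name: str) -> list[Rect]:
        """Rectangles to send via damage_buffer before capturing into `name`."""
        buf = self.buffers[name]
        # A buffer that has never been captured is stale everywhere.
        return list(buf.pending) if buf.ever_captured else [self.full]

    def frame_ready(self, name: str, compositor_damage: list[Rect]) -> None:
        """Record a finished capture into `name` and the damage it reported."""
        for other in self.buffers.values():
            if other.name != name:
                other.pending.extend(compositor_damage)
        done = self.buffers[name]
        done.pending.clear()
        done.ever_captured = True


tracker = DamageTracker(["A", "B"], 1920, 1080)
tracker.frame_ready("A", tracker.hints_for("A"))
tracker.frame_ready("B", [(10, 10, 50, 50)])
print(tracker.hints_for("A"))  # [(10, 10, 50, 50)]
```

With a single buffer the pending list is always empty, and the compositor's own damage is the only region that gets copied.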


2. Input Injection: Virtual Keyboard and Pointer

WayVNC uses these to forward remote input events into the compositor.

virtual-keyboard-unstable-v1

Creates a virtual keyboard device associated with a seat. The client provides a keymap (XKB format) and sends key events directly. Although the protocol ships with wlroots, its interfaces use the zwp_ prefix rather than zwlr_.

<interface name="zwp_virtual_keyboard_manager_v1" version="1">
  <request name="create_virtual_keyboard">
    <arg name="seat" type="object" interface="wl_seat"/>
    <arg name="id" type="new_id" interface="zwp_virtual_keyboard_v1"/>
  </request>
</interface>

<interface name="zwp_virtual_keyboard_v1" version="1">
  <request name="keymap">
    <arg name="format" type="uint"/>
    <arg name="fd" type="fd"/>
    <arg name="size" type="uint"/>
  </request>
  <request name="key">
    <arg name="time" type="uint"/>
    <arg name="key" type="uint"/>
    <arg name="state" type="uint"/>
  </request>
  <request name="modifiers">
    <arg name="mods_depressed" type="uint"/>
    <arg name="mods_latched" type="uint"/>
    <arg name="mods_locked" type="uint"/>
    <arg name="group" type="uint"/>
  </request>
</interface>

Keymap handling: The client sends an XKB keymap via fd before any key events. The compositor uses this keymap to interpret keycodes. WayVNC translates RFB key symbols to XKB keycodes, then sends them through this interface. Mismatched keymaps cause phantom key events -- a common debugging headache.
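Preparing the arguments for the keymap and key requests can be sketched as below. The helper names are hypothetical, and an unlinked temporary file stands in for the memfd a real client would more likely use. The sketch relies on two conventions of the xkb_v1 keymap format: the keymap text is NUL-terminated, and key codes on the wire are XKB keycodes minus 8.

```python
import os
import tempfile

XKB_V1 = 1  # wl_keyboard keymap_format value for libxkbcommon-compatible text


def keymap_to_fd(keymap_text: str) -> tuple[int, int]:
    """Write the keymap to an anonymous file; return (fd, size) for keymap."""
    data = keymap_text.encode("utf-8") + b"\0"  # size includes the terminator
    with tempfile.TemporaryFile() as tmp:
        tmp.write(data)
        tmp.flush()
        fd = os.dup(tmp.fileno())  # stays valid after tmp is closed
    os.lseek(fd, 0, os.SEEK_SET)
    return fd, len(data)


def xkb_to_wire_keycode(xkb_keycode: int) -> int:
    """Convert an XKB keycode to the evdev-style code sent in a key request."""
    if xkb_keycode < 8:
        raise ValueError("XKB keycodes start at 8")
    return xkb_keycode - 8


fd, size = keymap_to_fd("xkb_keymap { };")  # placeholder text, not a real keymap
print(size)                     # 16
print(xkb_to_wire_keycode(38))  # 30
os.close(fd)
```

Forgetting the offset of 8 is one way to produce the phantom key events mentioned above: every key lands eight positions away from the intended one.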

Protocol reference: Wayland Explorer -- virtual-keyboard

wlr-virtual-pointer-unstable-v1

Emulates a physical pointer device with motion, button, and axis events.

<interface name="zwlr_virtual_pointer_manager_v1" version="2">
  <request name="create_virtual_pointer">
    <arg name="seat" type="object" interface="wl_seat" allow-null="true"/>
    <arg name="id" type="new_id" interface="zwlr_virtual_pointer_v1"/>
  </request>
  <!-- v2 adds create_virtual_pointer_with_output -->
</interface>

<interface name="zwlr_virtual_pointer_v1" version="2">
  <request name="motion">
    <arg name="time" type="uint"/>
    <arg name="dx" type="fixed"/>
    <arg name="dy" type="fixed"/>
  </request>
  <request name="motion_absolute">
    <arg name="time" type="uint"/>
    <arg name="x" type="uint"/>
    <arg name="y" type="uint"/>
    <arg name="x_extent" type="uint"/>
    <arg name="y_extent" type="uint"/>
  </request>
  <request name="button">
    <arg name="time" type="uint"/>
    <arg name="button" type="uint"/>
    <arg name="state" type="uint"/>
  </request>
  <request name="axis">
    <arg name="time" type="uint"/>
    <arg name="axis" type="uint"/>
    <arg name="value" type="fixed"/>
  </request>
  <request name="frame"/>
</interface>

Event serialization: All pointer events between two frame requests are treated as a single atomic input frame. WayVNC batches motion + button into one frame to prevent the compositor from processing partial input state.
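The batching described above can be sketched as a function that turns one RFB pointer update into an ordered list of requests ending in frame. This is an illustration under assumptions, not WayVNC's implementation: pointer_requests is a hypothetical helper, the tuples stand in for real protocol calls, and only the three main buttons are mapped.

```python
# evdev button codes from linux/input-event-codes.h
BTN_LEFT, BTN_RIGHT, BTN_MIDDLE = 0x110, 0x111, 0x112

# RFB button-mask bit -> evdev button code (RFB button 1 = left, 2 = middle, 3 = right)
RFB_BUTTONS = {0: BTN_LEFT, 1: BTN_MIDDLE, 2: BTN_RIGHT}

RELEASED, PRESSED = 0, 1


def pointer_requests(time_ms, x, y, width, height, old_mask, new_mask):
    """Build the requests for one RFB pointer event as a single input frame."""
    # motion_absolute takes the position plus the extents it is relative to.
    requests = [("motion_absolute", time_ms, x, y, width, height)]
    changed = old_mask ^ new_mask
    for bit, code in RFB_BUTTONS.items():
        if changed & (1 << bit):
            state = PRESSED if new_mask & (1 << bit) else RELEASED
            requests.append(("button", time_ms, code, state))
    requests.append(("frame",))  # closes the atomic group
    return requests


for request in pointer_requests(1000, 960, 540, 1920, 1080, 0b000, 0b001):
    print(request)
```

Because frame is always the last entry, the compositor never sees the button change without the motion that belongs to it.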

Protocol reference: Wayland Explorer -- wlr-virtual-pointer

graph LR
    RFB["RFB Key/Pointer"] --> WayVNC["WayVNC"]
    WayVNC --> VK["virtual-keyboard"]
    WayVNC --> VP["virtual-pointer"]
    VK --> Seat["wl_seat"]
    VP --> Seat
    Seat --> Focus["Focused Surface"]

3. Output Management: wlr-output-management-unstable-v1

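Programmatic display configuration -- resolution, scale, position, transform. This is how Warmwind configures the headless output backing each agent session. The protocol only configures outputs the compositor already exposes; creating a headless output is a separate, compositor-side step.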

Protocol flow:

Compositor                              Client (e.g. wlr-randr, Warmwind)
  |                                           |
  |-- head(head_obj) ----------------------> |   (one per output)
  |-- head.name("HEADLESS-1") -------------> |
  |-- head.mode(mode_obj) -----------------> |   (available modes)
  |-- head.mode.size(1920, 1080) ----------> |
  |-- head.mode.refresh(60000) ------------> |   (mHz)
  |-- head.current_mode(mode_obj) ---------> |
  |-- head.enabled(1) ---------------------> |
  |-- done(serial) ------------------------> |   (snapshot complete)
  |                                           |
  |<-- create_configuration(serial) -------- |
  |<-- config.enable_head(head, config_head) |
  |<-- config_head.set_mode(mode) ---------- |
  |<-- config_head.set_scale(2.0) ---------- |
  |<-- config_head.set_transform(90) ------- |   (sent as a wl_output.transform enum value)
  |<-- config.apply() --------------------- |
  |                                           |
  |-- succeeded / failed / cancelled ------> |

Key XML signatures (from Wayland Explorer):

<interface name="zwlr_output_manager_v1" version="4">
  <event name="head">
    <arg name="head" type="new_id" interface="zwlr_output_head_v1"/>
  </event>
  <event name="done">
    <arg name="serial" type="uint"/>
  </event>
  <request name="create_configuration">
    <arg name="id" type="new_id" interface="zwlr_output_configuration_v1"/>
    <arg name="serial" type="uint"/>
  </request>
</interface>

<interface name="zwlr_output_configuration_head_v1" version="4">
  <request name="set_mode">
    <arg name="mode" type="object" interface="zwlr_output_mode_v1"/>
  </request>
  <request name="set_custom_mode">
    <arg name="width" type="int"/>
    <arg name="height" type="int"/>
    <arg name="refresh" type="int"/>
  </request>
  <request name="set_scale">
    <arg name="scale" type="fixed"/>
  </request>
  <request name="set_transform">
    <arg name="transform" type="int"/>
  </request>
</interface>
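Two unit conversions are easy to get wrong when filling in these requests: set_scale takes a Wayland fixed-point value (signed 24.8), and refresh rates are expressed in mHz. A minimal sketch, with hypothetical helper names:

```python
def to_wl_fixed(value: float) -> int:
    """Encode a float as Wayland's signed 24.8 fixed-point wire value."""
    return int(round(value * 256.0))


def from_wl_fixed(raw: int) -> float:
    """Decode a 24.8 fixed-point wire value back to a float."""
    return raw / 256.0


def hz_to_mhz(hz: float) -> int:
    """Refresh rates in output-management are integers in mHz."""
    return int(round(hz * 1000))


def custom_mode_args(width: int, height: int, hz: float) -> tuple[int, int, int]:
    """Arguments for set_custom_mode(width, height, refresh)."""
    return (width, height, hz_to_mhz(hz))


print(to_wl_fixed(2.0))                     # 512
print(custom_mode_args(1920, 1080, 59.94))  # (1920, 1080, 59940)
```

Client libraries generated by wayland-scanner usually perform the fixed-point conversion themselves; the helpers only make the wire representation visible.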

Warmwind usage with wlr-randr:

# List headless outputs
wlr-randr

# Create a custom mode on a headless output
wlr-randr --output HEADLESS-1 --custom-mode 1920x1080@60Hz

# Scale for HiDPI AI vision capture
wlr-randr --output HEADLESS-1 --scale 2

# Rotate (transform) for portrait-mode testing
wlr-randr --output HEADLESS-1 --transform 90
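For a launcher that configures outputs programmatically, the same invocations can be assembled as an argument list. The sketch below only builds the command and does not run it; wlr_randr_argv is a hypothetical helper, and it uses only the flags shown in the commands above.

```python
import shlex


def wlr_randr_argv(output, mode=None, scale=None, transform=None):
    """Build a wlr-randr command line for one output.

    mode is an optional (width, height, refresh_hz) tuple.
    """
    argv = ["wlr-randr", "--output", output]
    if mode is not None:
        width, height, hz = mode
        argv += ["--custom-mode", f"{width}x{height}@{hz}Hz"]
    if scale is not None:
        argv += ["--scale", str(scale)]
    if transform is not None:
        argv += ["--transform", str(transform)]
    return argv


argv = wlr_randr_argv("HEADLESS-1", mode=(1920, 1080, 60), scale=2)
print(shlex.join(argv))
# wlr-randr --output HEADLESS-1 --custom-mode 1920x1080@60Hz --scale 2
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting issues and raises if wlr-randr exits with an error.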

4. Explicit Sync: wp-linux-drm-syncobj-v1

Why Explicit Sync Matters

Implicit sync (the legacy model) relies on the kernel driver to attach fences to buffers and track them behind the scenes. Mesa drivers do this, but NVIDIA's proprietary driver does not implement implicit sync, so the compositor can end up reading a buffer before rendering has finished. The result: tearing, corruption, and race conditions on NVIDIA under Wayland.

wp-linux-drm-syncobj-v1 introduces timeline-based synchronization using DRM synchronization objects (syncobjs). Each syncobj has a monotonically increasing timeline of points.

The Timeline Point Model

graph LR
    Client["Client GPU work"] -- "acquire point N" --> Comp["Compositor"]
    Comp -- "release point N+1" --> Client

<interface name="wp_linux_drm_syncobj_manager_v1" version="1">
  <request name="get_surface">
    <arg name="id" type="new_id" interface="wp_linux_drm_syncobj_surface_v1"/>
    <arg name="surface" type="object" interface="wl_surface"/>
  </request>
  <request name="import_timeline">
    <arg name="id" type="new_id" interface="wp_linux_drm_syncobj_timeline_v1"/>
    <arg name="fd" type="fd"/>
  </request>
</interface>

<interface name="wp_linux_drm_syncobj_surface_v1" version="1">
  <request name="set_acquire_point">
    <arg name="timeline" type="object" interface="wp_linux_drm_syncobj_timeline_v1"/>
    <arg name="point_hi" type="uint"/>
    <arg name="point_lo" type="uint"/>
  </request>
  <request name="set_release_point">
    <arg name="timeline" type="object" interface="wp_linux_drm_syncobj_timeline_v1"/>
    <arg name="point_hi" type="uint"/>
    <arg name="point_lo" type="uint"/>
  </request>
</interface>

How it works:

  1. Client imports a DRM syncobj timeline via fd.
  2. Before wl_surface.commit, client calls set_acquire_point(timeline, N) -- "compositor, wait until point N is signalled before reading this buffer."
  3. Client also calls set_release_point(timeline, N+1) -- "compositor, signal point N+1 when you are done with the buffer."
  4. Compositor waits on acquire, reads/composites, signals release.
  5. Client waits on release before reusing the buffer.
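The steps above can be modelled with a toy timeline. This is a simulation for intuition only: the Timeline class is a hypothetical stand-in for a kernel syncobj and performs no DRM calls. The split_point helper shows how a 64-bit point maps onto the point_hi and point_lo arguments.

```python
def split_point(point: int) -> tuple[int, int]:
    """Split a 64-bit timeline point into (point_hi, point_lo) wire arguments."""
    if not 0 <= point < 1 << 64:
        raise ValueError("timeline points are unsigned 64-bit values")
    return point >> 32, point & 0xFFFFFFFF


class Timeline:
    """Toy syncobj timeline: remembers the highest point signalled so far."""

    def __init__(self) -> None:
        self.signalled = 0

    def signal(self, point: int) -> None:
        self.signalled = max(self.signalled, point)

    def is_signalled(self, point: int) -> bool:
        return self.signalled >= point


def composite(timeline: Timeline, acquire: int, release: int) -> bool:
    """Compositor side: read the buffer only after acquire, then signal release."""
    if not timeline.is_signalled(acquire):
        return False             # GPU work not finished; do not touch the buffer
    timeline.signal(release)     # done reading; the client may reuse the buffer
    return True


timeline = Timeline()
print(composite(timeline, 1, 2))  # False
timeline.signal(1)                # client GPU work completes
print(composite(timeline, 1, 2))  # True
print(timeline.is_signalled(2))   # True
print(split_point((1 << 32) + 5)) # (1, 5)
```

A real compositor does not poll like this; it waits asynchronously on the acquire point, but the ordering guarantee is the same.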

Impact on VNC frame capture: With explicit sync, the compositor waits for each client's acquire point before compositing, so the frame it hands to the capture protocol is fully rendered. Without it, the captured frame may contain partially-rendered content (especially on NVIDIA).

Protocol reference: Wayland Explorer -- linux-drm-syncobj

Sway 1.10 added linux-drm-syncobj-v1 support on top of wlroots 0.18, and Chromium has also merged support for it on the client side.


5. Security Context: wp-security-context-v1

The Isolation Problem

On X11, any client can snoop on any other client's keystrokes and screen content. Wayland eliminates this by design -- but the compositor still has no way to know which clients are sandboxed and which are not. Enter wp-security-context-v1.

How It Works

A sandbox engine (Flatpak, Bubblewrap, or Warmwind's agent launcher) creates a new Wayland socket using this protocol. Clients connecting through that socket are marked with security metadata.

<interface name="wp_security_context_manager_v1" version="1">
  <request name="create_listener">
    <arg name="id" type="new_id" interface="wp_security_context_v1"/>
    <arg name="listen_fd" type="fd"/>
    <arg name="close_fd" type="fd"/>
  </request>
</interface>

<interface name="wp_security_context_v1" version="1">
  <request name="set_sandbox_engine">
    <arg name="name" type="string"/>
  </request>
  <request name="set_app_id">
    <arg name="app_id" type="string"/>
  </request>
  <request name="set_instance_id">
    <arg name="instance_id" type="string"/>
  </request>
  <request name="commit"/>
</interface>

Flow:

  1. Launcher creates a listening Unix socket (bind + listen) and a second fd, typically one end of a pipe, that serves as the close signal.
  2. Calls create_listener(listen_fd, close_fd) on the compositor.
  3. Sets sandbox_engine("com.warmwind.agent"), app_id("agent-session-42").
  4. Calls commit() -- the compositor now accepts connections on this socket.
  5. Agent process connects to that socket, which the launcher exposes inside the sandbox as WAYLAND_DISPLAY.
  6. Compositor restricts the agent: no screencopy of other outputs, no clipboard access across sessions, no input injection outside its own seat. When close_fd hangs up, the compositor stops accepting new connections on the socket.
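The launcher-side preparation for step 1 can be sketched with the standard library alone. The create_listener call itself needs a Wayland client binding and is out of scope here. make_listener and the socket name wayland-agent are hypothetical, and the sketch assumes the compositor receives the read end of the pipe as close_fd, so that closing the write end produces the hangup.

```python
import os
import socket
import tempfile


def make_listener(runtime_dir: str, name: str = "wayland-agent"):
    """Create the listening socket and close-signal pipe for create_listener.

    Returns (path, listen_sock, close_read, close_write).
    """
    path = os.path.join(runtime_dir, name)
    listen_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listen_sock.bind(path)
    listen_sock.listen()
    # Closing close_write later makes close_read report a hangup.
    close_read, close_write = os.pipe()
    return path, listen_sock, close_read, close_write


path, listen_sock, close_read, close_write = make_listener(tempfile.mkdtemp())
print(os.path.basename(path))  # wayland-agent
# Next, not shown: pass listen_sock.fileno() and close_read to create_listener,
# set the metadata, commit, then expose `path` to the agent as WAYLAND_DISPLAY.
```

Keeping the write end open in the launcher ties the lifetime of the security context to the agent session: when the launcher exits or closes it, no new clients can join.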

Warmwind per-agent isolation: Each AI agent session gets its own security context. The compositor policy engine checks the sandbox_engine and app_id before granting protocol access. An agent on output HEADLESS-1 cannot capture HEADLESS-2.
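A policy check of this kind might look like the sketch below. It is purely illustrative: SecurityContext, may_capture, and the assignments table are hypothetical and do not describe Warmwind's or Sway's actual policy code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SecurityContext:
    """Metadata attached to a client via wp_security_context_v1."""
    sandbox_engine: str
    app_id: str
    instance_id: str = ""


def may_capture(ctx: Optional[SecurityContext], output: str,
                assignments: dict[str, str]) -> bool:
    """Decide whether a client may capture `output`.

    ctx is None for clients that did not connect through a security context.
    assignments maps app_id -> the one output that session owns.
    """
    if ctx is None:
        return True   # trusted, unsandboxed client
    if ctx.sandbox_engine != "com.warmwind.agent":
        return False  # unknown sandbox engine: deny
    return assignments.get(ctx.app_id) == output


assignments = {"agent-session-42": "HEADLESS-1"}
agent = SecurityContext("com.warmwind.agent", "agent-session-42")
print(may_capture(agent, "HEADLESS-1", assignments))  # True
print(may_capture(agent, "HEADLESS-2", assignments))  # False
```

Denying by default for unrecognised engines keeps a third-party sandbox from inheriting agent privileges by accident.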

Protocol reference: Wayland Explorer -- security-context

graph LR
    Launcher["Agent Launcher"] -- "create_listener(fd)" --> Comp["Sway"]
    Launcher -- "socket fd" --> Agent["AI Agent"]
    Agent -- "restricted Wayland" --> Comp
    Comp -- "policy check" --> Proto["Protocol Access"]

Protocol Implementation Status (March 2026)

| Protocol | wlroots | Sway | KDE | COSMIC | Niri |
| --- | --- | --- | --- | --- | --- |
| ext-image-copy-capture-v1 | 0.19+ | 1.11+ | In progress | Yes | Planned |
| virtual-keyboard-unstable-v1 | Yes | Yes | N/A | N/A | N/A |
| wlr-virtual-pointer | Yes | Yes | N/A | N/A | N/A |
| wlr-output-management | Yes | Yes | Yes | Yes | Yes |
| linux-drm-syncobj-v1 | 0.18+ | 1.10+ | 6.1+ | Yes | Yes |
| security-context-v1 | 0.17+ | 1.9+ | 6.1+ | Yes | Yes |

What's new (2025--2026)
  • wlroots 0.20 RC (Feb 2026) adds ext-workspace-v1, color-management-v1 v2, and xdg-toplevel-tag-v1.
  • Flatpak 1.16 shipped with wp-security-context-v1 support, making sandboxed Wayland clients properly identifiable to compositors.
  • Chromium merged linux-drm-syncobj-v1 for proper NVIDIA explicit sync on Wayland.

Glossary

wlr-screencopy
Original wlroots protocol for one-shot compositor output capture into a shared buffer.
ext-image-copy-capture-v1
Standardized successor to wlr-screencopy with persistent sessions, client damage hints, and cursor source separation.
DRM syncobj
Kernel object representing a synchronization timeline. Used for explicit GPU fence signalling between client and compositor.
Timeline point
A monotonically increasing integer on a DRM syncobj timeline. Acquire points gate compositor access; release points gate client reuse.
Security context
Wayland protocol extension allowing sandbox engines to attach metadata (engine name, app ID, instance ID) to client connections.
wlr-randr
CLI tool using wlr-output-management to query and configure outputs on wlroots compositors. Equivalent of xrandr for Wayland.
XKB keymap
X Keyboard Extension keymap format. Defines the mapping from physical keycodes to logical key symbols. Sent via fd to virtual-keyboard.
Frame request
The frame request on zwlr_virtual_pointer_v1 that batches all preceding pointer events into a single atomic input frame.