Skip to content

Binary Encoding

The WAVE binary encoding packs every instruction into a fixed 32-bit word, with an optional second word for extended operations, balancing decode simplicity against expressiveness.

Every WAVE instruction fits into a single 32-bit word with the following bit layout:

BitsFieldWidthPurpose
31:26opcode6 bitsOperation class (up to 64 opcodes)
25:21RD5 bitsDestination register (0—31)
20:16RS15 bitsSource register 1 (0—31)
15:11RS25 bitsSource register 2 (0—31)
10:7modifier4 bitsOperation variant within opcode class (0—15)
6:5scope2 bitsMemory scope (wave, workgroup, device, system)
4:3predicate2 bitsPredicate register selector
2pred_neg1 bitNegate predicate condition
1:0flags2 bitsInstruction-specific flags
31 26 25 21 20 16 15 11 10 7 6 5 4 3 2 1 0
┌──────┬──────┬──────┬──────┬──────┬────┬────┬───┬────┐
│opcode│ RD │ RS1 │ RS2 │ mod │scop│pred│neg│flag│
│ 6b │ 5b │ 5b │ 5b │ 4b │ 2b │ 2b │1b │ 2b│
└──────┴──────┴──────┴──────┴──────┴────┴────┴───┴────┘

Six bits provide 64 opcode slots. WAVE currently uses fewer than 40, leaving room for future extensions without changing the encoding width. Wider opcodes would steal bits from register fields or modifiers; narrower opcodes would constrain the instruction set too early.

Five bits address 32 registers per thread. This is a deliberate trade-off: 32 registers are sufficient for the vast majority of GPU kernels (validated against shader compiler statistics from all four target vendors), while keeping the instruction word at 32 bits. The v0.1 spec text incorrectly referenced 256 registers despite using 5-bit fields; this was identified and corrected as a spec defect (see Spec Defects).

The modifier field disambiguates variants within an opcode class. For example, the FUnaryOp opcode class uses the modifier to select between fsqrt (0), frsqrt (1), … fsin (8), fcos (9), fexp2 (10), flog2 (11). Four bits encode values 0—15, covering all current variants with room to grow. The earlier 3-bit modifier could only encode 0—7, which caused a concrete encoding failure documented in Modifier Field Evolution.

The 2-bit scope field encodes memory ordering visibility:

ValueScopeMeaning
00WaveVisible within the executing wave
01WorkgroupVisible to all waves in the workgroup
10DeviceVisible to all waves on the device
11SystemVisible to the device and host CPU

When an instruction needs more than two source registers or an inline immediate, it uses a second 32-bit word:

Word 0: standard 32-bit encoding (as above)
Word 1 (extended):
31 27 26 22 21 0
┌──────┬──────┬──────────────────────────────────┐
│ RS3 │ RS4 │ reserved / imm │
│ 5b │ 5b │ 22b │
└──────┴──────┴──────────────────────────────────┘

Alternatively, the entire second word can serve as a 32-bit immediate value. The opcode in word 0 determines interpretation: if the opcode is in the extended class, the decoder fetches a second word; otherwise, the instruction is complete at 32 bits.

  • Fused multiply-add (fma r0, r1, r2, r3): needs RS1, RS2, and RS3.
  • Immediate loads (imm r0, 0x3F800000): the 32-bit immediate occupies the entire second word.
  • Atomic compare-and-swap (atom.cas r0, r1, r2, r3): needs address, expected, desired, and result registers.

Compiled WAVE programs are distributed as .wbin (WAVE Binary) files. The container is designed for direct memory-mapping without parsing overhead.

OffsetSizeFieldDescription
04magicASCII "WAVE" (0x57415645)
44versionContainer format version
84code_offsetByte offset to code section
124code_sizeSize of code section in bytes
164symbol_offsetByte offset to symbol table
204symbol_sizeSize of symbol table in bytes
244metadata_offsetByte offset to metadata section
284metadata_sizeSize of metadata section in bytes

Each kernel entry in the metadata section describes the resource requirements that a backend needs for dispatch:

FieldPurpose
nameKernel identifier (null-terminated UTF-8)
register_countNumber of registers used per thread
local_mem_sizeBytes of local (workgroup-shared) memory required
workgroup_size_xWorkgroup dimension X
workgroup_size_yWorkgroup dimension Y
workgroup_size_zWorkgroup dimension Z
code_offsetByte offset of this kernel’s code within the code section
code_sizeSize of this kernel’s code in bytes

The WBIN format is intentionally minimal. It carries exactly the information a backend runtime needs to allocate resources and dispatch kernels. There is no debug info, no relocations, and no linking metadata in the base format --- these are relegated to optional extension sections identified by the metadata table. This keeps the critical path (load, allocate, dispatch) as fast as possible.