WAVE Specification
The WAVE specification defines the instruction set architecture, binary encoding format, memory model, and execution semantics for the WAVE GPU compute model.
Specification Versions
v0.3 — Current
The v0.3 specification is the current stable version of the WAVE ISA. It resolves encoding issues present in earlier versions and establishes the canonical instruction format used by all WAVE tooling.
Key changes from v0.2:
- Modifier field widened to 4 bits. The 3-bit modifier field from v0.2 was insufficient to encode all instruction variants. v0.3 expands it to 4 bits, supporting up to 16 modifier values per opcode.
- Flags field reduced from 3 bits to 2 bits. Two flags were removed:
  - `WAVE_REDUCE_FLAG`: now handled by dedicated wave operation opcodes.
  - `NON_RETURNING_ATOMIC_FLAG`: now encoded via the modifier field on atomic instructions.
- Canonical register encoding: 5-bit register fields encoding 32 general-purpose registers (`r0`–`r31`).
- 4-bit modifier enables fine-grained instruction variants (e.g., memory ordering, rounding mode, comparison predicate).
Instruction word layout (v0.3, 32-bit base format):
| Bits | Field | Width |
|---|---|---|
| 31–24 | Opcode | 8 |
| 23–20 | Modifier | 4 |
| 19–18 | Flags | 2 |
| 17–13 | Dst (rd) | 5 |
| 12–8 | Src1 (rs1) | 5 |
| 7–3 | Src2 (rs2) | 5 |
| 2–0 | Reserved | 3 |
Extended format instructions append a second 32-bit immediate word.
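The base-format layout in the table above can be exercised with a small pack/unpack sketch. This is an illustration only, not part of the WAVE tooling: the names `FIELDS`, `encode_base`, and `decode_base` are hypothetical, the opcode value `0x12` is an arbitrary placeholder, and writing the reserved bits as zero is an assumption the text above does not state.

```python
# Field layout of the v0.3 32-bit base format: (name, lowest bit, width).
FIELDS = (
    ("opcode", 24, 8),
    ("modifier", 20, 4),
    ("flags", 18, 2),
    ("rd", 13, 5),
    ("rs1", 8, 5),
    ("rs2", 3, 5),
    ("reserved", 0, 3),
)


def encode_base(opcode, modifier=0, flags=0, rd=0, rs1=0, rs2=0):
    """Pack the fields into one 32-bit word, rejecting out-of-range values."""
    values = {
        "opcode": opcode, "modifier": modifier, "flags": flags,
        "rd": rd, "rs1": rs1, "rs2": rs2,
        "reserved": 0,  # assumption: reserved bits are written as zero
    }
    word = 0
    for name, low, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def decode_base(word):
    """Split a 32-bit word back into a dict of field values."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("base-format word must fit in 32 bits")
    return {name: (word >> low) & ((1 << width) - 1) for name, low, width in FIELDS}


# 0x12 is an arbitrary placeholder opcode, not taken from the WAVE opcode map.
word = encode_base(0x12, modifier=0xA, flags=0x3, rd=31, rs1=1, rs2=2)
print(hex(word))  # 0x12afe110
```

Driving both functions from one layout table keeps the encoder and decoder in agreement, and the range check makes an oversized value (for example `rd=32`) fail loudly instead of spilling into a neighbouring field.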
v0.2
v0.2 fixed the most critical encoding bug from v0.1, the register fields, but still carried an undersized modifier field.
Key changes from v0.1:
- Register fields widened to 8 bits. v0.1 used 5-bit register fields but attempted to address more than 32 registers in some instructions, causing encoding collisions. v0.2 moved to 8-bit register fields, supporting up to 256 registers.
- Per-wave control flow. v0.1 shared a single control flow state across all waves in a workgroup. v0.2 introduced per-wave program counters and divergence masks, enabling proper SIMT execution with independent branching per wave.
- 3-bit modifier (unchanged from v0.1). This was later identified as a limitation and corrected in v0.3.
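One way such a register collision can arise is sketched below. The source does not say exactly how v0.1 instructions mishandled indices above 31, so truncation to the field width is an assumption made for illustration, and `pack_register` is a hypothetical helper.

```python
def pack_register(index, width):
    """Keep only the low `width` bits of a register index, as a fixed-width field would."""
    return index & ((1 << width) - 1)


# Assumed failure mode: a 5-bit field cannot tell r35 apart from r3.
five_bit = [pack_register(i, 5) for i in (3, 35)]

# An 8-bit field, as in v0.2, keeps the two indices distinct.
eight_bit = [pack_register(i, 8) for i in (3, 35)]

print(five_bit, eight_bit)  # [3, 3] [3, 35]
```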
Known issues (fixed in v0.3):
- The 3-bit modifier field could only encode 8 variants, which was insufficient for memory ordering modes, comparison predicates, and rounding modes.
- `WAVE_REDUCE_FLAG` and `NON_RETURNING_ATOMIC_FLAG` consumed flag bits that could be better allocated.
- 8-bit register fields were unnecessarily wide: no real workload needed more than 32 registers per thread.
v0.1 — Initial Draft
The initial specification established the core instruction set and execution model. It was a proof-of-concept encoding that validated the overall architecture but contained several encoding issues.
Characteristics:
- 5-bit register fields encoding 32 general-purpose registers.
- 3-bit modifier field for instruction variants.
- 3-bit flags field including `WAVE_REDUCE_FLAG` and `NON_RETURNING_ATOMIC_FLAG`.
- Shared control flow state. All waves in a workgroup shared a single program counter and divergence mask. This meant branch divergence in one wave could stall the entire workgroup.
Known issues (fixed in v0.2):
- Shared control flow prevented efficient SIMT divergence handling.
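- Some instructions tried to address more than 32 registers through the 5-bit register fields, causing encoding collisions. The 5-bit width itself was sound: v0.2 responded by widening the fields to 8 bits, which proved unnecessary, and v0.3 restored the 5-bit width.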
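The difference between v0.1's shared control flow state and the per-wave state introduced in v0.2 can be sketched with a toy model. Everything here is hypothetical: `ControlState`, `take_branch`, and the 4-lane `WAVE_WIDTH` are illustrative names and values, and the model ignores reconvergence entirely.

```python
from dataclasses import dataclass

WAVE_WIDTH = 4  # hypothetical lane count, kept small for readability


@dataclass
class ControlState:
    pc: int = 0
    active_mask: int = (1 << WAVE_WIDTH) - 1  # all lanes active


def take_branch(state, taken_mask, target):
    """Simplified divergent branch: only lanes in taken_mask continue at target."""
    state.active_mask &= taken_mask
    state.pc = target


# v0.1-style: both waves refer to one shared state object.
shared = ControlState()
v01_waves = [shared, shared]
take_branch(v01_waves[0], 0b0011, target=8)
# Wave 1 is dragged along: its pc and mask changed too.

# v0.2-style: each wave owns its program counter and divergence mask.
v02_waves = [ControlState(), ControlState()]
take_branch(v02_waves[0], 0b0011, target=8)
# Wave 1 is unaffected and keeps all lanes active at pc 0.
```

In the shared variant a branch taken by one wave changes the state seen by every wave, which is the stall-prone behaviour the v0.1 notes describe. In the per-wave variant the second wave keeps running with its own counter and mask.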
Specification Contents
Each version of the specification covers the following sections:
- Introduction — purpose, scope, design principles, and relationship to other standards.
- Execution Model — thread hierarchy, identifiers, core resources, execution guarantees, and dispatch.
- Register Model — general-purpose registers, sub-register access, register pairs, special registers, and predicate registers.
- Memory Model — memory spaces, local/device memory details, memory ordering, and atomic operations.
- Control Flow — structured control flow, uniform branches, divergence/reconvergence, and per-wave state.
- Instruction Set — integer, bitwise, floating-point, type conversion, comparison, memory, atomic, wave, synchronization, control flow, and MMA instructions.
- Capability System — required constants, optional capabilities, MMA parameters, and query mechanism.
- Binary Encoding — base and extended instruction formats, opcode map.
- Conformance — required behavior, implementation-defined behavior, undefined behavior, and conformance testing.