WAVE Specification

The WAVE specification defines the instruction set architecture, binary encoding format, memory model, and execution semantics for the WAVE GPU compute model.

The v0.4 specification is the current version of the WAVE ISA. It fixes predicate encoding (Defect 4), restructures word0 bits [3:0], and adds neural network training as conformance evidence.

Key changes from v0.3:

  • Predicate encoding restored. Bits [3:0] of word0 are repurposed for predicate fields: pred_reg (bits [1:0]) and pred_neg (bit [2]), with bit [3] reserved. Previously these bits held scope and flags, so all predication was silently dropped.
  • Scope moved to word1. Memory scope is now encoded in word1 bits [1:0] for extended instructions (DeviceAtomic, fence), which frees word0 bits [3:0] for predicate encoding.
  • New opcode: Misc (0x41). Misc operations (mov, mov_imm, mov_sr) moved from Control (0x3F) to their own opcode, since the flags field they used for dispatch no longer exists.
  • SyncOp modifier offset. Sync operations (return, halt, barrier, fence) share Control opcode 0x3F but use modifier values offset by +8 (SYNC_MODIFIER_OFFSET = 8).
  • Type conversion naming convention clarified. The mnemonic cvt_A_B converts from type B to type A (destination type first). An emulator bug where cvt_f32_i32 and cvt_i32_f32 were swapped has been fixed.
  • Neural network training verification. A complete two-layer MNIST network (784->128->10) has been trained using 11 WAVE kernels, with gradients verified against PyTorch.
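
The bit positions and naming rules listed above can be illustrated with a small Python sketch. It is not taken from the specification or its toolchain: the function names, the dictionary layout, and the idea of a zero-based "sync index" are assumptions made for illustration. Only the bit ranges, the opcode values, SYNC_MODIFIER_OFFSET = 8, and the destination-first cvt_A_B rule come from the change list.

```python
# Illustrative sketch only; helper names are hypothetical, not part of WAVE.
OPCODE_CONTROL = 0x3F        # Control opcode (from the change list)
OPCODE_MISC = 0x41           # Misc opcode added in v0.4 (from the change list)
SYNC_MODIFIER_OFFSET = 8     # sync modifiers are offset by +8 (from the change list)


def decode_predicate(word0: int) -> dict:
    """Extract the v0.4 predicate fields from word0 bits [3:0]."""
    return {
        "pred_reg": word0 & 0b11,         # bits [1:0]
        "pred_neg": (word0 >> 2) & 0b1,   # bit [2]
        "reserved": (word0 >> 3) & 0b1,   # bit [3]
    }


def encode_predicate(pred_reg: int, pred_neg: bool = False) -> int:
    """Pack predicate fields into bits [3:0]; the reserved bit is left at 0."""
    if not 0 <= pred_reg <= 3:
        raise ValueError("pred_reg must fit in two bits")
    return pred_reg | (int(pred_neg) << 2)


def decode_scope(word1: int) -> int:
    """Memory scope lives in word1 bits [1:0] for extended instructions."""
    return word1 & 0b11


def sync_modifier(sync_index: int) -> int:
    """Map a zero-based sync-op index (an assumption) to its modifier value."""
    return sync_index + SYNC_MODIFIER_OFFSET


def parse_cvt(mnemonic: str) -> tuple:
    """Split cvt_A_B into (destination, source): destination type comes first."""
    prefix, dst, src = mnemonic.split("_")
    if prefix != "cvt":
        raise ValueError("not a cvt mnemonic: " + mnemonic)
    return dst, src
```

For example, under these assumptions `decode_predicate(0b0110)` yields pred_reg 2 with pred_neg set, and `parse_cvt("cvt_f32_i32")` returns `("f32", "i32")`, meaning an i32 source converted to an f32 destination. What each pred_reg or scope value means is defined in the Binary Encoding section and is not assumed here.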

Each version of the specification covers the following sections:

  1. Introduction — purpose, scope, design principles, and relationship to other standards.
  2. Execution Model — thread hierarchy, identifiers, core resources, execution guarantees, and dispatch.
  3. Register Model — general-purpose registers, sub-register access, register pairs, special registers, and predicate registers.
  4. Memory Model — memory spaces, local/device memory details, memory ordering, and atomic operations.
  5. Control Flow — structured control flow, uniform branches, divergence/reconvergence, and per-wave state.
  6. Instruction Set — integer, bitwise, floating-point, type conversion, comparison, memory, atomic, wave, synchronization, control flow, and MMA instructions.
  7. Capability System — required constants, optional capabilities, MMA parameters, and query mechanism.
  8. Binary Encoding — base and extended instruction formats, opcode map.
  9. Conformance — required behavior, implementation-defined behavior, undefined behavior, and conformance testing.