Instruction Set Reference

This is the complete instruction reference for the WAVE ISA v0.3. Each instruction is listed with its mnemonic, opcode, format, operands, and description.

Base format (32-bit): Used for register-to-register operations.

| Bits | Field | Width |
|---|---|---|
| 31-24 | Opcode | 8 |
| 23-20 | Modifier | 4 |
| 19-18 | Flags | 2 |
| 17-13 | Dst (rd) | 5 |
| 12-8 | Src1 (rs1) | 5 |
| 7-3 | Src2 (rs2) | 5 |
| 2-0 | Reserved | 3 |
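The bit layout above can be exercised with a small packing helper. This is a minimal sketch: the names `FIELDS`, `encode_base`, and `decode_base` are hypothetical and not part of the WAVE ISA or any toolchain, and the reserved bits are assumed to be written as zero.

```python
# (name, shift, width) for each field of the 32-bit base word, per the table above.
FIELDS = [
    ("opcode", 24, 8),
    ("modifier", 20, 4),
    ("flags", 18, 2),
    ("rd", 13, 5),
    ("rs1", 8, 5),
    ("rs2", 3, 5),
]


def encode_base(**values):
    """Pack named fields into a 32-bit base word (reserved bits 2-0 assumed zero)."""
    word = 0
    for name, shift, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode_base(word):
    """Unpack a 32-bit base word into a dict of field values."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, shift, width in FIELDS}


# ADD r3, r1, r2 (opcode 0x00)
add_word = encode_base(opcode=0x00, rd=3, rs1=1, rs2=2)
# FADD r3, r1, r2 with modifier 1 (opcode 0x10)
fadd_word = encode_base(opcode=0x10, modifier=1, rd=3, rs1=1, rs2=2)
```

Decoding `add_word` with `decode_base` returns the same field values that were packed, which makes the pair convenient for round-trip checks.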

Extended format (64-bit): Base word followed by a 32-bit immediate. Used for load/store offsets, branch targets, and large constants.
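As an illustration of the extended format, the sketch below serializes a base word followed by its immediate. The reference does not state a byte order or whether the immediate is signed, so little-endian layout and a raw 32-bit value are assumptions here; `encode_extended` and `decode_extended` are hypothetical helper names.

```python
import struct


def encode_extended(base_word, imm32):
    """Return the 8 bytes of an extended instruction: base word, then immediate.

    Assumption: both words are stored little-endian.
    """
    return struct.pack("<II", base_word & 0xFFFFFFFF, imm32 & 0xFFFFFFFF)


def decode_extended(raw):
    """Split 8 bytes back into (base_word, imm32)."""
    base_word, imm32 = struct.unpack("<II", raw[:8])
    return base_word, imm32


# LOAD r4, [r2 + 0x40]: opcode 0x30 in bits 31-24, rd in bits 17-13, rs1 in bits 12-8.
load_base = (0x30 << 24) | (4 << 13) | (2 << 8)
raw = encode_extended(load_base, 0x40)
```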

| Notation | Meaning |
|---|---|
| rd | Destination register (r0-r31) |
| rs1 | First source register |
| rs2 | Second source register |
| imm32 | 32-bit immediate (extended format) |
| [rs1 + imm32] | Memory address: base register plus immediate offset |

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| ADD | 0x00 | Base | rd, rs1, rs2 | Integer addition. `rd = rs1 + rs2` |
| SUB | 0x01 | Base | rd, rs1, rs2 | Integer subtraction. `rd = rs1 - rs2` |
| MUL | 0x02 | Base | rd, rs1, rs2 | Integer multiplication (low 32 bits). `rd = (rs1 * rs2) & 0xFFFFFFFF` |
| DIV | 0x03 | Base | rd, rs1, rs2 | Signed integer division. `rd = rs1 / rs2`. Undefined if `rs2 == 0`. |
| UDIV | 0x04 | Base | rd, rs1, rs2 | Unsigned integer division. `rd = rs1 / rs2` (unsigned). |
| REM | 0x05 | Base | rd, rs1, rs2 | Signed integer remainder. `rd = rs1 % rs2`. |
| UREM | 0x06 | Base | rd, rs1, rs2 | Unsigned integer remainder. |
| NEG | 0x07 | Base | rd, rs1 | Integer negation. `rd = -rs1`. rs2 ignored. |
| ABS | 0x08 | Base | rd, rs1 | Absolute value. `rd = \|rs1\|` |
| MIN | 0x09 | Base | rd, rs1, rs2 | Signed minimum. `rd = min(rs1, rs2)`. |
| MAX | 0x0A | Base | rd, rs1, rs2 | Signed maximum. `rd = max(rs1, rs2)`. |
| MULHI | 0x0B | Base | rd, rs1, rs2 | Integer multiplication (high 32 bits). `rd = (rs1 * rs2) >> 32` |
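MUL and MULHI together yield the two halves of a 64-bit product. The sketch below models that split on Python integers. The reference does not say whether MULHI treats its operands as signed or unsigned; signed is assumed here because the unsigned variants elsewhere in the table carry a `U` prefix. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit register value as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rs1, rs2):
    """MUL: low 32 bits of the product (identical for signed and unsigned operands)."""
    return (rs1 * rs2) & MASK32


def mulhi(rs1, rs2):
    """MULHI: high 32 bits of the product. Assumption: operands are signed."""
    return ((to_signed(rs1) * to_signed(rs2)) >> 32) & MASK32


def mul64(rs1, rs2):
    """Combine MULHI and MUL into the full 64-bit product bit pattern."""
    return (mulhi(rs1, rs2) << 32) | mul(rs1, rs2)
```

For example, `0x10000 * 0x10000` overflows 32 bits, so `mul` returns 0 and `mulhi` returns 1.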

For floating-point instructions, the modifier bits select the rounding mode: 0 = round-to-nearest-even (default), 1 = round-toward-zero, 2 = round-toward-positive-infinity, 3 = round-toward-negative-infinity.

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| FADD | 0x10 | Base | rd, rs1, rs2 | Floating-point addition. `rd = rs1 + rs2` |
| FSUB | 0x11 | Base | rd, rs1, rs2 | Floating-point subtraction. `rd = rs1 - rs2` |
| FMUL | 0x12 | Base | rd, rs1, rs2 | Floating-point multiplication. `rd = rs1 * rs2` |
| FDIV | 0x13 | Base | rd, rs1, rs2 | Floating-point division. `rd = rs1 / rs2` |
| FNEG | 0x14 | Base | rd, rs1 | Floating-point negation. `rd = -rs1` |
| FABS | 0x15 | Base | rd, rs1 | Floating-point absolute value. `rd = \|rs1\|` |
| FSQRT | 0x16 | Base | rd, rs1 | Square root. `rd = sqrt(rs1)` |
| FMIN | 0x17 | Base | rd, rs1, rs2 | Floating-point minimum (NaN-propagating). |
| FMAX | 0x18 | Base | rd, rs1, rs2 | Floating-point maximum (NaN-propagating). |
| FFLOOR | 0x19 | Base | rd, rs1 | Floor. `rd = floor(rs1)` |
| FCEIL | 0x1A | Base | rd, rs1 | Ceiling. `rd = ceil(rs1)` |
| FROUND | 0x1B | Base | rd, rs1 | Round to nearest integer (ties to even). |
| FTRUNC | 0x1C | Base | rd, rs1 | Truncate toward zero. |
| FMA | 0x1D | Base | rd, rs1, rs2 | Fused multiply-add. `rd = rs1 * rs2 + rd`. Note: uses rd as the addend (accumulator). |
| FCVT_I2F | 0x1E | Base | rd, rs1 | Convert signed integer to float. |
| FCVT_F2I | 0x1F | Base | rd, rs1 | Convert float to signed integer (truncation). |
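Because FMA reads rd as the addend, a chain of FMA instructions on one destination register behaves like an accumulator. The sketch below models that pattern for a dot product. It is an illustration under assumptions: the reference does not state the float width, so Python's double precision stands in for it, and the single rounding step is emulated with exact rational arithmetic. `fma` and `dot` are hypothetical names.

```python
from fractions import Fraction


def fma(rs1, rs2, rd):
    """Model of FMA: rs1 * rs2 + rd with one rounding at the end.

    The product and sum are computed exactly as rationals, then rounded once
    to a double (round-to-nearest-even). Inputs must be finite.
    """
    return float(Fraction(rs1) * Fraction(rs2) + Fraction(rd))


def dot(xs, ys):
    """Dot product as a chain of FMAs on a single accumulator register."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)  # rd is both the addend and the destination
    return acc


# The fused form keeps low-order bits that a separate multiply and add would lose.
a, b = 1 + 2**-30, 1 - 2**-30
fused = fma(a, b, -1.0)
unfused = a * b + -1.0
```

Here `a * b` is exactly `1 - 2**-60`, which rounds to 1.0 when the multiply is rounded on its own, so `unfused` is 0.0 while `fused` preserves the `-2**-60` residue.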

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| AND | 0x20 | Base | rd, rs1, rs2 | Bitwise AND. `rd = rs1 & rs2` |
| OR | 0x21 | Base | rd, rs1, rs2 | Bitwise OR. `rd = rs1 \| rs2` |
| XOR | 0x22 | Base | rd, rs1, rs2 | Bitwise XOR. `rd = rs1 ^ rs2` |
| NOT | 0x23 | Base | rd, rs1 | Bitwise NOT. `rd = ~rs1` |
| SHL | 0x24 | Base | rd, rs1, rs2 | Shift left. `rd = rs1 << (rs2 & 31)` |
| SHR | 0x25 | Base | rd, rs1, rs2 | Logical shift right. `rd = rs1 >>> (rs2 & 31)` |
| SAR | 0x26 | Base | rd, rs1, rs2 | Arithmetic shift right. `rd = rs1 >> (rs2 & 31)` (sign-extending) |
| POPCNT | 0x27 | Base | rd, rs1 | Population count. `rd = popcount(rs1)` |
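The shift instructions mask the shift amount to its low five bits, so a shift by 33 behaves like a shift by 1. A minimal model of that behavior on 32-bit values, with hypothetical function names:

```python
MASK32 = 0xFFFFFFFF


def shl(rs1, rs2):
    """SHL: shift left by (rs2 & 31), keeping the low 32 bits."""
    return (rs1 << (rs2 & 31)) & MASK32


def shr(rs1, rs2):
    """SHR: logical shift right by (rs2 & 31), filling with zeros."""
    return (rs1 & MASK32) >> (rs2 & 31)


def sar(rs1, rs2):
    """SAR: arithmetic shift right by (rs2 & 31), replicating the sign bit."""
    value = rs1 & MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative two's-complement number
    return (value >> (rs2 & 31)) & MASK32


def popcnt(rs1):
    """POPCNT: number of set bits in the 32-bit value."""
    return bin(rs1 & MASK32).count("1")
```

The difference between SHR and SAR shows up only when bit 31 is set: `shr(0x80000000, 31)` gives 1, while `sar(0x80000000, 31)` gives `0xFFFFFFFF`.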

Comparison instructions write 1 (true) or 0 (false) to rd. Modifier selects comparison predicate for FCMP.

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| CMP_EQ | 0x28 | Base | rd, rs1, rs2 | Integer equal. `rd = (rs1 == rs2) ? 1 : 0` |
| CMP_NE | 0x29 | Base | rd, rs1, rs2 | Integer not equal. |
| CMP_LT | 0x2A | Base | rd, rs1, rs2 | Signed less than. |
| CMP_GE | 0x2B | Base | rd, rs1, rs2 | Signed greater or equal. |
| FCMP | 0x2C | Base | rd, rs1, rs2 | Floating-point compare. Modifier selects predicate: 0=EQ, 1=NE, 2=LT, 3=LE, 4=GT, 5=GE, 6=ORD (both not NaN), 7=UNORD (either NaN). |
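The FCMP predicates can be sketched as a lookup keyed by the modifier value. The reference defines ORD and UNORD in terms of NaN but does not say how the other predicates treat NaN operands; the sketch assumes the usual IEEE 754 behavior that Python floats follow (EQ, LT, LE, GT, GE are false and NE is true when either operand is NaN). `fcmp` is a hypothetical name.

```python
import math


def fcmp(modifier, a, b):
    """Return 1 or 0 for the FCMP predicate selected by the modifier."""
    unordered = math.isnan(a) or math.isnan(b)
    predicates = {
        0: lambda: a == b,         # EQ
        1: lambda: a != b,         # NE (assumed true for NaN operands)
        2: lambda: a < b,          # LT
        3: lambda: a <= b,         # LE
        4: lambda: a > b,          # GT
        5: lambda: a >= b,         # GE
        6: lambda: not unordered,  # ORD: both operands are not NaN
        7: lambda: unordered,      # UNORD: either operand is NaN
    }
    if modifier not in predicates:
        raise ValueError(f"unknown FCMP predicate {modifier}")
    return 1 if predicates[modifier]() else 0
```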

All memory instructions use the extended format (64-bit) to encode the address offset.

Modifier selects memory ordering: 0 = relaxed, 1 = acquire, 2 = release, 3 = acquire-release, 4 = sequentially consistent.

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| LOAD | 0x30 | Extended | rd, [rs1 + imm32] | Load 32-bit value from global memory into rd. |
| STORE | 0x31 | Extended | [rs1 + imm32], rs2 | Store 32-bit value from rs2 to global memory. rd field unused. |
| LOCAL_LOAD | 0x38 | Extended | rd, [rs1 + imm32] | Load 32-bit value from local (shared) memory. |
| LOCAL_STORE | 0x39 | Extended | [rs1 + imm32], rs2 | Store 32-bit value to local (shared) memory. |
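To show how the memory-ordering modifier combines with the operand fields, the sketch below builds the base word and immediate for a memory instruction. `MemOrder` and `encode_mem` are hypothetical names; the unused register field (rd for stores, rs2 for loads) is assumed to be encoded as zero, and the offset is treated as a raw 32-bit value.

```python
from enum import IntEnum


class MemOrder(IntEnum):
    """Memory-ordering values carried in the modifier field."""
    RELAXED = 0
    ACQUIRE = 1
    RELEASE = 2
    ACQ_REL = 3
    SEQ_CST = 4


OPCODES = {"LOAD": 0x30, "STORE": 0x31, "LOCAL_LOAD": 0x38, "LOCAL_STORE": 0x39}


def encode_mem(mnemonic, base_reg, offset, order=MemOrder.RELAXED, rd=0, rs2=0):
    """Return (base_word, imm32) for a memory instruction addressing [base_reg + offset]."""
    word = (
        (OPCODES[mnemonic] << 24)
        | (int(order) << 20)
        | (rd << 13)
        | (base_reg << 8)
        | (rs2 << 3)
    )
    return word, offset & 0xFFFFFFFF


# STORE [r1 + 8], r5 with release ordering
store = encode_mem("STORE", base_reg=1, offset=8, order=MemOrder.RELEASE, rs2=5)
# LOAD r4, [r2 + 0x40] with acquire ordering
load = encode_mem("LOAD", base_reg=2, offset=0x40, order=MemOrder.ACQUIRE, rd=4)
```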

Atomic instructions operate on global memory. Modifier selects the atomic operation variant.

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| ATOMIC | 0x3C | Extended | rd, [rs1 + imm32], rs2 | Atomic read-modify-write. Loads the value at [rs1 + imm32], writes the old value to rd, and applies the operation with rs2. Modifier selects operation: 0=ADD, 1=SUB, 2=AND, 3=OR, 4=XOR, 5=MIN, 6=MAX, 7=UMIN, 8=UMAX, 9=XCHG. |
| ATOMIC_CAS | 0x3D | Extended | rd, [rs1 + imm32], rs2 | Atomic compare-and-swap. Compares value at [rs1 + imm32] with rd; if equal, stores rs2. Old value written to rd in either case. |
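The data flow of the two atomic instructions can be modeled on a plain dictionary standing in for global memory. This single-threaded sketch shows only which value is returned and which is stored; it does not model atomicity or memory ordering. The names `atomic`, `atomic_cas`, and `ATOMIC_OPS` are hypothetical, and results are assumed to wrap to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def _signed(x):
    """Interpret a 32-bit value as two's-complement signed."""
    return x - (1 << 32) if x & 0x80000000 else x


# Modifier value -> function of (old memory value, rs2) giving the new value.
ATOMIC_OPS = {
    0: lambda old, v: old + v,                   # ADD
    1: lambda old, v: old - v,                   # SUB
    2: lambda old, v: old & v,                   # AND
    3: lambda old, v: old | v,                   # OR
    4: lambda old, v: old ^ v,                   # XOR
    5: lambda old, v: min(old, v, key=_signed),  # MIN (signed)
    6: lambda old, v: max(old, v, key=_signed),  # MAX (signed)
    7: lambda old, v: min(old, v),               # UMIN
    8: lambda old, v: max(old, v),               # UMAX
    9: lambda old, v: v,                         # XCHG
}


def atomic(memory, addr, modifier, rs2):
    """ATOMIC: apply the selected operation at addr and return the old value (for rd)."""
    old = memory[addr]
    memory[addr] = ATOMIC_OPS[modifier](old, rs2 & MASK32) & MASK32
    return old


def atomic_cas(memory, addr, expected, rs2):
    """ATOMIC_CAS: store rs2 if memory equals expected (the incoming rd); return the old value."""
    old = memory[addr]
    if old == expected & MASK32:
        memory[addr] = rs2 & MASK32
    return old
```

A caller can tell whether a compare-and-swap succeeded by comparing the returned old value with the expected value it passed in.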

Wave (subgroup) operations execute across all active lanes in a wave. The modifier selects the wave operation variant. rs2 is unused by BROADCAST and holds the source lane for SHUFFLE.

| Mnemonic | Opcode | Format | Operands | Description |
|---|---|---|---|---|
| WAVE_OP | 0x3E | Base | rd, rs1, rs2 | Wave-level operation. Modifier selects variant: 0=BROADCAST (broadcast rs1 from lane 0 to all lanes), 1=REDUCE_ADD (sum rs1 across all active lanes), 2=REDUCE_MIN, 3=REDUCE_MAX, 4=PREFIX_SUM (exclusive prefix sum of rs1), 5=SHUFFLE (read rs1 from lane rs2), 6=BALLOT (set bit i if lane i has rs1 != 0), 7=ANY (1 if any lane has rs1 != 0), 8=ALL (1 if all lanes have rs1 != 0). |
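The wave variants are easier to compare when each register is viewed as a list with one entry per lane. The sketch below takes that view. It rests on several assumptions the reference does not confirm: every lane is active, reduction and vote results are delivered to every lane, shuffle lane indices are in range, and 32-bit wraparound is ignored. `wave_op` is a hypothetical name.

```python
from itertools import accumulate


def wave_op(modifier, rs1, rs2=None):
    """Model WAVE_OP on per-lane lists; returns the per-lane value written to rd."""
    n = len(rs1)
    if modifier == 0:  # BROADCAST: lane 0's value to all lanes
        return [rs1[0]] * n
    if modifier == 1:  # REDUCE_ADD
        return [sum(rs1)] * n
    if modifier == 2:  # REDUCE_MIN
        return [min(rs1)] * n
    if modifier == 3:  # REDUCE_MAX
        return [max(rs1)] * n
    if modifier == 4:  # PREFIX_SUM (exclusive): lane i gets the sum of lanes 0..i-1
        return [0] + list(accumulate(rs1))[:-1]
    if modifier == 5:  # SHUFFLE: each lane reads rs1 from the lane named by its rs2
        return [rs1[lane] for lane in rs2]
    if modifier == 6:  # BALLOT: bit i set if lane i has rs1 != 0
        mask = sum(1 << i for i, v in enumerate(rs1) if v != 0)
        return [mask] * n
    if modifier == 7:  # ANY
        return [int(any(v != 0 for v in rs1))] * n
    if modifier == 8:  # ALL
        return [int(all(v != 0 for v in rs1))] * n
    raise ValueError(f"unknown WAVE_OP variant {modifier}")


# A 4-lane wave is used purely for illustration.
lanes = [3, 0, 5, 2]
prefix = wave_op(4, lanes)
reversed_lanes = wave_op(5, lanes, rs2=[3, 2, 1, 0])
```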

The modifier on opcode 0x3F selects the specific control flow or synchronization operation.

| Mnemonic | Opcode | Format | Modifier | Operands | Description |
|---|---|---|---|---|---|
| BARRIER | 0x3F | Base | 0 | (none) | Workgroup barrier. All threads in the workgroup must reach this point before any proceed. |
| BRANCH | 0x3F | Extended | 1 | imm32 | Unconditional branch. Sets PC to imm32. |
| BRANCH_IF | 0x3F | Extended | 2 | rs1, imm32 | Conditional branch. If rs1 != 0, sets PC to imm32. |
| BRANCH_IFNOT | 0x3F | Extended | 3 | rs1, imm32 | Conditional branch. If rs1 == 0, sets PC to imm32. |
| CALL | 0x3F | Extended | 4 | imm32 | Push return address and branch to imm32. |
| RET | 0x3F | Base | 5 | (none) | Pop return address and branch to it. |
| EXIT | 0x3F | Base | 6 | (none) | Terminate the current thread. |
| NOP | 0x3F | Base | 7 | (none) | No operation. |
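Since all eight operations share opcode 0x3F, a decoder has to dispatch on the modifier field and use it to decide whether an immediate word follows. A minimal disassembler sketch for this group is shown below; `CONTROL_OPS` and `disassemble_control` are hypothetical names, and the textual output syntax is an assumption rather than an official assembly format.

```python
# modifier -> (mnemonic, uses rs1, extended format)
CONTROL_OPS = {
    0: ("BARRIER", False, False),
    1: ("BRANCH", False, True),
    2: ("BRANCH_IF", True, True),
    3: ("BRANCH_IFNOT", True, True),
    4: ("CALL", False, True),
    5: ("RET", False, False),
    6: ("EXIT", False, False),
    7: ("NOP", False, False),
}


def disassemble_control(word, imm32=None):
    """Render an opcode-0x3F instruction as text; imm32 is required for extended forms."""
    if (word >> 24) & 0xFF != 0x3F:
        raise ValueError("not a control flow / sync instruction")
    modifier = (word >> 20) & 0xF
    if modifier not in CONTROL_OPS:
        raise ValueError(f"undefined modifier {modifier} for opcode 0x3F")
    mnemonic, uses_rs1, extended = CONTROL_OPS[modifier]
    operands = []
    if uses_rs1:
        operands.append(f"r{(word >> 8) & 0x1F}")  # rs1 lives in bits 12-8
    if extended:
        if imm32 is None:
            raise ValueError(f"{mnemonic} needs a 32-bit immediate")
        operands.append(f"0x{imm32:X}")
    return mnemonic if not operands else f"{mnemonic} {', '.join(operands)}"


# BRANCH_IF r7, 0x80
branch_word = (0x3F << 24) | (2 << 20) | (7 << 8)
text = disassemble_control(branch_word, 0x80)
```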

| Range | Category | Count |
|---|---|---|
| 0x00-0x0B | Integer Arithmetic | 12 |
| 0x10-0x1F | Floating-Point | 16 |
| 0x20-0x27 | Bitwise | 8 |
| 0x28-0x2C | Comparison | 5 |
| 0x30-0x31 | Global Memory | 2 |
| 0x38-0x39 | Local Memory | 2 |
| 0x3C-0x3D | Atomic | 2 |
| 0x3E | Wave Operations | 1 (9 variants) |
| 0x3F | Control Flow / Sync | 1 (8 variants) |
| Total | | 49 opcodes |
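The opcode map above translates directly into a range table, which is handy for classifying an opcode byte and for checking the stated total. `OPCODE_RANGES` and `classify` are hypothetical names used only for this sketch.

```python
# (first opcode, last opcode, category), copied from the summary above.
OPCODE_RANGES = [
    (0x00, 0x0B, "Integer Arithmetic"),
    (0x10, 0x1F, "Floating-Point"),
    (0x20, 0x27, "Bitwise"),
    (0x28, 0x2C, "Comparison"),
    (0x30, 0x31, "Global Memory"),
    (0x38, 0x39, "Local Memory"),
    (0x3C, 0x3D, "Atomic"),
    (0x3E, 0x3E, "Wave Operations"),
    (0x3F, 0x3F, "Control Flow / Sync"),
]


def classify(opcode):
    """Return the category for an opcode byte, or None if it is unassigned."""
    for first, last, category in OPCODE_RANGES:
        if first <= opcode <= last:
            return category
    return None


# Number of assigned opcodes across all ranges.
total_opcodes = sum(last - first + 1 for first, last, _ in OPCODE_RANGES)
```

Summing the range sizes reproduces the total of 49, and gaps such as 0x0C-0x0F fall through to `None`.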