Python SDK
The wave_gpu Python package lets you write GPU kernels in Python and run them on any supported GPU backend. This guide walks through installation, device setup, array management, kernel authoring, and error handling.
Installation
Section titled “Installation”Install from PyPI:
pip install wave-gpuImport the package in your code:
import wave_gpuDevice Detection
Section titled “Device Detection”Before launching kernels you need to confirm that a GPU is available. wave_gpu.device() returns a DeviceInfo object describing the first detected device:
dev = wave_gpu.device()print(dev.vendor) # e.g. "Apple"print(dev.name) # e.g. "Apple M2 Max"If no GPU is found the call raises a RuntimeError.
Creating Arrays
Section titled “Creating Arrays”WaveArray is the primary buffer type. It lives in device-accessible memory. Create one with the factory helpers:
# From existing Python dataa = wave_gpu.array([1.0, 2.0, 3.0], dtype="f32")
# Pre-filled buffersz = wave_gpu.zeros(1024, dtype="f32")o = wave_gpu.ones(1024, dtype="f32")Supported dtypes: f16, f32, f64, i32, u32.
WaveArray exposes a few useful properties and methods:
| Member | Description |
|---|---|
data | Raw underlying data |
dtype | Element type string |
len(arr) | Number of elements (__len__) |
arr[i] | Element access (__getitem__) |
arr.to_list() | Copy contents back to a Python list |
Writing Kernels
Section titled “Writing Kernels”Decorate a plain Python function with @wave_gpu.kernel to mark it as a GPU kernel. Inside the kernel body you can use WAVE intrinsics to determine the current thread’s position and synchronize:
@wave_gpu.kerneldef vector_add(a, b, out, n): tid = wave_gpu.thread_id() if tid < n: out[tid] = a[tid] + b[tid]Available Intrinsics
Section titled “Available Intrinsics”| Intrinsic | Description |
|---|---|
thread_id() | Global thread index |
workgroup_id() | Workgroup (block) index |
lane_id() | Lane within the current wave/warp |
wave_width() | Number of lanes per wave |
barrier() | Workgroup-level synchronization barrier |
Kernel parameters must be either WaveArray objects (for buffer access) or plain int / float values (for scalar uniforms).
Launching Kernels
Section titled “Launching Kernels”Call the decorated function directly. WAVE compiles the kernel on first invocation and caches the result. You can optionally specify the dispatch grid and workgroup size:
n = 1024a = wave_gpu.array([float(i) for i in range(n)], dtype="f32")b = wave_gpu.ones(n, dtype="f32")out = wave_gpu.zeros(n, dtype="f32")
# Simple launch - WAVE infers a 1-D grid from the first buffervector_add(a, b, out, n)
# Explicit grid and workgroup dimensionsvector_add(a, b, out, n, grid=(n // 256, 1, 1), workgroup=(256, 1, 1))Reading Results
Section titled “Reading Results”After a kernel completes, read data back to the host with to_list():
result = out.to_list()print(result[:8]) # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]You can also index individual elements for quick spot-checks:
assert out[0] == 1.0Error Handling
Section titled “Error Handling”WAVE surfaces two main exception types:
RuntimeError- raised when kernel compilation fails (e.g. unsupported intrinsic, invalid shader generation) or when no device is found.TypeError- raised when kernel arguments do not match the expected signature (wrong type or count).
Handle them with standard try / except:
try: broken_kernel(a, b, out, n)except RuntimeError as e: print(f"Compilation or device error: {e}")except TypeError as e: print(f"Bad kernel arguments: {e}")Next Steps
Section titled “Next Steps”See the full Python API Reference for detailed method signatures and advanced options.