kernels/kernels/wu_arch_hgemm/README.md
2026-05-27 05:54:55 +00:00

wu_arch_hgemm

Tensor-warp HGEMM smoke test for the Wu split scalar/tensor warp configuration with the 4-lane Blackwell tensor-core path.

Scalar warp 0 initializes the shared-memory B operand, spawns only the warps selected by the tensor warp mask, waits for tensor warps NUM_SCALAR_WARPS..NUM_WARPS-1 to finish, and reports completion through tohost. The tensor warps execute the Blackwell custom HGEMM instruction sequence on 16-byte fragments and then stop themselves.