
blackwell_fp8_e4m3

Standalone FP8 E4M3 validation kernel for the Wu Blackwell BWGMMA branch.

All FP8 branch work is confined to this directory. The existing FP16 HGEMM, wu_arch_cases, and flash kernels are intentionally left unchanged.

The validation runs one tensor warp on a 16x16x32 tile:

  • A is FP8 E4M3 1.0 (0x38)
  • B is FP8 E4M3 2.0 (0x40)
  • C is FP32 1.0 (0x3f800000)
  • Expected output is FP32 65.0 (0x42820000): 32 products of 1.0 × 2.0 accumulated along K, plus the C value of 1.0
  • VirgoBlackwellConfig currently uses 4 core/memory lanes, so one tcgen05_cp/cb fragment is 16 bytes.
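
The constants above can be cross-checked on the host with a short Python sketch. It is not part of the kernel or its build. It assumes the tile computes D = A x B + C with the 32 in 16x16x32 being the K (reduction) dimension, and that the FP8 values follow the usual E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7). The helper names decode_e4m3 and fp32_bits are hypothetical and exist only for this illustration.

```python
import struct

def decode_e4m3(byte):
    """Decode one FP8 E4M3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # E4M3 has no infinities; S.1111.111 encodes NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def fp32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

K = 32                    # assumed reduction depth of the 16x16x32 tile
a = decode_e4m3(0x38)     # every A element
b = decode_e4m3(0x40)     # every B element
c = 1.0                   # every C element
expected = K * a * b + c  # each output element of D = A x B + C

print(a, b, expected, hex(fp32_bits(expected)))
```

Because every input element is identical, each of the 16x16 outputs should carry the same value, so comparing the whole result buffer against a single 32-bit word is enough to validate the tile.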

Build:

make -C /home/lzd/wu/wuarch/virgo-kernels/kernels/blackwell_fp8_e4m3