blackwell_fp8_e4m3
Standalone FP8 E4M3 validation kernel for the Wu Blackwell BWGMMA branch.
This directory is the only kernel area used by the FP8 branch work. Existing
FP16 HGEMM, wu_arch_cases, and flash kernels are intentionally left unchanged.
The validation runs one tensor warp on a 16x16x32 tile:
- A is FP8 E4M3 1.0 (0x38)
- B is FP8 E4M3 2.0 (0x40)
- C is FP32 1.0 (0x3f800000)
- Expected output is FP32 65.0 (0x42820000)

VirgoBlackwellConfig currently uses 4 core/memory lanes, so one tcgen05_cp/cb fragment is 16 bytes.
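
The expected output follows from the tile shape: with K = 32, each output element is 32 * (1.0 * 2.0) + 1.0 = 65.0. The host-side sketch below reproduces the listed bit patterns in plain Python. It is not part of this directory; the helper names (`decode_e4m3`, `f32_bits`, `bits_f32`) are made up for illustration, and it assumes the standard E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7) and that the tile computes D = A x B + C.

```python
import struct

def decode_e4m3(byte: int) -> float:
    """Decode one FP8 E4M3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # E4M3 has no infinities; this pattern encodes NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(bits: int) -> float:
    """Return the float encoded by a binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

M, N, K = 16, 16, 32          # tile shape from this README
a = decode_e4m3(0x38)         # every element of A
b = decode_e4m3(0x40)         # every element of B
c = bits_f32(0x3F800000)      # every element of C

# Assumed semantics: D[i][j] = sum over k of A[i][k] * B[k][j], plus C[i][j].
d = [[c + sum(a * b for _ in range(K)) for _ in range(N)] for _ in range(M)]

print(a, b, c, d[0][0])       # 1.0 2.0 1.0 65.0
print(hex(f32_bits(d[0][0]))) # 0x42820000
```

Because all inputs are small exact powers of two, the result is exact in FP32 regardless of accumulation order, so a bitwise comparison against 0x42820000 is a reasonable pass/fail check.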
Build:
make -C /home/lzd/wu/wuarch/virgo-kernels/kernels/blackwell_fp8_e4m3