With 4 warps, a single GEMM call can only cover a 32x64 tile, so the 64x64 GEMM is serialized into two 32x64 GEMM calls by splitting along the rows.
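The row split can be sketched on the host in plain Python to check that two 32x64 calls reproduce the full 64x64 result. This is only an illustration under stated assumptions: the function names `gemm_32x64` and `gemm_64x64`, the `row_offset` parameter, and the reduction depth `TILE_K = 16` are hypothetical and do not come from the original kernel. Only the 64x64 and 32x64 tile shapes are taken from the text.

```python
# Tile shapes from the text: a 64x64 output tile, computed 32 rows at a time.
TILE_M, TILE_N = 64, 64
SUB_M = 32    # rows covered by one 32x64 GEMM call
TILE_K = 16   # assumption: reduction depth chosen only for this example


def gemm_32x64(a, b, c, row_offset):
    """Accumulate one 32x64 slab: rows [row_offset, row_offset + 32) of c += a @ b."""
    k_dim = len(b)
    for i in range(row_offset, row_offset + SUB_M):
        for j in range(TILE_N):
            acc = 0
            for k in range(k_dim):
                acc += a[i][k] * b[k][j]
            c[i][j] += acc


def gemm_64x64(a, b, c):
    """Serialize the 64x64 tile into two 32x64 calls, split along the rows."""
    for row_offset in range(0, TILE_M, SUB_M):
        gemm_32x64(a, b, c, row_offset)


# Small integer inputs so the comparison against a reference is exact.
a = [[(i + k) % 7 for k in range(TILE_K)] for i in range(TILE_M)]
b = [[(k * j) % 5 for j in range(TILE_N)] for k in range(TILE_K)]
c = [[0] * TILE_N for _ in range(TILE_M)]
gemm_64x64(a, b, c)
```

Because the split is along the rows, both calls read the same `b` operand and write disjoint row ranges of `c`, so the two calls need no reduction step between them.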