Commit Graph

7 Commits

Author SHA1 Message Date
Hansung Kim
b44b202a21 sgemm_impl: Rename to wmma 2024-08-18 16:21:22 -07:00
Hansung Kim
b978bf8757 sgemm_impl: Split tile offset addr gen from wmma store
& add an option to write to smem in gemm_single_tile.
2024-08-18 16:10:29 -07:00
Hansung Kim
d0809d292a sgemm: Specify A/B tile SMEM address via template args
& split single-tile GEMM into a separate function.
2024-08-16 18:01:57 -07:00
Hansung Kim
a1858e0c80 sgemm_impl: Parameterize BK/TCK by FP_SIZE 2024-08-15 20:33:33 -07:00
Hansung Kim
014f7cd06f sgemm_tcore: Unpack arg params, remove threadblock_dim_y
thread_block_gemm is meant to be reusable, so it shouldn't assume what
the kernel arg struct looks like.

threadblock_dim_y was ambiguous and didn't match the literal name either
(it was used as # of warps that participate in a barrier).
2024-08-14 20:34:49 -07:00
Hansung Kim
1b1264207b sgemm_tcore: Add compile-time write_to_gmem param to thread_block_gemm 2024-08-14 17:48:31 -07:00
Hansung Kim
ee6339a35f sgemm_tcore: Split all impl code into sgemm_impl.hpp
This is to make thread_block_gemm a re-usable library function for GEMM
operations for use in other kernels.
2024-08-14 16:24:48 -07:00