Hansung Kim
b44b202a21
sgemm_impl: Rename to wmma
2024-08-18 16:21:22 -07:00
Hansung Kim
b978bf8757
sgemm_impl: Split tile offset addr gen from wmma store
& add an option to write to smem in gemm_single_tile.
2024-08-18 16:10:29 -07:00
Hansung Kim
d0809d292a
sgemm: Specify A/B tile SMEM address via template args
& split single-tile GEMM into a separate function.
2024-08-16 18:01:57 -07:00
Hansung Kim
a1858e0c80
sgemm_impl: Parameterize BK/TCK by FP_SIZE
2024-08-15 20:33:33 -07:00
Hansung Kim
014f7cd06f
sgemm_tcore: Unpack arg params, remove threadblock_dim_y
thread_block_gemm is meant to be reusable, so it shouldn't assume what
the kernel arg struct looks like.
threadblock_dim_y was ambiguous and didn't match the literal name either
(it was used as # of warps that participate in a barrier).
2024-08-14 20:34:49 -07:00
Hansung Kim
1b1264207b
sgemm_tcore: Add compile-time write_to_gmem param to thread_block_gemm
2024-08-14 17:48:31 -07:00
Hansung Kim
ee6339a35f
sgemm_tcore: Split all impl code into sgemm_impl.hpp
This is to make thread_block_gemm a reusable GEMM library function
that other kernels can call.
2024-08-14 16:24:48 -07:00