kernels/tests/regression
Hansung Kim 21b6655c10 sgemm_impl: Implement fast coalesced wmma_store
Enables a fairer comparison between the core-coupled tensor core and the
Hopper tensor core, where the latter benefits from coalesced,
full-throughput moveout to GMEM because it does not use the 1x2
interleaved register mapping.  As a result, the result matrix is stored
swizzled in GMEM, without breaking correctness.
2024-10-29 22:34:22 -07:00