kernels/tests/regression
Hansung Kim 062403066e sgemm_tcore: Bring M/N-loop inside the kernel
Instead of spawning multiple threadblocks, which incurs stack access
overhead, have one threadblock work through the entire M/N-space in a loop.
The grid size is fixed to the hardware parallelism.

TODO: currently works only with one cluster in the system.
2024-06-06 15:22:01 -07:00
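
The change described above can be illustrated with a small host-side sketch. This is not the sgemm_tcore kernel itself. It only shows, under assumptions, how a fixed grid can cover the M/N tile space with a loop instead of launching one threadblock per output tile. The `Tile` struct, the `tiles_for_block` function, and the row-major tile ordering are all hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile coordinate in the output matrix C (not from the source).
struct Tile {
    std::size_t m;  // tile row index
    std::size_t n;  // tile column index
};

// Returns the output tiles that threadblock `block_id` visits when the grid
// is fixed at `num_blocks` and each block loops over the M/N tile space.
// Assumption: tiles are enumerated in row-major order.
std::vector<Tile> tiles_for_block(std::size_t block_id, std::size_t num_blocks,
                                  std::size_t dim_m, std::size_t dim_n,
                                  std::size_t tile_m, std::size_t tile_n) {
    assert(num_blocks > 0 && block_id < num_blocks);
    assert(tile_m > 0 && tile_n > 0);
    const std::size_t tiles_m = (dim_m + tile_m - 1) / tile_m;  // ceiling division
    const std::size_t tiles_n = (dim_n + tile_n - 1) / tile_n;
    std::vector<Tile> visited;
    // Stride by the grid size, so the loop replaces extra threadblock launches.
    for (std::size_t t = block_id; t < tiles_m * tiles_n; t += num_blocks) {
        visited.push_back({t / tiles_n, t % tiles_n});
        // A real kernel would run the K-loop for this tile here.
    }
    return visited;
}
```

With `num_blocks` set to 1, the single threadblock visits every tile, which corresponds to the one-cluster case mentioned in the TODO. A larger fixed grid splits the same tile list across blocks without changing the loop body.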