kernels/tests
Hansung Kim 062403066e sgemm_tcore: Bring M/N-loop inside the kernel
Instead of spawning multiple threadblocks, which comes with stack access
overhead, have 1 threadblock work on the entire M/N space through a loop.
Grid size is fixed to the hardware parallelism.

TODO: Currently only works with 1 cluster in the system.
2024-06-06 15:22:01 -07:00
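The change above swaps a "one threadblock per output tile" launch for a fixed grid whose blocks loop over the M/N tile space. Below is a minimal, hypothetical sketch of that scheduling idea, assuming row-major tile numbering and a round-robin (block-stride) assignment. `TILE_M`, `TILE_N`, `tiles_for_block` and `launch` are illustrative names, not taken from the sgemm_tcore kernel. The launch is simulated serially. With `num_blocks = 1`, a single block walks the whole M/N space, which matches the commit description. A larger `num_blocks` stands in for "grid size fixed to the hardware parallelism".

```python
# Hypothetical sketch: fixed grid, each block loops over M/N output tiles.
# Tile sizes and the round-robin assignment are assumptions for illustration.

TILE_M = 16  # assumed tile height
TILE_N = 16  # assumed tile width


def tiles_for_block(block_id, num_blocks, M, N):
    """Return the (row, col) origins of the output tiles handled by one block.

    Tiles are numbered row-major and dealt out round-robin, so block b
    handles tiles b, b + num_blocks, b + 2*num_blocks, ...
    """
    tiles_m = -(-M // TILE_M)  # ceil(M / TILE_M)
    tiles_n = -(-N // TILE_N)  # ceil(N / TILE_N)
    origins = []
    for t in range(block_id, tiles_m * tiles_n, num_blocks):
        tm, tn = divmod(t, tiles_n)
        origins.append((tm * TILE_M, tn * TILE_N))
    return origins


def launch(num_blocks, M, N):
    """Serially simulate a launch whose grid size is num_blocks, independent of M and N."""
    return [tiles_for_block(b, num_blocks, M, N) for b in range(num_blocks)]


if __name__ == "__main__":
    # 64x48 output -> 4x3 = 12 tiles spread over 4 blocks (3 tiles each).
    for b, tiles in enumerate(launch(4, 64, 48)):
        print(b, tiles)
```

Because the grid size no longer depends on M and N, each block pays its setup cost once and then reuses it across all the tiles it loops over. Every tile is still visited exactly once.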