kernels/tests/regression/flash_attention/kernel.cpp
Hansung Kim 221d5f75c2 flash: Optimize smem alloc for tcore for 8banks
Divide into first half & last half for warpgroup 0 & 1, and
allocate Q/K and P/V in different banks for parallel access.
2024-09-19 21:31:39 -07:00

38 KiB