kernels/tests/regression/flash_attention/kernel.cpp
Hansung Kim 1cfab40711 flash: Do Oi rescale with PV
The Oi rescale has a data dependency on the previous Oi, which is
produced by the PV GEMM, so the rescale and the GEMM need to be in the
same pipeline stage; otherwise the pipeline has to stall.  Instead,
compute only the rescale factor in the online softmax stage and apply
the rescaling right before PV.
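
The split described in this commit message can be illustrated with a
scalar, single-row sketch. This is not the code in kernel.cpp: the names
RowState, softmax_stage, pv_stage and attend_row are hypothetical, scores
are assumed to be finite, and the tile layout is an assumption made only
for illustration. The softmax stage updates the running max and sum and
returns the rescale factor without touching Oi; the PV stage applies that
factor to the previous Oi right before accumulating P*V.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Running state for one query row (hypothetical layout, not from kernel.cpp).
struct RowState {
    float m = -std::numeric_limits<float>::infinity();  // running row max
    float l = 0.0f;                                      // running sum of exponentials
    std::vector<float> o;                                // unnormalized output Oi
};

// Online softmax stage: converts the scores in `s` to exp(s - m_new) in place,
// updates m and l, and returns the rescale factor exp(m_old - m_new).
// It does not read or write Oi, so it has no dependency on the PV GEMM.
float softmax_stage(RowState& st, std::vector<float>& s) {
    assert(!s.empty());  // assumes at least one finite score per tile
    float m_new = st.m;
    for (float x : s) m_new = std::max(m_new, x);
    const float alpha = std::exp(st.m - m_new);  // 0 on the first tile (m_old = -inf)
    float sum = 0.0f;
    for (float& x : s) {
        x = std::exp(x - m_new);
        sum += x;
    }
    st.l = alpha * st.l + sum;
    st.m = m_new;
    return alpha;
}

// PV stage: applies the deferred rescale to the previous Oi, then accumulates P*V.
// Both the read and the write of Oi happen here, in one stage.
void pv_stage(RowState& st, float alpha, const std::vector<float>& p,
              const std::vector<std::vector<float>>& v) {
    assert(p.size() == v.size());
    for (std::size_t d = 0; d < st.o.size(); ++d) {
        float acc = alpha * st.o[d];  // rescale right before PV
        for (std::size_t j = 0; j < p.size(); ++j) {
            assert(v[j].size() == st.o.size());
            acc += p[j] * v[j][d];
        }
        st.o[d] = acc;
    }
}

// Runs both stages over all KV tiles for one query row and normalizes at the end.
std::vector<float> attend_row(std::vector<std::vector<float>> score_tiles,
                              const std::vector<std::vector<std::vector<float>>>& v_tiles,
                              std::size_t head_dim) {
    assert(score_tiles.size() == v_tiles.size());
    RowState st;
    st.o.assign(head_dim, 0.0f);
    for (std::size_t t = 0; t < score_tiles.size(); ++t) {
        const float alpha = softmax_stage(st, score_tiles[t]);
        pv_stage(st, alpha, score_tiles[t], v_tiles[t]);
    }
    for (float& x : st.o) x /= st.l;
    return st.o;
}
```

Because softmax_stage only produces a scalar factor, it can run ahead of
the PV stage for the next tile, while Oi is read and written in pv_stage
alone.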
2024-08-30 20:11:07 -07:00

29 KiB