Commit Graph

7 Commits

Author SHA1 Message Date
Hansung Kim
a17edac875 flash: Fix barrier stall with DEBUG
Verified for up to P_expected on 2nd iter; O_before_PV is partially
correct
2024-09-09 17:02:05 -07:00
Hansung Kim
b652e25945 flash: Warp-specialize between warp 0 and 1-7
Finishes without stalls; No dependency check between O rescale and
GEMM-II.
2024-09-09 16:42:30 -07:00
Hansung Kim
cdb8377b62 flash: Do GEMM II in Gemmini; verify 1st iteration
2024-09-08 16:09:06 -07:00
Hansung Kim
3f50ac57ee flash: use 12bit dma interface
2024-09-08 15:29:56 -07:00
Hansung Kim
c51dc4902d flash: Fix online softmax for DMA layout
2024-09-07 23:21:28 -07:00
Hansung Kim
2e1485877d flash: Add Gemmini-accelerated kernel
2024-09-07 22:40:58 -07:00
Hansung Kim
b3be271b88 flash: Split impl to header file
2024-09-07 21:16:35 -07:00