tensor: Add buffer to hide 2cyc commit latency

Operands arrive and results commit at the same rate (one every 2
cycles), so the dpu does not need to stall during the multi-cycle
commit. Buffering the result while it commits lets the dpu run at its
full throughput of 1 operand every 2 cycles.
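
The throughput argument above can be checked with a toy cycle-level model. This is only a sketch under stated assumptions, not the RTL in this change: the function `simulate` and its parameters are hypothetical, the producer is assumed to need 2 cycles per result, the commit stage 2 cycles per result, and `buffered=True` stands for the result being copied into a buffer at hand-off so the producer is free during the commit.

```python
def simulate(cycles, buffered, produce_cycles=2, commit_cycles=2):
    """Count results committed within `cycles` cycles (toy model, assumed timings)."""
    produce_left = produce_cycles  # cycles until the producer's current result is done
    commit_left = 0                # cycles remaining in the in-flight commit (0 = idle)
    committed = 0
    for _ in range(cycles):
        commit_busy = commit_left > 0  # commit stage occupied during this cycle
        if commit_busy:
            commit_left -= 1
            if commit_left == 0:
                committed += 1
        # Without a buffer the producer must hold its result, so it stalls
        # for as long as the commit stage is busy.
        if (buffered or not commit_busy) and produce_left > 0:
            produce_left -= 1
        # Hand a finished result to the commit stage once that stage is free.
        if produce_left == 0 and commit_left == 0:
            commit_left = commit_cycles
            produce_left = produce_cycles
    return committed


print(simulate(100, buffered=False))  # 25: one result every 4 cycles
print(simulate(100, buffered=True))   # 49: one result every 2 cycles
```

With matched 2-cycle rates, a single buffered result is enough in this model: the commit of one result finishes in the same cycle the next result becomes available.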
Hansung Kim
2024-05-16 20:07:30 -07:00
parent 317695a8d0
commit 5034d8d14b
3 changed files with 25 additions and 4 deletions

@@ -40,6 +40,7 @@ module VX_tensor_dpu #(
// ready as soon as valid_out
assign ready_in = ready_reg || valid_out;
// fixed-latency model
VX_shift_register #(
.DATAW (1 + $bits(D_tile)),
.DEPTH (`LATENCY_HMMA),