tensor: Add buffer to hide 2cyc commit latency
The dpu accepts one operand every 2 cycles and a commit also takes 2 cycles, so the two rates match and there is no need to stall the dpu during the multi-cycle commit. With a buffer hiding the commit latency, the dpu can run at its full throughput of 1 operand every 2 cycles.
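The throughput argument can be checked with a toy timing model. This is only a sketch under assumptions that are not in the commit: the function name `cycles_to_commit`, its parameters, and the rule that an unbuffered dpu waits for the whole commit before taking the next operand are all hypothetical simplifications, not the actual RTL behavior.

```python
def cycles_to_commit(n_ops, operand_cycles=2, commit_cycles=2, buffered=True):
    """Toy model: cycle at which the last of n_ops results finishes committing.

    Assumptions (hypothetical, not taken from the RTL):
      - the dpu needs operand_cycles to take in one operand,
      - a commit occupies the commit path for commit_cycles,
      - without a buffer the dpu stalls until its commit has finished,
      - with a buffer the dpu resumes as soon as the result is handed off.
    """
    dpu_free = 0      # cycle at which the dpu can start the next operand
    commit_free = 0   # cycle at which the commit path is idle again
    for _ in range(n_ops):
        issue_done = dpu_free + operand_cycles
        # The result can only be handed off once the commit path is idle.
        handoff = max(issue_done, commit_free)
        commit_free = handoff + commit_cycles
        # Buffered: the commit proceeds in the background.
        # Unbuffered: the dpu is stalled for the whole commit.
        dpu_free = handoff if buffered else commit_free
    return commit_free


print(cycles_to_commit(8, buffered=False))  # 32: 4 cycles per operand
print(cycles_to_commit(8, buffered=True))   # 18: 2 cycles per operand + 2 to drain
```

Because the operand and commit rates match in this model, the buffer never backs up, which is why a single level of buffering would be enough to sustain 1 operand every 2 cycles.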
@@ -40,6 +40,7 @@ module VX_tensor_dpu #(
    // ready as soon as valid_out
    assign ready_in = ready_reg || valid_out;

    // fixed-latency model
    VX_shift_register #(
        .DATAW (1 + $bits(D_tile)),
        .DEPTH (`LATENCY_HMMA),