tensor: Add buffer to hide 2cyc commit latency
Since operands arrive and results commit at the same rate (one every 2 cycles), there is no need to stall the dpu during the multi-cycle commit. The added buffer lets the dpu run at its full throughput of 1 operand every 2 cycles.
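The throughput argument can be illustrated with a toy cycle count. This is only a sketch under stated assumptions, not the RTL in this commit: the function `total_cycles`, its parameters, and the single-entry buffer model are all hypothetical and chosen for illustration.

```python
def total_cycles(n_operands, operand_cycles=2, commit_cycles=2, buffered=True):
    """Cycles until the last result has committed, in a toy model.

    Assumptions (hypothetical, not taken from the RTL):
    - the dpu needs `operand_cycles` per operand,
    - each result needs `commit_cycles` on the commit path,
    - with `buffered=True` a single-entry buffer holds the result
      while it commits, so the dpu can start the next operand.
    """
    dpu_free = 0      # cycle at which the dpu can start the next operand
    commit_free = 0   # cycle at which the commit path is free again
    for _ in range(n_operands):
        result_ready = dpu_free + operand_cycles
        # The result can only be handed over once the commit path is free.
        commit_start = max(result_ready, commit_free)
        commit_free = commit_start + commit_cycles
        if buffered:
            # The buffer takes the result; the dpu is released immediately.
            dpu_free = commit_start
        else:
            # No buffer: the dpu stalls until the commit has finished.
            dpu_free = commit_free
    return commit_free


if __name__ == "__main__":
    for mode in (False, True):
        print("buffered" if mode else "stalling", total_cycles(8, buffered=mode))
```

In this model the stalling variant spends 4 cycles per operand, while the buffered variant spends 2 cycles per operand plus one final commit. The buffer is sufficient here only because the commit time does not exceed the operand interval; with a longer commit the commit path becomes the bottleneck.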
@@ -391,7 +391,7 @@
 
 // Tensor Core Latency
 `ifndef LATENCY_HMMA
-`define LATENCY_HMMA 2
+`define LATENCY_HMMA 8
 `endif
 
 // Icache Configurable Knobs //////////////////////////////////////////////////