Otherwise the memtrace driver might send a single warp's requests split
across multiple cycles, and the coalescer (CoalShiftQueue) would treat
them as belonging to different warps.
Now do proper source ID generation for the tlCoal edge coming out of
the coalescer manager node. This also prevents the inflight table from
filling up.
This means setting the source ID of coalReq moves outside the
coalescer, because sourceGen also needs to look at the response bits,
which is easier to do at the toplevel than inside the coalescer.
FIXME: coalescer unit test is still broken.
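
A minimal software sketch of why source ID generation needs the
response side, assuming a simple lowest-free-ID policy. The class and
method names are hypothetical and this is not the Chisel sourceGen
module; it only models allocating an ID on a request and reclaiming it
on a response so that IDs do not run out.

```python
class SourceGen:
    """Software model of a source ID allocator (hypothetical names)."""

    def __init__(self, num_ids):
        # One flag per source ID: True while a request with that ID is in flight.
        self.in_use = [False] * num_ids

    def allocate(self):
        """On a request, return the lowest free ID, or None if all are in flight."""
        for sid, used in enumerate(self.in_use):
            if not used:
                self.in_use[sid] = True
                return sid
        return None

    def reclaim(self, sid):
        """On a response, free the ID it carries so it can be reused."""
        assert self.in_use[sid], "response for an ID that is not in flight"
        self.in_use[sid] = False
```

Without the reclaim step driven by responses, every ID would eventually
be marked in flight and allocate() would return None forever.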
This requires config.addressWidth to be increased to 32.
FIXME: This breaks CoalescerUnitTest with unsatisfied requirement
`Link's max transfer (8) < List<...>'s beatBytes (32)`.
Connect coalescer output directly to the uncoalescer at the toplevel, and do
table entry construction entirely inside the module.
WIP: unittest is very broken as a result of this.
This is required because otherwise we might overwrite the Verilog
registers holding a valid trace line that downstream missed
because it was not ready. Basically, whenever trace_read_cycle
stalls, we also want to stall the __in_* registers.
Without this we log extraneous lines that were valid but never accepted by
downstream because it was not ready, which undermines the validity of
memtrace testing.
Advancing the trace cycle while downstream is blocking is tricky
because the DPI call is synchronous, which creates a timing
difference between the line we have fired to downstream and the
cycle counter we maintain.
For now, just stall the counter whenever downstream is not ready.
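
A rough software model of the stall behavior, assuming the trace is a
map from trace cycle to line and downstream readiness is known per
simulation cycle. The function name and data layout are made up for
illustration; the real logic lives in the Verilog/DPI driver.

```python
def replay(trace, ready_by_cycle):
    """Replay a trace, holding the trace cycle counter while downstream stalls.

    trace: dict mapping trace cycle -> line (hypothetical layout)
    ready_by_cycle: list of bools, downstream ready per simulation cycle
    Returns a list of (sim_cycle, trace_cycle, line) for each line fired.
    """
    trace_cycle = 0
    fired = []
    for sim_cycle, ready in enumerate(ready_by_cycle):
        if not ready:
            # Stall: keep trace_cycle unchanged so the same line is
            # offered again once downstream becomes ready.
            continue
        line = trace.get(trace_cycle)
        if line is not None:
            fired.append((sim_cycle, trace_cycle, line))
        trace_cycle += 1
    return fired
```

Because the counter only advances on ready cycles, no trace line is
skipped while downstream is blocking; the cost is that trace cycles no
longer match simulation cycles one to one.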
Function calls inside @(*) cause lint errors. Instead, remove the
staging registers to eliminate the 1-cycle latency between the DPI call
and when its output becomes visible to Chisel.