Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair insts finish compute
on the current one. Might introduce more frontend stalls, need more
experimenting
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
VCS requires the output of the dpi calls to be of a type that can come
at the LHS of a procedural assignment, i.e. reg type. Seems to be a
different requirement from Verilator.
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
minor update
minor update
minor update
minor update
minor update
minor update
cleanup
cleanup
cache bindings and memory perf refactory
minor update
minor update
hw unit tests fixes
minor update
minor update
minor update
minor update
minor update
minor udpate
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor updates
minor updates
minor update
minor update
minor update
minor update
minor update
minor update
minor updates
minor updates
minor updates
minor updates
minor update
minor update