Hansung Kim
52bb827a46
Handle BLOCK_SIZE != 1 in dispatch_unit
...
+ change ALU and FPU unit to use it as well
2024-05-30 23:20:21 -07:00
Hansung Kim
a02773eb92
Add more efficient dispatch_unit
...
Instead of having a single candidate to be considered for dispatch
(designated by 'batch_idx' counter), add a dispatch_unit variant that
considers all `ISSUE_WIDTH dispatch signals and picks a valid one in a
round-robin manner.
This increases core utilization significantly due to better overlapping
of smem/tensor ops.
2024-05-30 21:55:42 -07:00
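The difference between the two dispatch policies described in the commit above can be illustrated with a small behavioral model. This is only a sketch in Python, not the RTL from the commit: the function names, the `last_grant` state, and the list-of-booleans representation of the dispatch valid signals are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def pick_single_candidate(valids: Sequence[bool], batch_idx: int) -> Optional[int]:
    """Hypothetical model of the older policy: only the slot pointed to by
    the batch_idx counter is considered; if it is not valid, nothing is
    dispatched this cycle."""
    return batch_idx if valids[batch_idx] else None


def pick_round_robin(valids: Sequence[bool], last_grant: int) -> Optional[int]:
    """Hypothetical model of the newer policy: scan all slots, starting just
    after the most recently granted one and wrapping around, and return the
    first valid slot. Returns None if no slot is valid."""
    n = len(valids)
    for offset in range(1, n + 1):
        idx = (last_grant + offset) % n
        if valids[idx]:
            return idx
    return None


# Four dispatch slots (standing in for ISSUE_WIDTH = 4); slots 1 and 3 are valid.
valids = [False, True, False, True]

# Single-candidate policy wastes the cycle when the counter points at an idle slot.
stalled = pick_single_candidate(valids, batch_idx=0)

# Round-robin policy still finds work in the same cycle.
granted = pick_round_robin(valids, last_grant=1)
```

With these inputs the single-candidate model dispatches nothing, while the round-robin model grants slot 3 (the first valid slot after slot 1). That is the kind of cycle the commit message says is recovered.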
Hansung Kim
574cc0e5f0
tensor: Document configuring queue depths
2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f
tensor: Fix sync for dpu warp queue as well
2024-05-30 18:22:36 -07:00
Hansung Kim
0a032ab400
tensor: Fix out-of-sync enqueue to dpu and metadata queue
2024-05-30 18:03:04 -07:00
Hansung Kim
97f37b1c75
tensor: Add commit stall injection for debugging
2024-05-30 18:00:26 -07:00
Hansung Kim
06e0f901ff
tensor: Handle backpressure from metadata queue
2024-05-30 17:34:49 -07:00
Hansung Kim
dfb2276657
tensor: Remove redundant issue queue outside dpu
2024-05-30 17:29:59 -07:00
Hansung Kim
2743d32bd2
tensor: Handle wid queue backpressure in dpu
2024-05-30 15:25:00 -07:00
Hansung Kim
5ed6041e33
tensor: Properly stall dpu upon commit backpressure
...
& better-reasoned queue depths
2024-05-29 17:05:53 -07:00
Hansung Kim
f5a9ca5bf3
tensor: Enqueue both insts in pair to issue queue
...
Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair insts finish compute
on the current one. This might introduce more frontend stalls; more
experimentation is needed.
2024-05-29 14:47:25 -07:00
Hansung Kim
c03a5b070c
tensor: Issue queue for dpu to improve utilization
2024-05-27 18:25:10 -07:00
Hansung Kim
28f6cd59b5
tensor: Improve commit efficiency by decoupling dpu with fifo
2024-05-26 22:00:25 -07:00
Hansung Kim
864265bda5
tensor: Fix consecutive commits to write to same warp
...
... by splitting the pending_uops queue across warps.
2024-05-25 20:04:31 -07:00
Hansung Kim
8775458a8f
Stage half-operands per warp
...
An easy solution to handle multiple concurrent warp operations by
staging half-operands in their own per-warp register. This might
increase area requirement by quite a bit.
TODO: Commit is not being handled correctly yet
2024-05-25 19:09:56 -07:00
Hansung Kim
45d86b26a2
tensor: Add counter for dpu operations
2024-05-16 22:15:01 -07:00
Hansung Kim
5034d8d14b
tensor: Add buffer to hide 2cyc commit latency
...
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
Hansung Kim
317695a8d0
Add perf counters on LSU resp valid tmasks
2024-05-16 15:34:54 -07:00
Hansung Kim
89e7d65926
tensor: Add ready signal to enforce 1 warp occupancy
...
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Hansung Kim
1a1094b2bb
tensor: Add dispatch unit to narrow to BLOCK_SIZE=1
2024-05-16 15:34:54 -07:00
Hansung Kim
9f9ec10960
tensor: Enable scaling NUM_THREADS by octets
...
todo: lane-to-octet mapping is arbitrary atm
2024-05-16 15:34:50 -07:00
Richard Yan
d624b3e50a
store fencing, large smem, fix tensor core for firesim
2024-05-15 21:45:48 -07:00
Richard Yan
4aad161739
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-05-07 14:00:31 -07:00
Richard Yan
c9a3eaad79
accelerator cisc
2024-05-07 13:58:32 -07:00
Richard Yan
14d1552f08
potential deadlock
2024-05-07 13:56:51 -07:00
Hansung Kim
868bbdb15e
tensor: more doc
2024-05-07 13:54:10 -07:00
Hansung Kim
fb626ee21c
tensor: doc
2024-05-05 18:35:52 -07:00
Hansung Kim
9ea291eea2
Merge remote-tracking branch 'origin/tensor_core' into rtl
2024-05-05 17:03:57 -07:00
joshua
5bd25985c6
i kinda forgot most of changes
2024-05-04 23:01:47 -07:00
Hansung Kim
1c7acab160
tensor: Fix lint errors
2024-05-03 15:43:02 -07:00
Hansung Kim
5a0ee98a61
Remove duplicate port connection
2024-05-03 15:07:24 -07:00
Hansung Kim
c4d71bc3d6
tensor: Fix multiple driver error on VCS
2024-05-01 21:40:48 -07:00
Hansung Kim
7fc5b6a374
tensor: Fix elaboration error on VCS
2024-05-01 21:40:45 -07:00
Hansung Kim
675e8ea130
Merge branch 'tensor_core' into rtl
2024-05-01 16:18:14 -07:00
Hansung Kim
9a688a05b1
Add (unconnected) FPU perf counters
...
mainly for debugging
2024-04-29 15:20:55 -07:00
Richard Yan
85213d2876
synthesizable design
2024-04-17 18:05:51 -07:00
Richard Yan
17fd29c114
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-04-16 23:03:04 -07:00
Richard Yan
8de5470da4
round robin warp scheduling
2024-04-16 23:03:00 -07:00
Hansung Kim
217bc189da
ifdef-guard VX_operand* to enable including both in Chisel
2024-04-15 22:06:47 -07:00
Hansung Kim
978b1fe2d0
Add operands stage with duplicated RF for rs1/2/3
2024-04-15 16:45:59 -07:00
Hansung Kim
87b966a5fa
Add perf counter for stall by any operand hazard
2024-04-15 01:01:26 -07:00
Hansung Kim
6c632200d5
Divide by per-breakdown cycle for avg stall cycles
2024-04-03 15:29:51 -07:00
Hansung Kim
62c7d1f4cf
Report any fire cycles from scoreboard as well
2024-03-29 12:23:15 -07:00
Hansung Kim
50263a5f7d
Rename sched_barrier_stalls -> perf_sched_barrier_idles
...
Sched stall by barrier is really idle because it causes !scheduler_if.valid,
which is counted as part of sched_idle.
2024-03-28 22:45:12 -07:00
joshua
08d7721e11
annoying swizzling problems
2024-03-28 03:00:15 -07:00
joshua
e16584ddd9
bleh still not work
2024-03-27 00:26:04 -07:00
Hansung Kim
dd90736382
Reformat perfcount report
2024-03-23 01:07:46 -07:00
Hansung Kim
3e6a9a6104
Expose scoreboard fires to perf interface
2024-03-23 01:06:40 -07:00
Hansung Kim
d99295793c
Periodically report perf counter; reformat operand/FU stalls
2024-03-23 00:02:02 -07:00
Hansung Kim
83e151a189
Add valid / fire / cycles-issued perf counters to dispatch
2024-03-23 00:01:15 -07:00