Commit Graph

90 Commits

Author SHA1 Message Date
Hansung Kim
45d86b26a2 tensor: Add counter for dpu operations 2024-05-16 22:15:01 -07:00
Hansung Kim
5034d8d14b tensor: Add buffer to hide 2cyc commit latency
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
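
The throughput argument in the commit above can be sketched with a toy cycle count. This is an idealized model, not the RTL: the function name `total_cycles` and its parameters are hypothetical, and it assumes a single buffer entry is enough to fully overlap the commit of one result with the operand fetch of the next.

```python
def total_cycles(num_ops, buffered, operand_cycles=2, commit_cycles=2):
    """Idealized cycle count for num_ops dpu operations (toy model, not the RTL).

    Unbuffered: the dpu stalls during each multi-cycle commit, so operand
    delivery and commit are serialized for every operation.
    Buffered: the commit of one result overlaps with the operands of the
    next, so the steady-state rate is set by the slower of the two stages.
    """
    if num_ops == 0:
        return 0
    if not buffered:
        return num_ops * (operand_cycles + commit_cycles)
    steady = max(operand_cycles, commit_cycles)
    return operand_cycles + (num_ops - 1) * steady + commit_cycles


unbuffered = total_cycles(8, buffered=False)  # 8 * (2 + 2) = 32
with_buffer = total_cycles(8, buffered=True)  # 2 + 7 * 2 + 2 = 18
```

Under these assumptions the buffered case approaches one operation every 2 cycles as the operation count grows, which matches the full-throughput claim in the commit message.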
Hansung Kim
317695a8d0 Add perf counters on LSU resp valid tmasks 2024-05-16 15:34:54 -07:00
Hansung Kim
89e7d65926 tensor: Add ready signal to enforce 1 warp occupancy
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Hansung Kim
1a1094b2bb tensor: Add dispatch unit to narrow to BLOCK_SIZE=1 2024-05-16 15:34:54 -07:00
Hansung Kim
9f9ec10960 tensor: Enable scaling NUM_THREADS by octets
todo: lane-to-octet mapping is arbitrary atm
2024-05-16 15:34:50 -07:00
Richard Yan
d624b3e50a store fencing, large smem, fix tensor core for firesim 2024-05-15 21:45:48 -07:00
Richard Yan
4aad161739 Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-05-07 14:00:31 -07:00
Richard Yan
c9a3eaad79 accelerator cisc 2024-05-07 13:58:32 -07:00
Richard Yan
14d1552f08 potential deadlock 2024-05-07 13:56:51 -07:00
Hansung Kim
868bbdb15e tensor: more doc 2024-05-07 13:54:10 -07:00
Hansung Kim
fb626ee21c tensor: doc 2024-05-05 18:35:52 -07:00
Hansung Kim
9ea291eea2 Merge remote-tracking branch 'origin/tensor_core' into rtl 2024-05-05 17:03:57 -07:00
joshua
5bd25985c6 i kinda forgot most of changes 2024-05-04 23:01:47 -07:00
Hansung Kim
1c7acab160 tensor: Fix lint errors 2024-05-03 15:43:02 -07:00
Hansung Kim
5a0ee98a61 Remove duplicate port connection 2024-05-03 15:07:24 -07:00
Hansung Kim
c4d71bc3d6 tensor: Fix multiple driver error on VCS 2024-05-01 21:40:48 -07:00
Hansung Kim
7fc5b6a374 tensor: Fix elaboration error on VCS 2024-05-01 21:40:45 -07:00
Hansung Kim
675e8ea130 Merge branch 'tensor_core' into rtl 2024-05-01 16:18:14 -07:00
Hansung Kim
9a688a05b1 Add (unconnected) FPU perf counters
mainly for debugging
2024-04-29 15:20:55 -07:00
Richard Yan
85213d2876 synthesizable design 2024-04-17 18:05:51 -07:00
Richard Yan
17fd29c114 Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-04-16 23:03:04 -07:00
Richard Yan
8de5470da4 round robin warp scheduling 2024-04-16 23:03:00 -07:00
Hansung Kim
217bc189da ifdef-guard VX_operand* to enable including both in Chisel 2024-04-15 22:06:47 -07:00
Hansung Kim
978b1fe2d0 Add operands stage with duplicated RF for rs1/2/3 2024-04-15 16:45:59 -07:00
Hansung Kim
87b966a5fa Add perf counter for stall by any operand hazard 2024-04-15 01:01:26 -07:00
Hansung Kim
6c632200d5 Divide by per-breakdown cycle for avg stall cycles 2024-04-03 15:29:51 -07:00
Hansung Kim
62c7d1f4cf Report any fire cycles from scoreboard as well 2024-03-29 12:23:15 -07:00
Hansung Kim
50263a5f7d Rename sched_barrier_stalls -> perf_sched_barrier_idles
Sched stall by barrier is really idle because it causes !scheduler_if.valid,
which is counted as part of sched_idle.
2024-03-28 22:45:12 -07:00
joshua
08d7721e11 annoying swizzling problems 2024-03-28 03:00:15 -07:00
joshua
e16584ddd9 bleh still not work 2024-03-27 00:26:04 -07:00
Hansung Kim
dd90736382 Reformat perfcount report 2024-03-23 01:07:46 -07:00
Hansung Kim
3e6a9a6104 Expose scoreboard fires to perf interface 2024-03-23 01:06:40 -07:00
Hansung Kim
d99295793c Periodically report perf counter; reformat operand/FU stalls 2024-03-23 00:02:02 -07:00
Hansung Kim
83e151a189 Add valid / fire / cycles-issued perf counters to dispatch 2024-03-23 00:01:15 -07:00
Hansung Kim
573be030c8 Add issue-stall-by-operand-hazard perf counters
Do the same reduce by + instead of OR fix for scoreboard counters.
2024-03-23 00:00:08 -07:00
Hansung Kim
dda67da84c Add issue-stall-by-unit-busy perf counters
Add per-issue-width counters instead of using reduce "OR" and causing
undercounting.
2024-03-21 18:11:12 -07:00
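
The undercounting described in the commit above can be illustrated with a small model. This is a sketch in Python rather than the actual RTL; the function names and the two-slot trace are hypothetical. It compares an OR-reduced counter, which increments by at most one per cycle, against per-issue-slot counters.

```python
def count_with_or(stalls_per_cycle):
    """One shared counter: increments once per cycle if any slot stalled."""
    return sum(1 for slots in stalls_per_cycle if any(slots))


def count_per_slot(stalls_per_cycle):
    """One counter per issue slot: each slot's stalls are counted separately."""
    num_slots = len(stalls_per_cycle[0])
    counters = [0] * num_slots
    for slots in stalls_per_cycle:
        for i, stalled in enumerate(slots):
            if stalled:
                counters[i] += 1
    return counters


# Hypothetical trace: one tuple per cycle, one flag per issue slot.
trace = [
    (True, True),
    (True, False),
    (False, False),
    (True, True),
]

or_count = count_with_or(trace)      # 3: cycles with two stalled slots count once
slot_counts = count_per_slot(trace)  # [3, 2]: five stall events in total
```

In this trace the OR-reduced counter reports 3 while the per-slot counters sum to 5, because cycles in which both slots stall are collapsed into a single increment.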
Hansung Kim
3718a57937 Docs 2024-03-21 15:44:50 -07:00
joshua
b254281295 initial tcore impl 2024-03-21 01:29:38 -07:00
Hansung Kim
9438862389 Add perf counter for barrier schedule stalls 2024-03-20 15:29:28 -07:00
joshua
f9b4509936 initial tensor core 2024-03-20 02:46:00 -07:00
Hansung Kim
7014ae24da Prettier perf count reports 2024-03-19 15:25:46 -07:00
Hansung Kim
b25deb8a2e Fix assignment for perf counters 2024-03-19 14:06:44 -07:00
Hansung Kim
df4b21507e Customize global barrier response logic for clusters 2024-03-18 14:30:32 -07:00
Hansung Kim
2525df9c5f Use GBAR_CLUSTER_ENABLE to guard cluster-specific modification 2024-03-17 18:24:04 -07:00
Hansung Kim
28f54bde7f Merge remote-tracking branch 'sungwoong/master' into rtl 2024-03-14 09:15:59 -07:00
Hansung Kim
bd67ff3439 Fix creating bogus mem reqs when commit is stalled
When commit stage is stalled, LSU ready is deasserted for mem writes
since stores commit immediately; however, the same was not applied to
valid, creating duplicate memory write requests.  Fix by guarding both
ready and valid properly.
2024-03-13 20:43:27 -07:00
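
The duplicate-request bug described in the commit above can be sketched as a toy handshake model. This is a simplification, not the actual LSU logic: the function `count_mem_writes` is hypothetical, the memory side is assumed to accept a request in every cycle where valid is high, and a single store is held at the LSU until ready is asserted.

```python
def count_mem_writes(stall_pattern, guard_valid):
    """Count memory write requests issued for one store (toy model).

    stall_pattern: one bool per cycle, True while the commit stage is stalled.
    guard_valid:   if True, valid is gated by the stall just like ready.
    Assumes the memory side accepts a request whenever valid is high.
    """
    requests = 0
    for stalled in stall_pattern:
        ready = not stalled                        # ready deasserted on stall
        valid = (not stalled) if guard_valid else True
        if valid:
            requests += 1                          # memory sees a write request
        if ready:
            break                                  # store leaves the LSU
    return requests


stalls = [True, True, False]  # commit stalled for two cycles, then free
unguarded = count_mem_writes(stalls, guard_valid=False)  # 3 requests
guarded = count_mem_writes(stalls, guard_valid=True)     # 1 request
```

With only ready guarded, the store stays in place during the stall but is presented to memory every cycle; guarding valid as well yields exactly one request.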
Hansung Kim
8317a3fbe5 Fix fence by disallowing x-initialization instead of all-0 mask
Setting mem_req_mask to all-zero triggers an assertion error in
mem_scheduler.  Instead, disallow initialize-by-x in instruction decode
which is the source of x-propagation.  Since this seems to only happen
in VCS, define-gate it accordingly.

This reverts commit a15f4fd483.
2024-03-07 17:39:18 -08:00
Hansung Kim
b63333a4ec Merge remote-tracking branch 'upstream/master' into vortex2 2024-03-07 14:45:48 -08:00
joshua
beb3dce46d integer reduction unit 2024-03-06 01:39:17 -08:00