Hansung Kim
83979c3341
tensor: Fully connect writeback IO
2024-10-22 20:17:00 -07:00
Hansung Kim
47dff74d3a
tensor: Fix commit/metadata logic for HGMMA
...
Block HGMMA commit until previous ones are all done; always commit
HGMMA_WAIT after it passes the scoreboard.
2024-10-22 20:01:37 -07:00
Hansung Kim
8a8f682194
tensor: Bore smem IO from core to tensor core
2024-10-22 17:42:30 -07:00
Hansung Kim
9131558950
tensor: Connect Chisel-generated TensorCoreDecoupled module
...
Elaborates, but most of the IOs are tied to fake.
2024-10-22 15:16:24 -07:00
Hansung Kim
32ccdeef01
Merge branch 'tensor-decoupled' into rtl
2024-10-21 22:57:07 -07:00
Hansung Kim
0f06afc3ef
Update doc
2024-10-21 22:37:20 -07:00
Richard Yan
cde8da1f3b
add tag to tc smem interface
2024-10-17 14:48:39 -07:00
Hansung Kim
4dcbc31a88
tensor: Separate async commit from tensor commit
...
With this we can prioritize commit of the async hgmma instructions over
the "ghost" commits from the TC.
2024-10-11 21:32:20 -07:00
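A minimal sketch of the prioritization described in the commit above, assuming a fixed-priority choice between the two commit sources. The function name and the string labels are hypothetical and do not come from the RTL.

```python
def arbitrate_commit(async_valid, ghost_valid):
    """Pick which commit source goes down the pipeline this cycle.

    Assumption: fixed priority, with the async hgmma commit always
    winning over the "ghost" commit from the tensor core.
    """
    if async_valid:
        return "async"
    if ghost_valid:
        return "ghost"
    return None  # nothing to commit this cycle
```

With both sources valid, the ghost commit simply waits for a later cycle.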
Hansung Kim
717fe7ff29
tensor: Fix FSM when commit not ready
2024-10-11 20:24:31 -07:00
Hansung Kim
2934b1bd94
tensor: Split execution module from pipeline logic
2024-10-11 20:09:09 -07:00
Hansung Kim
f7f23e0c05
tensor: Doc update
2024-10-11 18:00:36 -07:00
Hansung Kim
42b9d23f83
tensor: Write release logic for hgmma
...
Upon completion of an op, tensor_core_hopper sends a "ghost" commit
signal down the pipeline with the `wb` and `tensor` bit set in
commit_if. The scoreboard receives this signal via writeback_if and
resets the inuse_tensor status bit back to zero, which unblocks the
HGMMA_WAIT instruction.
2024-10-11 17:58:44 -07:00
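The stall/release behavior in this commit and the hgmma_wait commit below can be sketched as a small behavioral model. This is not the RTL: the class and method names are hypothetical, and setting inuse_tensor at issue time of HGMMA is an assumption made for illustration.

```python
class Scoreboard:
    """Behavioral model of the inuse_tensor status bit only."""

    def __init__(self):
        self.inuse_tensor = False

    def can_issue(self, op):
        # HGMMA_WAIT stalls at issue while a tensor op is in flight.
        return not (op == "HGMMA_WAIT" and self.inuse_tensor)

    def issue(self, op):
        # Assumption: the bit is set when an HGMMA is issued.
        if op == "HGMMA":
            self.inuse_tensor = True

    def writeback(self, wb, tensor):
        # A "ghost" commit arrives with both wb and tensor set and
        # clears the bit, which unblocks HGMMA_WAIT.
        if wb and tensor:
            self.inuse_tensor = False
```

An ordinary writeback (tensor bit clear) leaves the status bit untouched, so only the completion of the tensor op releases the wait.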
Hansung Kim
408a9b5d2a
tensor: Write stall logic for hgmma_wait
...
HGMMA_WAIT instruction stalls at issue when inuse_tensor is set, which
is done by the previous HGMMA insn. Currently inuse_tensor is never set
back to zero.
2024-10-11 17:18:01 -07:00
Hansung Kim
72f9dedce3
tensor: Disable micro-ops for hopper
...
Have a uarch FSM handle the stepping mechanism entirely.
2024-10-11 15:59:31 -07:00
Hansung Kim
100d69ef21
Doc update on accumulator regs
2024-10-11 15:47:58 -07:00
Hansung Kim
d9ad4809ec
Add 'tensor' bit to commit_if and writeback_if
...
For use in the asynchronous tensor instruction. When 1'b1, sets/unsets
the inuse_tensor status bit in the scoreboard to signal
kickoff/completion of the asynchronous tensor op.
2024-10-11 15:42:25 -07:00
Hansung Kim
58c9761829
Revert decode change for hopper
...
Share the same insn as non-hopper TC.
2024-10-09 21:53:04 -07:00
Hansung Kim
7ab14445f0
tensor: Test many-commit per execute with an FSM
...
Trick is to set commit_if.data.eop to 0, since the commit module only
signals instruction completion to VX_schedule if the eop bit is 1.
Otherwise it underflows the pending_instr buffer.
The same eop trick works for VX_scoreboard, which works around the
invalid rd writeback error.
2024-10-07 21:29:44 -07:00
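A rough model of why the eop trick in the commit above avoids the underflow, assuming pending_instr behaves like a per-warp counter that is incremented at issue and decremented only by commits with eop set. The names are hypothetical, and letting the final commit of the sequence carry eop=1 is an assumption, not something the commit message states.

```python
class PendingInstr:
    """Counter of issued-but-not-completed instructions."""

    def __init__(self):
        self.count = 0

    def issue(self):
        self.count += 1

    def commit(self, eop):
        # Only an eop commit signals instruction completion.
        if eop:
            if self.count == 0:
                raise RuntimeError("pending_instr underflow")
            self.count -= 1


def run_execute(pending, num_commits, mark_all_eop):
    """Issue one instruction, then emit num_commits commits for it."""
    pending.issue()
    for step in range(num_commits):
        last = step == num_commits - 1
        # Assumption: only the final commit carries eop=1.
        pending.commit(eop=mark_all_eop or last)
    return pending.count
```

Calling run_execute(PendingInstr(), 4, False) leaves the counter balanced, while passing True for mark_all_eop raises on the second commit.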
Hansung Kim
e8ca4677df
Remove old code for pending_instr underflow fix
2024-10-07 20:21:35 -07:00
Hansung Kim
4cac1adf7d
Add dummy code for decoupled Hopper tensor core
...
Define EXT_T_HOPPER that, when EXT_T_ENABLE is defined, distinguishes
whether to instantiate core-coupled Volta-style or decoupled
Hopper-style Tensor Core.
2024-10-07 17:10:59 -07:00
Richard Yan
8bf7f39f04
add tensor core memory interface
2024-10-07 02:56:38 -07:00
Hansung Kim
da54162241
tensor: Add FP16 parameter and expose to VX_core
2024-09-10 15:32:17 -07:00
Richard Yan
3f8c28c7d6
sync rf, x0 fix
2024-09-05 16:49:05 -07:00
Hansung Kim
2b1a9b7c16
tensor: Rename & docs
2024-08-23 16:21:45 -07:00
Hansung Kim
45f6ae5aad
tensor: Doc comments
2024-08-20 14:46:40 -07:00
Hansung Kim
20faf87b80
tensor: Rename halves_buf to reduce confusion
2024-08-19 16:42:02 -07:00
Hansung Kim
789d873e19
Disable reduce_unit for timing optimization
...
Currently the critical path @1GHz is found at the accumulators inside
reduce_unit.
2024-08-16 15:28:56 -07:00
Hansung Kim
1410b39143
Disable trace during the very start of simulation
2024-08-13 16:01:29 -07:00
Hansung Kim
d4d18c2823
tensor: spurious assert, doc, remove unused param
2024-07-29 16:06:55 -07:00
Hansung Kim
4e0dcdadac
tensor: Share B operand buffer between threadgroups
...
The two threadgroups use the same B fragment, so there is no need to store
it twice in the operand buffer. To do this, pull the operand buffer
out of the threadgroups to the octet level.
2024-07-27 20:42:08 -07:00
Hansung Kim
01f6024a76
tensor: Split flops into structural module
...
to get separate area/power numbers in hierarchical
2024-07-26 16:26:48 -07:00
Hansung Kim
7f43bab0aa
tensor: Parameterize result buffer depth
2024-07-25 16:31:45 -07:00
Hansung Kim
14b811f334
Update doc
2024-07-19 16:39:05 -07:00
Hansung Kim
4b093e3ff7
tensor: Mark PARTIAL_BW on power impact
2024-06-26 14:25:26 -07:00
Hansung Kim
9a6fe79bd3
VX_operands_dup: Add counter for RF read/write accesses
2024-06-22 16:35:23 -07:00
Hansung Kim
86deaa8e07
Give some slack time for other cores to finish
2024-06-12 09:47:21 -07:00
Richard Yan
7947df8a6c
config change, move ucode
2024-06-12 02:15:08 -07:00
Hansung Kim
874a3bf194
Doc changes
2024-06-09 13:41:00 -07:00
Hansung Kim
12f8722dd5
Shush display
2024-06-03 13:04:09 -07:00
Hansung Kim
9caafb2d8a
tensor: Decode rd of macro-op to designate additional accumulator
...
This is useful when you want to have the tensor core output to multiple
accumulator registers, e.g. when doing outer product within the RF.
2024-05-31 19:17:56 -07:00
Hansung Kim
0ebbb8e223
tensor: Fix perf counter; comment out dpi
2024-05-31 00:32:32 -07:00
Hansung Kim
73293061ea
tensor: Enlarge metadata queue
2024-05-30 23:21:23 -07:00
Hansung Kim
52bb827a46
Handle BLOCK_SIZE != 1 in dispatch_unit
...
+ change ALU and FPU unit to use it as well
2024-05-30 23:20:21 -07:00
Hansung Kim
a02773eb92
Add more efficient dispatch_unit
...
Instead of having a single candidate to be considered for dispatch
(designated by 'batch_idx' counter), add a dispatch_unit variant that
considers all `ISSUE_WIDTH dispatch signals and picks a valid one in a
round-robin manner.
This increases core utilization significantly due to better overlapping
of smem/tensor ops.
2024-05-30 21:55:42 -07:00
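The difference between the two dispatch policies in the commit above can be sketched as follows. This is a software model under assumptions, not the dispatch_unit RTL: the function names are hypothetical, and the valid list stands in for the `ISSUE_WIDTH dispatch valid signals.

```python
def pick_single_candidate(valid, batch_idx):
    """Old policy: only the slot named by batch_idx is considered."""
    return batch_idx if valid[batch_idx] else None


def pick_round_robin(valid, last):
    """New policy: scan all slots starting after the last winner.

    Returns the index of the first valid slot in round-robin order,
    or None if no slot is valid this cycle.
    """
    n = len(valid)
    for offset in range(1, n + 1):
        idx = (last + offset) % n
        if valid[idx]:
            return idx
    return None
```

When the slot under batch_idx is not valid but another one is, the single-candidate policy dispatches nothing in that cycle, whereas the round-robin policy still finds work.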
Hansung Kim
574cc0e5f0
tensor: Document configuring queue depths
2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f
tensor: Fix sync for dpu warp queue as well
2024-05-30 18:22:36 -07:00
Hansung Kim
0a032ab400
tensor: Fix out-of-sync enqueue to dpu and metadata queue
2024-05-30 18:03:04 -07:00
Hansung Kim
97f37b1c75
tensor: Add commit stall injection for debugging
2024-05-30 18:00:26 -07:00
Hansung Kim
06e0f901ff
tensor: Handle backpressure from metadata queue
2024-05-30 17:34:49 -07:00
Hansung Kim
dfb2276657
tensor: Remove redundant issue queue outside pdu
2024-05-30 17:29:59 -07:00