Hansung Kim
0f06afc3ef
Update doc
2024-10-21 22:37:20 -07:00
Hansung Kim
da54162241
tensor: Add FP16 parameter and expose to VX_core
2024-09-10 15:32:17 -07:00
Hansung Kim
a968bdd69b
tensor: Fix HALF_PRECISION to 1
2024-09-08 01:43:21 -07:00
Hansung Kim
2b1a9b7c16
tensor: Rename & docs
2024-08-23 16:21:45 -07:00
Hansung Kim
45f6ae5aad
tensor: Doc comments
2024-08-20 14:46:40 -07:00
Hansung Kim
20faf87b80
tensor: Rename halves_buf to reduce confusion
2024-08-19 16:42:02 -07:00
Hansung Kim
d39e24643d
tensor: Parameterize fedp for fp16/fp32
2024-08-12 20:01:56 -07:00
Hansung Kim
15e93e01d8
tensor: Split packed fp16 and wire correctly to DPU
2024-08-07 11:16:38 -07:00
Hansung Kim
d4d18c2823
tensor: spurious assert, doc, remove unused param
2024-07-29 16:06:55 -07:00
Hansung Kim
4e0dcdadac
tensor: Share B operand buffer between threadgroups
The two threadgroups use the same B fragment, so there is no need to
store it twice in the operand buffer. To do this, pull the operand buffer
out of the threadgroups up to the octet level.
2024-07-27 20:42:08 -07:00
Hansung Kim
7ad3f64528
tensor: Remove old ready_reg DPI code
2024-07-27 17:36:02 -07:00
Hansung Kim
01f6024a76
tensor: Split flops into structural module
to get separate area/power numbers in hierarchical
2024-07-26 16:26:48 -07:00
Hansung Kim
7f43bab0aa
tensor: Parameterize result buffer depth
2024-07-25 16:31:45 -07:00
Hansung Kim
12f8722dd5
Shush display
2024-06-03 13:04:09 -07:00
Hansung Kim
0ebbb8e223
tensor: Fix perf counter; comment out dpi
2024-05-31 00:32:32 -07:00
Hansung Kim
574cc0e5f0
tensor: Document configuring queue depths
2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f
tensor: Fix sync for dpu warp queue as well
2024-05-30 18:22:36 -07:00
Hansung Kim
0a032ab400
tensor: Fix out-of-sync enqueue to dpu and metadata queue
2024-05-30 18:03:04 -07:00
Hansung Kim
2743d32bd2
tensor: Handle wid queue backpressure in dpu
2024-05-30 15:25:00 -07:00
Hansung Kim
2e2decc8b6
Shrink size of D_half latch
2024-05-30 12:46:45 -07:00
Hansung Kim
73a2f5781e
Do two-cycle compute with 1 FEDP per lane
2024-05-30 12:41:41 -07:00
Hansung Kim
5ed6041e33
tensor: Properly stall dpu upon commit backpressure
& better-reasoned queue depths
2024-05-29 17:05:53 -07:00
Hansung Kim
f5a9ca5bf3
tensor: Enqueue both insts in pair to issue queue
Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair instructions finish
computing on the current one. This might introduce more frontend stalls;
needs more experimentation.
2024-05-29 14:47:25 -07:00
Hansung Kim
e9df173745
tensor: Use chisel-generated dpu module
2024-05-29 13:34:25 -07:00
Hansung Kim
c03a5b070c
tensor: Issue queue for dpu to improve utilization
2024-05-27 18:25:10 -07:00
Hansung Kim
28f6cd59b5
tensor: Improve commit efficiency by decoupling dpu with fifo
2024-05-26 22:00:25 -07:00
Hansung Kim
864265bda5
tensor: Fix consecutive commits to write to same warp
... by splitting the pending_uops queue across warps.
2024-05-25 20:04:31 -07:00
Hansung Kim
5034d8d14b
tensor: Add buffer to hide 2cyc commit latency
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
Hansung Kim
89e7d65926
tensor: Add ready signal to enforce 1 warp occupancy
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Richard Yan
d624b3e50a
store fencing, large smem, fix tensor core for firesim
2024-05-15 21:45:48 -07:00
joshua
5bd25985c6
i kinda forgot most of changes
2024-05-04 23:01:47 -07:00
joshua
b254281295
initial tcore impl
2024-03-21 01:29:38 -07:00
joshua
978dd3bdfe
seemingly working fp32 implementation
2024-03-19 17:56:59 -07:00