Hansung Kim
0f06afc3ef
Update doc
2024-10-21 22:37:20 -07:00
Hansung Kim
da54162241
tensor: Add FP16 parameter and expose to VX_core
2024-09-10 15:32:17 -07:00
Hansung Kim
a968bdd69b
tensor: Fix HALF_PRECISION to 1
2024-09-08 01:43:21 -07:00
Hansung Kim
2b1a9b7c16
tensor: Rename & docs
2024-08-23 16:21:45 -07:00
Hansung Kim
45f6ae5aad
tensor: Doc comments
2024-08-20 14:46:40 -07:00
Hansung Kim
20faf87b80
tensor: Rename halves_buf to reduce confusion
2024-08-19 16:42:02 -07:00
Hansung Kim
d39e24643d
tensor: Parameterize fedp for fp16/fp32
2024-08-12 20:01:56 -07:00
Hansung Kim
15e93e01d8
tensor: Split packed fp16 and wire correctly to DPU
2024-08-07 11:16:38 -07:00
Hansung Kim
d4d18c2823
tensor: spurious assert, doc, remove unused param
2024-07-29 16:06:55 -07:00
Hansung Kim
4e0dcdadac
tensor: Share B operand buffer between threadgroups
...
The two threadgroups use the same B fragment, so there is no need to
store it twice in the operand buffer. To do this, pull the operand buffer
out of the threadgroups to the octet level.
2024-07-27 20:42:08 -07:00
Hansung Kim
7ad3f64528
tensor: Remove old ready_reg DPI code
2024-07-27 17:36:02 -07:00
Hansung Kim
01f6024a76
tensor: Split flops into structural module
...
to get separate area/power numbers in hierarchical
2024-07-26 16:26:48 -07:00
Hansung Kim
7f43bab0aa
tensor: Parameterize result buffer depth
2024-07-25 16:31:45 -07:00
Hansung Kim
12f8722dd5
Shush display
2024-06-03 13:04:09 -07:00
Hansung Kim
0ebbb8e223
tensor: Fix perf counter; comment out dpi
2024-05-31 00:32:32 -07:00
Hansung Kim
574cc0e5f0
tensor: Document configuring queue depths
2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f
tensor: Fix sync for dpu warp queue as well
2024-05-30 18:22:36 -07:00
Hansung Kim
0a032ab400
tensor: Fix out-of-sync enqueue to dpu and metadata queue
2024-05-30 18:03:04 -07:00
Hansung Kim
2743d32bd2
tensor: Handle wid queue backpressure in dpu
2024-05-30 15:25:00 -07:00
Hansung Kim
2e2decc8b6
Shrink size of D_half latch
2024-05-30 12:46:45 -07:00
Hansung Kim
73a2f5781e
Do two-cycle compute with 1 FEDP per lane
2024-05-30 12:41:41 -07:00
Hansung Kim
5ed6041e33
tensor: Properly stall dpu upon commit backpressure
...
& better-reasoned queue depths
2024-05-29 17:05:53 -07:00
Hansung Kim
f5a9ca5bf3
tensor: Enqueue both insts in pair to issue queue
...
Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair insts finish compute
on the current one. This might introduce more frontend stalls; needs more
experimentation.
2024-05-29 14:47:25 -07:00
Hansung Kim
e9df173745
tensor: Use chisel-generated dpu module
2024-05-29 13:34:25 -07:00
Hansung Kim
c03a5b070c
tensor: Issue queue for dpu to improve utilization
2024-05-27 18:25:10 -07:00
Hansung Kim
28f6cd59b5
tensor: Improve commit efficiency by decoupling dpu with fifo
2024-05-26 22:00:25 -07:00
Hansung Kim
864265bda5
tensor: Fix consecutive commits to write to same warp
...
... by splitting the pending_uops queue across warps.
2024-05-25 20:04:31 -07:00
Hansung Kim
5034d8d14b
tensor: Add buffer to hide 2cyc commit latency
...
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
Hansung Kim
89e7d65926
tensor: Add ready signal to enforce 1 warp occupancy
...
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Richard Yan
d624b3e50a
store fencing, large smem, fix tensor core for firesim
2024-05-15 21:45:48 -07:00
Richard Yan
4aad161739
Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl
2024-05-07 14:00:31 -07:00
Richard Yan
37616f3334
firesim modifications
2024-05-07 13:59:25 -07:00
Hansung Kim
9ea291eea2
Merge remote-tracking branch 'origin/tensor_core' into rtl
2024-05-05 17:03:57 -07:00
joshua
5bd25985c6
i kinda forgot most of changes
2024-05-04 23:01:47 -07:00
Hansung Kim
675e8ea130
Merge branch 'tensor_core' into rtl
2024-05-01 16:18:14 -07:00
Richard Yan
85213d2876
synthesizable design
2024-04-17 18:05:51 -07:00
joshua
b254281295
initial tcore impl
2024-03-21 01:29:38 -07:00
joshua
f9b4509936
initial tensor core
2024-03-20 02:46:00 -07:00
joshua
978dd3bdfe
seemingly working fp32 implementation
2024-03-19 17:56:59 -07:00
Hansung Kim
48558982f7
Merge remote-tracking branch 'upstream/master' into vortex2
2024-02-01 23:35:58 -08:00
Blaise Tine
38b92ad592
- using SV_DPI defines to disable DPI in synthesis-based simulations
...
- fixed Intel ASE run script: run_ase.sh
2024-01-28 00:22:21 -08:00
Blaise Tine
e04e026a14
profiling update
...
minor updates
2023-12-18 04:43:44 -08:00
Hansung Kim
7e0b63a3b3
Change result type for dpi calls from wire -> reg
...
VCS requires the output of the DPI calls to be of a type that can appear
on the LHS of a procedural assignment, i.e. reg type. This seems to be a
different requirement from Verilator.
2023-11-15 19:26:12 -08:00
Blaise Tine
ecf546bc4a
minor update
2023-11-13 20:00:39 -08:00
Blaise Tine
a08d3ebd42
minor update
2023-11-12 23:40:59 -08:00
Blaise Tine
c1e168fdbe
Vortex 2.0 changes:
...
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
minor update
cleanup
cache bindings and memory perf refactoring
hw unit tests fixes
2023-11-10 02:47:05 -08:00
Blaise Tine
d47cccc157
Vortex 2.0 changes:
...
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
2023-10-19 20:51:22 -07:00