47 Commits

Author SHA1 Message Date
Hansung Kim
0f06afc3ef Update doc 2024-10-21 22:37:20 -07:00
Hansung Kim
da54162241 tensor: Add FP16 parameter and expose to VX_core 2024-09-10 15:32:17 -07:00
Hansung Kim
a968bdd69b tensor: Fix HALF_PRECISION to 1 2024-09-08 01:43:21 -07:00
Hansung Kim
2b1a9b7c16 tensor: Rename & docs 2024-08-23 16:21:45 -07:00
Hansung Kim
45f6ae5aad tensor: Doc comments 2024-08-20 14:46:40 -07:00
Hansung Kim
20faf87b80 tensor: Rename halves_buf to reduce confusion 2024-08-19 16:42:02 -07:00
Hansung Kim
d39e24643d tensor: Parameterize fedp for fp16/fp32 2024-08-12 20:01:56 -07:00
Hansung Kim
15e93e01d8 tensor: Split packed fp16 and wire correctly to DPU 2024-08-07 11:16:38 -07:00
Hansung Kim
d4d18c2823 tensor: spurious assert, doc, remove unused param 2024-07-29 16:06:55 -07:00
Hansung Kim
4e0dcdadac tensor: Share B operand buffer between threadgroups
The two threadgroups use the same B fragment, so there is no need to
store it twice in the operand buffer. To do this, pull the operand buffer
out of the threadgroups to the octet level.
2024-07-27 20:42:08 -07:00
Hansung Kim
7ad3f64528 tensor: Remove old ready_reg DPI code 2024-07-27 17:36:02 -07:00
Hansung Kim
01f6024a76 tensor: Split flops into structural module
to get separate area/power numbers in hierarchical
2024-07-26 16:26:48 -07:00
Hansung Kim
7f43bab0aa tensor: Parameterize result buffer depth 2024-07-25 16:31:45 -07:00
Hansung Kim
12f8722dd5 Shush display 2024-06-03 13:04:09 -07:00
Hansung Kim
0ebbb8e223 tensor: Fix perf counter; comment out dpi 2024-05-31 00:32:32 -07:00
Hansung Kim
574cc0e5f0 tensor: Document configuring queue depths 2024-05-30 18:33:15 -07:00
Hansung Kim
83f9f6d84f tensor: Fix sync for dpu warp queue as well 2024-05-30 18:22:36 -07:00
Hansung Kim
0a032ab400 tensor: Fix out-of-sync enqueue to dpu and metadata queue 2024-05-30 18:03:04 -07:00
Hansung Kim
2743d32bd2 tensor: Handle wid queue backpressure in dpu 2024-05-30 15:25:00 -07:00
Hansung Kim
2e2decc8b6 Shrink size of D_half latch 2024-05-30 12:46:45 -07:00
Hansung Kim
73a2f5781e Do two-cycle compute with 1 FEDP per lane 2024-05-30 12:41:41 -07:00
Hansung Kim
5ed6041e33 tensor: Properly stall dpu upon commit backpressure
& better-reasoned queue depths
2024-05-29 17:05:53 -07:00
Hansung Kim
f5a9ca5bf3 tensor: Enqueue both insts in pair to issue queue
Otherwise the first-in-pair instructions can run ahead, latching their
inputs for the next pair before the second-in-pair insts finish compute
on the current one.  Might introduce more frontend stalls, need more
experimenting
2024-05-29 14:47:25 -07:00
Hansung Kim
e9df173745 tensor: Use chisel-generated dpu module 2024-05-29 13:34:25 -07:00
Hansung Kim
c03a5b070c tensor: Issue queue for dpu to improve utilization 2024-05-27 18:25:10 -07:00
Hansung Kim
28f6cd59b5 tensor: Improve commit efficiency by decoupling dpu with fifo 2024-05-26 22:00:25 -07:00
Hansung Kim
864265bda5 tensor: Fix consecutive commits to write to same warp
... by splitting the pending_uops queue across warps.
2024-05-25 20:04:31 -07:00
Hansung Kim
5034d8d14b tensor: Add buffer to hide 2cyc commit latency
Since operand and commit throughput are the same (2 cycles), it is
unnecessary to stall the dpu during the multi-cycle commit.
This enables the dpu to operate at full throughput of 1 operand every 2
cycles.
2024-05-16 20:09:08 -07:00
Hansung Kim
89e7d65926 tensor: Add ready signal to enforce 1 warp occupancy
Currently disabled as the timing behavior is already ~accurate
2024-05-16 15:34:54 -07:00
Richard Yan
d624b3e50a store fencing, large smem, fix tensor core for firesim 2024-05-15 21:45:48 -07:00
Richard Yan
4aad161739 Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl 2024-05-07 14:00:31 -07:00
Richard Yan
37616f3334 firesim modifications 2024-05-07 13:59:25 -07:00
Hansung Kim
9ea291eea2 Merge remote-tracking branch 'origin/tensor_core' into rtl 2024-05-05 17:03:57 -07:00
joshua
5bd25985c6 i kinda forgot most of changes 2024-05-04 23:01:47 -07:00
Hansung Kim
675e8ea130 Merge branch 'tensor_core' into rtl 2024-05-01 16:18:14 -07:00
Richard Yan
85213d2876 synthesizable design 2024-04-17 18:05:51 -07:00
joshua
b254281295 initial tcore impl 2024-03-21 01:29:38 -07:00
joshua
f9b4509936 initial tensor core 2024-03-20 02:46:00 -07:00
joshua
978dd3bdfe seemingly working fp32 implementation 2024-03-19 17:56:59 -07:00
Hansung Kim
48558982f7 Merge remote-tracking branch 'upstream/master' into vortex2 2024-02-01 23:35:58 -08:00
Blaise Tine
38b92ad592 - using SV_DPI defines to disable DPI in synthesis-based simulations
- fixed Intel ASE run script: run_ase.sh
2024-01-28 00:22:21 -08:00
Blaise Tine
e04e026a14 profiling update
minor updates
2023-12-18 04:43:44 -08:00
Hansung Kim
7e0b63a3b3 Change result type for dpi calls from wire -> reg
VCS requires the output of the dpi calls to be of a type that can come
at the LHS of a procedural assignment, i.e. reg type.  Seems to be a
different requirement from Verilator.
2023-11-15 19:26:12 -08:00
Blaise Tine
ecf546bc4a minor update 2023-11-13 20:00:39 -08:00
Blaise Tine
a08d3ebd42 minor update 2023-11-12 23:40:59 -08:00
Blaise Tine
c1e168fdbe Vortex 2.0 changes:
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes

minor update

cleanup

cache bindings and memory perf refactory

hw unit tests fixes
2023-11-10 02:47:05 -08:00
Blaise Tine
d47cccc157 Vortex 2.0 changes:
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
2023-10-19 20:51:22 -07:00