vortex

Author	SHA1	Message	Date
Hansung Kim	0f06afc3ef	Update doc	2024-10-21 22:37:20 -07:00
Hansung Kim	da54162241	tensor: Add FP16 parameter and expose to VX_core	2024-09-10 15:32:17 -07:00
Hansung Kim	a968bdd69b	tensor: Fix HALF_PRECISION to 1	2024-09-08 01:43:21 -07:00
Hansung Kim	2b1a9b7c16	tensor: Rename & docs	2024-08-23 16:21:45 -07:00
Hansung Kim	45f6ae5aad	tensor: Doc comments	2024-08-20 14:46:40 -07:00
Hansung Kim	20faf87b80	tensor: Rename halves_buf to reduce confusion	2024-08-19 16:42:02 -07:00
Hansung Kim	d39e24643d	tensor: Parameterize fedp for fp16/fp32	2024-08-12 20:01:56 -07:00
Hansung Kim	15e93e01d8	tensor: Split packed fp16 and wire correctly to DPU	2024-08-07 11:16:38 -07:00
Hansung Kim	d4d18c2823	tensor: spurious assert, doc, remove unused param	2024-07-29 16:06:55 -07:00
Hansung Kim	4e0dcdadac	tensor: Share B operand buffer between threadgroups. The two threadgroups use the same B fragment, so there is no need to store it twice in the operand buffer. To do this, pull the operand buffer out of the threadgroups to the octet level.	2024-07-27 20:42:08 -07:00
Hansung Kim	7ad3f64528	tensor: Remove old ready_reg DPI code	2024-07-27 17:36:02 -07:00
Hansung Kim	01f6024a76	tensor: Split flops into structural module to get separate area/power numbers in hierarchical	2024-07-26 16:26:48 -07:00
Hansung Kim	7f43bab0aa	tensor: Parameterize result buffer depth	2024-07-25 16:31:45 -07:00
Hansung Kim	12f8722dd5	Shush display	2024-06-03 13:04:09 -07:00
Hansung Kim	0ebbb8e223	tensor: Fix perf counter; comment out dpi	2024-05-31 00:32:32 -07:00
Hansung Kim	574cc0e5f0	tensor: Document configuring queue depths	2024-05-30 18:33:15 -07:00
Hansung Kim	83f9f6d84f	tensor: Fix sync for dpu warp queue as well	2024-05-30 18:22:36 -07:00
Hansung Kim	0a032ab400	tensor: Fix out-of-sync enqueue to dpu and metadata queue	2024-05-30 18:03:04 -07:00
Hansung Kim	2743d32bd2	tensor: Handle wid queue backpressure in dpu	2024-05-30 15:25:00 -07:00
Hansung Kim	2e2decc8b6	Shrink size of D_half latch	2024-05-30 12:46:45 -07:00
Hansung Kim	73a2f5781e	Do two-cycle compute with 1 FEDP per lane	2024-05-30 12:41:41 -07:00
Hansung Kim	5ed6041e33	tensor: Properly stall dpu upon commit backpressure & better-reasoned queue depths	2024-05-29 17:05:53 -07:00
Hansung Kim	f5a9ca5bf3	tensor: Enqueue both insts in pair to issue queue. Otherwise the first-in-pair instructions can run ahead, latching their inputs for the next pair before the second-in-pair insts finish compute on the current one. Might introduce more frontend stalls; needs more experimenting.	2024-05-29 14:47:25 -07:00
Hansung Kim	e9df173745	tensor: Use chisel-generated dpu module	2024-05-29 13:34:25 -07:00
Hansung Kim	c03a5b070c	tensor: Issue queue for dpu to improve utilization	2024-05-27 18:25:10 -07:00
Hansung Kim	28f6cd59b5	tensor: Improve commit efficiency by decoupling dpu with fifo	2024-05-26 22:00:25 -07:00
Hansung Kim	864265bda5	tensor: Fix consecutive commits to write to same warp ... by splitting the pending_uops queue across warps.	2024-05-25 20:04:31 -07:00
Hansung Kim	5034d8d14b	tensor: Add buffer to hide 2cyc commit latency. Since operand and commit throughput are the same (2 cycles), it is unnecessary to stall the dpu during the multi-cycle commit. This enables the dpu to operate at full throughput of 1 operand every 2 cycles.	2024-05-16 20:09:08 -07:00
Hansung Kim	89e7d65926	tensor: Add ready signal to enforce 1 warp occupancy. Currently disabled as the timing behavior is already ~accurate.	2024-05-16 15:34:54 -07:00
Richard Yan	d624b3e50a	store fencing, large smem, fix tensor core for firesim	2024-05-15 21:45:48 -07:00
Richard Yan	4aad161739	Merge branch 'rtl' of https://github.com/hansungk/vortex-private into rtl	2024-05-07 14:00:31 -07:00
Richard Yan	37616f3334	firesim modifications	2024-05-07 13:59:25 -07:00
Hansung Kim	9ea291eea2	Merge remote-tracking branch 'origin/tensor_core' into rtl	2024-05-05 17:03:57 -07:00
joshua	5bd25985c6	i kinda forgot most of changes	2024-05-04 23:01:47 -07:00
Hansung Kim	675e8ea130	Merge branch 'tensor_core' into rtl	2024-05-01 16:18:14 -07:00
Richard Yan	85213d2876	synthesizable design	2024-04-17 18:05:51 -07:00
joshua	b254281295	initial tcore impl	2024-03-21 01:29:38 -07:00
joshua	f9b4509936	initial tensor core	2024-03-20 02:46:00 -07:00
joshua	978dd3bdfe	seemingly working fp32 implementation	2024-03-19 17:56:59 -07:00
Hansung Kim	48558982f7	Merge remote-tracking branch 'upstream/master' into vortex2	2024-02-01 23:35:58 -08:00
Blaise Tine	38b92ad592	- using SV_DPI defines to disable DPI in synthesis-based simulations - fixed Intel ASE run script: run_ase.sh	2024-01-28 00:22:21 -08:00
Blaise Tine	e04e026a14	profiling update minor updates	2023-12-18 04:43:44 -08:00
Hansung Kim	7e0b63a3b3	Change result type for dpi calls from wire -> reg. VCS requires the output of the dpi calls to be of a type that can appear on the LHS of a procedural assignment, i.e. reg type. Seems to be a different requirement from Verilator.	2023-11-15 19:26:12 -08:00
Blaise Tine	ecf546bc4a	minor update	2023-11-13 20:00:39 -08:00
Blaise Tine	a08d3ebd42	minor update	2023-11-12 23:40:59 -08:00
Blaise Tine	c1e168fdbe	Vortex 2.0 changes: + Microarchitecture optimizations + 64-bit support + Xilinx FPGA support + LLVM-16 support + Refactoring and quality control fixes; minor updates, cleanup, cache bindings and memory perf refactoring, hw unit tests fixes	2023-11-10 02:47:05 -08:00
Blaise Tine	d47cccc157	Vortex 2.0 changes: + Microarchitecture optimizations + 64-bit support + Xilinx FPGA support + LLVM-16 support + Refactoring and quality control fixes	2023-10-19 20:51:22 -07:00

47 Commits