vortex

Author	SHA1	Message	Date
Hansung Kim	0f06afc3ef	Update doc	2024-10-21 22:37:20 -07:00
Hansung Kim	da54162241	tensor: Add FP16 parameter and expose to VX_core	2024-09-10 15:32:17 -07:00
Hansung Kim	a968bdd69b	tensor: Fix HALF_PRECISION to 1	2024-09-08 01:43:21 -07:00
Hansung Kim	2b1a9b7c16	tensor: Rename & docs	2024-08-23 16:21:45 -07:00
Hansung Kim	45f6ae5aad	tensor: Doc comments	2024-08-20 14:46:40 -07:00
Hansung Kim	20faf87b80	tensor: Rename halves_buf to reduce confusion	2024-08-19 16:42:02 -07:00
Hansung Kim	d39e24643d	tensor: Parameterize fedp for fp16/fp32	2024-08-12 20:01:56 -07:00
Hansung Kim	15e93e01d8	tensor: Split packed fp16 and wire correctly to DPU	2024-08-07 11:16:38 -07:00
Hansung Kim	d4d18c2823	tensor: spurious assert, doc, remove unused param	2024-07-29 16:06:55 -07:00
Hansung Kim	4e0dcdadac	tensor: Share B operand buffer between threadgroups. The two threadgroups use the same B fragment, so there is no need to store it twice in the operand buffer. To do this, pull the operand buffer out of the threadgroups to the octet level.	2024-07-27 20:42:08 -07:00
Hansung Kim	7ad3f64528	tensor: Remove old ready_reg DPI code	2024-07-27 17:36:02 -07:00
Hansung Kim	01f6024a76	tensor: Split flops into structural module to get separate area/power numbers in hierarchical	2024-07-26 16:26:48 -07:00
Hansung Kim	7f43bab0aa	tensor: Parameterize result buffer depth	2024-07-25 16:31:45 -07:00
Hansung Kim	12f8722dd5	Shush display	2024-06-03 13:04:09 -07:00
Hansung Kim	0ebbb8e223	tensor: Fix perf counter; comment out dpi	2024-05-31 00:32:32 -07:00
Hansung Kim	574cc0e5f0	tensor: Document configuring queue depths	2024-05-30 18:33:15 -07:00
Hansung Kim	83f9f6d84f	tensor: Fix sync for dpu warp queue as well	2024-05-30 18:22:36 -07:00
Hansung Kim	0a032ab400	tensor: Fix out-of-sync enqueue to dpu and metadata queue	2024-05-30 18:03:04 -07:00
Hansung Kim	2743d32bd2	tensor: Handle wid queue backpressure in dpu	2024-05-30 15:25:00 -07:00
Hansung Kim	2e2decc8b6	Shrink size of D_half latch	2024-05-30 12:46:45 -07:00
Hansung Kim	73a2f5781e	Do two-cycle compute with 1 FEDP per lane	2024-05-30 12:41:41 -07:00
Hansung Kim	5ed6041e33	tensor: Properly stall dpu upon commit backpressure & better-reasoned queue depths	2024-05-29 17:05:53 -07:00
Hansung Kim	f5a9ca5bf3	tensor: Enqueue both insts in pair to issue queue. Otherwise the first-in-pair instructions can run ahead, latching their inputs for the next pair before the second-in-pair insts finish compute on the current one. This might introduce more frontend stalls; needs more experimenting.	2024-05-29 14:47:25 -07:00
Hansung Kim	e9df173745	tensor: Use chisel-generated dpu module	2024-05-29 13:34:25 -07:00
Hansung Kim	c03a5b070c	tensor: Issue queue for dpu to improve utilization	2024-05-27 18:25:10 -07:00
Hansung Kim	28f6cd59b5	tensor: Improve commit efficiency by decoupling dpu with fifo	2024-05-26 22:00:25 -07:00
Hansung Kim	864265bda5	tensor: Fix consecutive commits to write to same warp ... by splitting the pending_uops queue across warps.	2024-05-25 20:04:31 -07:00
Hansung Kim	5034d8d14b	tensor: Add buffer to hide 2cyc commit latency. Since operand and commit throughput are the same (2 cycles), it is unnecessary to stall the dpu during the multi-cycle commit. This enables the dpu to operate at full throughput of 1 operand every 2 cycles.	2024-05-16 20:09:08 -07:00
Hansung Kim	89e7d65926	tensor: Add ready signal to enforce 1 warp occupancy. Currently disabled as the timing behavior is already ~accurate.	2024-05-16 15:34:54 -07:00
Richard Yan	d624b3e50a	store fencing, large smem, fix tensor core for firesim	2024-05-15 21:45:48 -07:00
joshua	5bd25985c6	i kinda forgot most of changes	2024-05-04 23:01:47 -07:00
joshua	b254281295	initial tcore impl	2024-03-21 01:29:38 -07:00
joshua	978dd3bdfe	seemingly working fp32 implementation	2024-03-19 17:56:59 -07:00

33 Commits