Richard Yan
a44edf2b65
Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels
2024-04-24 22:10:40 -07:00
Richard Yan
6eafa2de54
write operands to elf
2024-04-24 22:09:30 -07:00
Hansung Kim
df881fd69f
Generate separate ELF for radiance
2024-04-24 21:10:21 -07:00
Hansung Kim
793779aa6c
sgemm_wg: 128x128 config
2024-04-24 21:10:21 -07:00
Hansung Kim
689043b45e
Add regression flops
2024-04-24 21:10:21 -07:00
Hansung Kim
6cbfbfb856
sgemm_wg: Output CPU data to binary
2024-04-24 21:10:21 -07:00
Richard Yan
4e9855dc33
highly unrolled a/b load
2024-04-16 22:19:30 -07:00
Richard Yan
449d99f0bb
dram gemm kernel
2024-04-16 17:15:22 -07:00
Richard Yan
99621a0df9
Merge branch 'kernels' of https://github.com/hansungk/vortex-private into kernels
2024-04-15 10:22:19 -07:00
Richard Yan
0bb7aeb45b
add gpu+gemmini gemm kernel
2024-04-15 10:13:37 -07:00
Hansung Kim
37a60b1141
sgemm_wg: Output C result to binary
2024-04-14 12:36:06 -07:00
Hansung Kim
3383b70732
sgemm_wg: Hardcode device address
2024-04-14 12:36:00 -07:00
Hansung Kim
93a00101ae
sgemm_wg: revert to faster params
2024-04-04 21:06:14 -07:00
Hansung Kim
fa2b6e2ad0
sgemm_wg: Explicitly limit unroll to reduce stack spilling
...
This needs to be done case-by-case for different BK/TM/TN combinations by
examining the assembly.
2024-03-29 02:48:29 -07:00
Hansung Kim
537b97eb20
common.mk: Don't clean all *.elf
2024-03-28 20:17:26 -07:00
Hansung Kim
a9b0814211
sgemm_wg: Document tiling parameter constraints
2024-03-28 18:17:00 -07:00
Hansung Kim
9673db4e8c
sgemm_wg: Fix possible divide-by-0
2024-03-28 17:35:47 -07:00
Hansung Kim
9555b790e7
sgemm_wg: ifdef-guard cluster specific code
2024-03-27 22:45:51 -07:00
Hansung Kim
09822764e7
sgemm_wg: Remove software-based barrier implementation
...
Intra-cluster barrier is now implemented in hardware, transparent to the ISA.
2024-03-27 22:43:45 -07:00
Hansung Kim
fa6adceb7e
vecaddx: Hardcode args/input device address to match chipyard
...
Don't use mem_alloc/mem_free API
2024-03-27 15:15:52 -07:00
Hansung Kim
b545809496
vecaddx: Use -DRADIANCE
2024-03-26 16:42:36 -07:00
Hansung Kim
4d2c0084d1
common.mk: Compile separate cluster ELF
...
... using -DRADIANCE, which the kernel C code uses explicitly to switch between
vx_spawn_tasks and vx_spawn_tasks_cluster. This is to ease running both simX
and Chipyard simulations without mixing up binaries.
2024-03-26 16:37:44 -07:00
Hansung Kim
7f00e6c376
vecaddx: Change arg device address to 7fff0000
2024-03-26 10:44:33 -07:00
Hansung Kim
cc7b34ec5b
vecaddx: Write args.bin and input.bin
2024-03-26 10:44:02 -07:00
Hansung Kim
8f3474b151
Don't clean *.bin
2024-03-24 01:45:08 -07:00
Hansung Kim
2036d37840
sgemm_wg: Prevent run-ahead using ternary flags; reduce mem accesses
2024-03-13 21:35:24 -07:00
Hansung Kim
510a834db5
sgemm_wg: Implement software barrier for inter-core synchronization
2024-03-12 15:34:42 -07:00
Hansung Kim
fbe872c831
sgemm_wg: Add missing makefile dep to common.h
2024-03-12 15:34:17 -07:00
Hansung Kim
6f4dfe5a0e
sgemm_wg: Implement 2D threadtiling
2024-02-29 14:40:54 -08:00
Hansung Kim
a06b2dd20e
sgemm_wg: Cleanup & proper unroll
2024-02-28 21:17:42 -08:00
Hansung Kim
46f242e520
sgemm_wg: Constantify BM/BN/BK/TM, computationally set gridsize and TB/core
2024-02-27 22:23:25 -08:00
Hansung Kim
27646bb507
sgemm_wg: Implement multiple C per thread with sliding A/B blocks
2024-02-27 22:06:01 -08:00
Hansung Kim
f1e7407d3a
sgemm_wg: Run multiple threadblocks per core
2024-02-27 15:44:04 -08:00
Hansung Kim
d2da0d3394
sgemm_wg: Parameterize threadblock dimensions
2024-02-17 18:05:59 -08:00
Hansung Kim
301f1ca260
sgemm_wg: Implement blocking over k-dimension
2024-02-16 16:20:57 -08:00
Hansung Kim
5f79e8a3f1
sgemm_wg: reference matmul in cpu
2024-02-12 22:29:38 -08:00
Hansung Kim
6b420aceb6
sgemm_wg: write simple C=A*A matmul
2024-02-12 22:22:28 -08:00
Hansung Kim
a43d5eb1a7
Merge remote-tracking branch 'upstream/master' into kernels
2024-02-12 20:50:32 -08:00
Hansung Kim
6a1a506b64
sgemm_wg: save args and input bin
2024-02-12 20:49:08 -08:00
Hansung Kim
ad8bf9b223
Add sgemm_wg C kernel
2024-02-07 21:31:08 -08:00
Blaise Tine
9dc5793046
minor update
2023-11-27 02:21:47 -08:00
Blaise Tine
2f1171ca76
minor update
2023-11-27 02:04:22 -08:00
Blaise Tine
61e3442ef8
adding opencl convolution benchmark
2023-11-14 22:31:30 -08:00
Blaise Tine
4e7a536918
adding tensor regression test.
2023-11-14 05:37:46 -08:00
Blaise Tine
62cdd8e993
minor update
2023-11-11 15:49:39 -08:00
Blaise Tine
c1e168fdbe
Vortex 2.0 changes:
...
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
minor update
minor update
minor update
minor update
minor update
minor update
cleanup
cleanup
cache bindings and memory perf refactoring
minor update
minor update
hw unit tests fixes
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor updates
minor updates
minor update
minor update
minor update
minor update
minor update
minor update
minor updates
minor updates
minor updates
minor updates
minor update
minor update
2023-11-10 02:47:05 -08:00
Blaise Tine
d47cccc157
Vortex 2.0 changes:
...
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
2023-10-19 20:51:22 -07:00
Blaise Tine
b9cda8fca7
minor update
2023-05-15 20:19:14 -04:00
Blaise Tine
e1b666cb93
minor update
2022-07-14 08:55:09 -04:00
Blaise Tine
2277e3c878
minor update
2022-02-05 17:59:58 -05:00