Commit Graph

92 Commits

Author SHA1 Message Date
Hansung Kim
93a00101ae sgemm_wg: revert to faster params 2024-04-04 21:06:14 -07:00
Hansung Kim
fa2b6e2ad0 sgemm_wg: Explicitly limit unroll to reduce stack spilling
This needs to be done case-by-case for different BK/TM/TN combinations and
examining the assembly.
2024-03-29 02:48:29 -07:00
Hansung Kim
537b97eb20 common.mk: Don't clean all *.elf 2024-03-28 20:17:26 -07:00
Hansung Kim
a9b0814211 sgemm_wg: Document tiling parameter constraints 2024-03-28 18:17:00 -07:00
Hansung Kim
9673db4e8c sgemm_wg: Fix possible divide-by-0 2024-03-28 17:35:47 -07:00
Hansung Kim
9555b790e7 sgemm_wg: ifdef-guard cluster specific code 2024-03-27 22:45:51 -07:00
Hansung Kim
09822764e7 sgemm_wg: Remove software-based barrier implementation
Intra-cluster barrier is now implemented in hardware, transparent to the ISA.
2024-03-27 22:43:45 -07:00
Hansung Kim
fa6adceb7e vecaddx: Hardcode args/input device address to match chipyard
Don't use mem_alloc/mem_free API
2024-03-27 15:15:52 -07:00
Hansung Kim
b545809496 vecaddx: Use -DRADIANCE 2024-03-26 16:42:36 -07:00
Hansung Kim
4d2c0084d1 common.mk: Compile separate cluster ELF
... using -DRADIANCE, which the kernel C code uses explicitly to switch between
vx_spawn_tasks and vx_spawn_tasks_cluster.  This is to ease running both simX
and Chipyard simulations without mixing up binaries.
2024-03-26 16:37:44 -07:00
Hansung Kim
7f00e6c376 vecaddx: Change arg device address to 7fff0000 2024-03-26 10:44:33 -07:00
Hansung Kim
cc7b34ec5b vecaddx: Write args.bin and input.bin 2024-03-26 10:44:02 -07:00
Hansung Kim
8f3474b151 Don't clean *.bin 2024-03-24 01:45:08 -07:00
Hansung Kim
2036d37840 sgemm_wg: Prevent run-ahead using ternary flags; reduce mem accesses 2024-03-13 21:35:24 -07:00
Hansung Kim
510a834db5 sgemm_wg: Implement software barrier for inter-core synchronization 2024-03-12 15:34:42 -07:00
Hansung Kim
fbe872c831 sgemm_wg: Add missing makefile dep to common.h 2024-03-12 15:34:17 -07:00
Hansung Kim
6f4dfe5a0e sgemm_wg: Implement 2D threadtiling 2024-02-29 14:40:54 -08:00
Hansung Kim
a06b2dd20e sgemm_wg: Cleanup & proper unroll 2024-02-28 21:17:42 -08:00
Hansung Kim
46f242e520 sgemm_wg: Constantify BM/BN/BK/TM, computationally set gridsize and TB/core 2024-02-27 22:23:25 -08:00
Hansung Kim
27646bb507 sgemm_wg: Implement multiple C per thread with sliding A/B blocks 2024-02-27 22:06:01 -08:00
Hansung Kim
f1e7407d3a sgemm_wg: Run multiple threadblock per core 2024-02-27 15:44:04 -08:00
Hansung Kim
d2da0d3394 sgemm_wg: Parameterize threadblock dimensions 2024-02-17 18:05:59 -08:00
Hansung Kim
301f1ca260 sgemm_wg: Implement blocking over k-dimension 2024-02-16 16:20:57 -08:00
Hansung Kim
5f79e8a3f1 sgemm_wg: reference matmul in cpu 2024-02-12 22:29:38 -08:00
Hansung Kim
6b420aceb6 sgemm_wg: write simple C=A*A matmul 2024-02-12 22:22:28 -08:00
Hansung Kim
a43d5eb1a7 Merge remote-tracking branch 'upstream/master' into kernels 2024-02-12 20:50:32 -08:00
Hansung Kim
6a1a506b64 sgemm_wg: save args and input bin 2024-02-12 20:49:08 -08:00
Hansung Kim
ad8bf9b223 Add sgemm_wg C kernel 2024-02-07 21:31:08 -08:00
Blaise Tine
9dc5793046 minor update 2023-11-27 02:21:47 -08:00
Blaise Tine
2f1171ca76 minor update 2023-11-27 02:04:22 -08:00
Blaise Tine
61e3442ef8 adding opencl convolution benchmark 2023-11-14 22:31:30 -08:00
Blaise Tine
4e7a536918 adding tensor regression test. 2023-11-14 05:37:46 -08:00
Blaise Tine
62cdd8e993 minor update 2023-11-11 15:49:39 -08:00
Blaise Tine
c1e168fdbe Vortex 2.0 changes:
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes

minor update

minor update

minor update

minor update

minor update

minor update

cleanup

cleanup

cache bindings and memory perf refactoring

minor update

minor update

hw unit tests fixes

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor update

minor updates

minor updates

minor update

minor update

minor update

minor update

minor update

minor update

minor updates

minor updates

minor updates

minor updates

minor update

minor update
2023-11-10 02:47:05 -08:00
Blaise Tine
d47cccc157 Vortex 2.0 changes:
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
2023-10-19 20:51:22 -07:00
Blaise Tine
b9cda8fca7 minor update 2023-05-15 20:19:14 -04:00
Blaise Tine
e1b666cb93 minor update 2022-07-14 08:55:09 -04:00
Blaise Tine
2277e3c878 minor update 2022-02-05 17:59:58 -05:00
Santosh Srivatsan
b7e5a83ba3 Merged branch xlen-parameterization into staging 2022-02-05 13:47:42 -05:00
Blaise Tine
cf2a0a5f39 code refactoring 2022-02-04 00:07:24 -05:00
Blaise Tine
a06812f93f minor updates 2022-02-01 22:51:33 -05:00
Blaise Tine
d48f1c1c5f minor updates 2022-02-01 06:53:31 -05:00
Blaise Tine
3750c672a7 Makefiles update 2022-01-30 00:26:55 -05:00
Blaise Tine
f7887d8720 refactoring device memory allocation and cleanup 2022-01-28 21:57:16 -05:00
Santosh Srivatsan
427146d59b Removed 64-bit runtime and regression tests 2021-12-11 17:20:40 -05:00
Santosh Srivatsan
5edb9098ce Merge branch 'simx64' 2021-12-10 21:48:29 -05:00
Blaise Tine
0e2de4f13a prefetch test fixes 2021-12-09 04:54:10 -05:00
Blaise Tine
fb6106267c Merge branch 'master' of https://github.com/vortexgpgpu/vortex 2021-12-09 00:00:28 -05:00
Blaise Tine
a9ec1c08a7 minor update 2021-12-06 15:44:25 -05:00
Blaise Tine
41d7e6c63a cumulative fixes, RTL uuid trace, texture unit fixes, simx timing fixes 2021-11-30 07:08:15 -05:00