Hansung Kim
93a00101ae
sgemm_wg: revert to faster params
2024-04-04 21:06:14 -07:00
Hansung Kim
fa2b6e2ad0
sgemm_wg: Explicitly limit unroll to reduce stack spilling
...
This needs to be done case-by-case for different BK/TM/TN combinations by
examining the assembly.
2024-03-29 02:48:29 -07:00
Hansung Kim
537b97eb20
common.mk: Don't clean all *.elf
2024-03-28 20:17:26 -07:00
Hansung Kim
a9b0814211
sgemm_wg: Document tiling parameter constraints
2024-03-28 18:17:00 -07:00
Hansung Kim
9673db4e8c
sgemm_wg: Fix possible divide-by-0
2024-03-28 17:35:47 -07:00
Hansung Kim
9555b790e7
sgemm_wg: ifdef-guard cluster specific code
2024-03-27 22:45:51 -07:00
Hansung Kim
09822764e7
sgemm_wg: Remove software-based barrier implementation
...
Intra-cluster barrier is now implemented in hardware, transparent to the ISA.
2024-03-27 22:43:45 -07:00
Hansung Kim
fa6adceb7e
vecaddx: Hardcode args/input device address to match chipyard
...
Don't use mem_alloc/mem_free API
2024-03-27 15:15:52 -07:00
Hansung Kim
b545809496
vecaddx: Use -DRADIANCE
2024-03-26 16:42:36 -07:00
Hansung Kim
4d2c0084d1
common.mk: Compile separate cluster ELF
...
... using -DRADIANCE, which the kernel C code uses explicitly to switch between
vx_spawn_tasks and vx_spawn_tasks_cluster. This is to ease running both simX
and Chipyard simulations without mixing up binaries.
2024-03-26 16:37:44 -07:00
Hansung Kim
7f00e6c376
vecaddx: Change arg device address to 7fff0000
2024-03-26 10:44:33 -07:00
Hansung Kim
cc7b34ec5b
vecaddx: Write args.bin and input.bin
2024-03-26 10:44:02 -07:00
Hansung Kim
8f3474b151
Don't clean *.bin
2024-03-24 01:45:08 -07:00
Hansung Kim
2036d37840
sgemm_wg: Prevent run-ahead using ternary flags; reduce mem accesses
2024-03-13 21:35:24 -07:00
Hansung Kim
510a834db5
sgemm_wg: Implement software barrier for inter-core synchronization
2024-03-12 15:34:42 -07:00
Hansung Kim
fbe872c831
sgemm_wg: Add missing makefile dep to common.h
2024-03-12 15:34:17 -07:00
Hansung Kim
6f4dfe5a0e
sgemm_wg: Implement 2D threadtiling
2024-02-29 14:40:54 -08:00
Hansung Kim
a06b2dd20e
sgemm_wg: Cleanup & proper unroll
2024-02-28 21:17:42 -08:00
Hansung Kim
46f242e520
sgemm_wg: Constantify BM/BN/BK/TM, computationally set gridsize and TB/core
2024-02-27 22:23:25 -08:00
Hansung Kim
27646bb507
sgemm_wg: Implement multiple C per thread with sliding A/B blocks
2024-02-27 22:06:01 -08:00
Hansung Kim
f1e7407d3a
sgemm_wg: Run multiple threadblocks per core
2024-02-27 15:44:04 -08:00
Hansung Kim
d2da0d3394
sgemm_wg: Parameterize threadblock dimensions
2024-02-17 18:05:59 -08:00
Hansung Kim
301f1ca260
sgemm_wg: Implement blocking over k-dimension
2024-02-16 16:20:57 -08:00
Hansung Kim
5f79e8a3f1
sgemm_wg: reference matmul in cpu
2024-02-12 22:29:38 -08:00
Hansung Kim
6b420aceb6
sgemm_wg: write simple C=A*A matmul
2024-02-12 22:22:28 -08:00
Hansung Kim
a43d5eb1a7
Merge remote-tracking branch 'upstream/master' into kernels
2024-02-12 20:50:32 -08:00
Hansung Kim
6a1a506b64
sgemm_wg: save args and input bin
2024-02-12 20:49:08 -08:00
Hansung Kim
ad8bf9b223
Add sgemm_wg C kernel
2024-02-07 21:31:08 -08:00
Blaise Tine
9dc5793046
minor update
2023-11-27 02:21:47 -08:00
Blaise Tine
2f1171ca76
minor update
2023-11-27 02:04:22 -08:00
Blaise Tine
61e3442ef8
adding opencl convolution benchmark
2023-11-14 22:31:30 -08:00
Blaise Tine
4e7a536918
adding tensor regression test.
2023-11-14 05:37:46 -08:00
Blaise Tine
62cdd8e993
minor update
2023-11-11 15:49:39 -08:00
Blaise Tine
c1e168fdbe
Vortex 2.0 changes:
...
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
minor update
minor update
minor update
minor update
minor update
minor update
cleanup
cleanup
cache bindings and memory perf refactoring
minor update
minor update
hw unit tests fixes
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor update
minor updates
minor updates
minor update
minor update
minor update
minor update
minor update
minor update
minor updates
minor updates
minor updates
minor updates
minor update
minor update
2023-11-10 02:47:05 -08:00
Blaise Tine
d47cccc157
Vortex 2.0 changes:
...
+ Microarchitecture optimizations
+ 64-bit support
+ Xilinx FPGA support
+ LLVM-16 support
+ Refactoring and quality control fixes
2023-10-19 20:51:22 -07:00
Blaise Tine
b9cda8fca7
minor update
2023-05-15 20:19:14 -04:00
Blaise Tine
e1b666cb93
minor update
2022-07-14 08:55:09 -04:00
Blaise Tine
2277e3c878
minor update
2022-02-05 17:59:58 -05:00
Santosh Srivatsan
b7e5a83ba3
Merged branch xlen-parameterization into staging
2022-02-05 13:47:42 -05:00
Blaise Tine
cf2a0a5f39
code refactoring
2022-02-04 00:07:24 -05:00
Blaise Tine
a06812f93f
minor updates
2022-02-01 22:51:33 -05:00
Blaise Tine
d48f1c1c5f
minor updates
2022-02-01 06:53:31 -05:00
Blaise Tine
3750c672a7
Makefiles update
2022-01-30 00:26:55 -05:00
Blaise Tine
f7887d8720
refactoring device memory allocation and cleanup
2022-01-28 21:57:16 -05:00
Santosh Srivatsan
427146d59b
Removed 64-bit runtime and regression tests
2021-12-11 17:20:40 -05:00
Santosh Srivatsan
5edb9098ce
Merge branch 'simx64'
2021-12-10 21:48:29 -05:00
Blaise Tine
0e2de4f13a
prefetch test fixes
2021-12-09 04:54:10 -05:00
Blaise Tine
fb6106267c
Merge branch 'master' of https://github.com/vortexgpgpu/vortex
2021-12-09 00:00:28 -05:00
Blaise Tine
a9ec1c08a7
minor update
2021-12-06 15:44:25 -05:00
Blaise Tine
41d7e6c63a
cumulative fixes, RTL uuid trace, texture unit fixes, simx timing fixes
2021-11-30 07:08:15 -05:00