Author | Commit | Message | Date
Hansung Kim | fd2ff6208d | Generate golden data for flash in generate_matrix.py | 2024-08-15 17:41:57 -07:00
Hansung Kim | 95e3e96c6c | tensor: Change B in-memory layout to column-major | 2024-08-12 15:22:07 -07:00
Hansung Kim | 07dd9e35a0 | tensor: Fix dimensions for fp16 in script | 2024-08-12 15:22:07 -07:00
Hansung Kim | c1906ebb4f | tensor: Embed binary instead of hardcoding literals (the C compiler doesn't support fp16) | 2024-08-12 15:22:07 -07:00
Hansung Kim | 1b5daccac9 | tensor: Generate fp16-packed matrix in script | 2024-08-12 15:22:07 -07:00
Hansung Kim | a12f2c296c | tensor: Update readme | 2024-07-31 11:55:28 -07:00
Hansung Kim | 446b1a4c2e | tensor: Add readme | 2024-07-31 11:53:31 -07:00
Hansung Kim | 285776404f | tensor: Fix tensor unittest kernel | 2024-07-31 11:49:41 -07:00
Hansung Kim | 29f7290948 | tensor: Fix correctness script | 2024-07-31 11:39:50 -07:00
Hansung Kim | 800d9801b5 | tensor: Test with multiple accumulators | 2024-06-07 18:19:20 -07:00
Hansung Kim | 2cac995db9 | tensor: generate 8x8 in correctness script | 2024-06-07 18:13:57 -07:00
Hansung Kim | 0a884e1ead | tensor: spawn on all warps, 8 lanes | 2024-05-25 20:19:57 -07:00
Hansung Kim | 8a521a1de8 | Add 8-lane operand mapping | 2024-05-10 23:23:11 -07:00
Hansung Kim | 5821bfd10d | Repeat vx_wmma issue & hardcode dst address | 2024-05-08 13:22:26 -07:00
Hansung Kim | b4c812f9f8 | Write expected_C to a binary file | 2024-05-05 18:27:56 -07:00
joshua | 5bd25985c6 | i kinda forgot most of changes | 2024-05-04 23:01:47 -07:00
joshua | d8f9359fae | test case update | 2024-03-28 13:04:02 -07:00
joshua | e16584ddd9 | bleh still not work | 2024-03-27 00:26:04 -07:00