|
|
090d8657ae
|
Optimize BSSN CUDA state transfers
|
2026-04-29 18:34:31 +08:00 |
|
|
|
22c1e7168b
|
Optimize BSSN CUDA resident state and CUDA-aware MPI
|
2026-04-29 17:05:10 +08:00 |
|
|
|
bb20c9a876
|
fix ADM Constraint Violation Analysis
|
2026-04-15 19:19:16 +08:00 |
|
|
|
8fe60ea703
|
Add zero matter handling and interpolation for resident state in CUDA BSSN
|
2026-04-15 00:25:53 +08:00 |
|
|
|
9ab7e7c7f9
|
Fuse phases 5 and 6 for Gamma_rhs computation and optimize phases 8 and 9 for efficiency
|
2026-04-14 23:23:04 +08:00 |
|
|
|
f9119e8a2a
|
Add resident-GA mode switch and simplify sync logic
|
2026-04-14 21:09:27 +08:00 |
|
|
|
726d743376
|
Fuse Ricci assembly and optimize trK/Aij gauge kernels
|
2026-04-14 19:20:12 +08:00 |
|
|
|
af344bf1e5
|
Add Phase-10 Ricci kernels and batch launch flow
|
2026-04-14 19:00:22 +08:00 |
|
|
|
7191fc0b96
|
Move resident sync comm buffers into StepAllocation pool
|
2026-04-13 21:04:44 +08:00 |
|
|
|
b3ec244cf9
|
Add batched first/second derivative kernels for CUDA RHS
|
2026-04-13 20:51:08 +08:00 |
|
|
|
e952ee8e91
|
Batch GA/BH subset sync with indexed GPU pack/unpack buffers
|
2026-04-13 20:40:09 +08:00 |
|
|
|
c5d1268dd1
|
Batch patch-boundary copy and gate CPU BC in GPU substeps
|
2026-04-13 11:52:17 +08:00 |
|
|
|
4bdfc90f22
|
Pass pointer tables as kernel args and skip redundant symbol uploads
|
2026-04-13 11:19:00 +08:00 |
|
|
|
c49a4e00c9
|
Batch symbd_pack/lopsided/kodiss over all state variables
|
2026-04-13 11:02:55 +08:00 |
|
|
|
1b3c0b80d2
|
Refactor CUDA step buffers to remove loop-time allocations
|
2026-04-13 10:33:03 +08:00 |
|
|
|
636e35bfd8
|
Add direct CUDA resident-state sync path and profiling hooks
|
2026-04-13 00:57:05 +08:00 |
|
|
|
7f2a391dd2
|
Cache matter fields in StepContext across RK4 substeps
|
2026-04-12 22:19:45 +08:00 |
|
|
|
4fa12a2009
|
Integrate CUDA support into RK4 substep execution
|
2026-04-12 22:11:44 +08:00 |
|
|
|
86a683de26
|
Replace legacy ABEGPU stack with ABE_CUDA backend
|
2026-04-12 21:19:14 +08:00 |
|