fbb2ed112d
Fix: make Compute_Constraint/analysis use CPU Fortran for shell RHS
Limit GPU shell RHS redirection to Step and SHStep via #define/#undef.
Compute_Constraint, Interp_Constraint, and Constraint_Out keep using
the CPU Fortran path to avoid per-call GPU allocation overhead during
the initialization and analysis phases.
Also: wrap compare_result_gpu in #ifdef RESULT_CHECK to avoid a link error.
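A minimal C sketch of the scoped redirection described above, assuming the shell RHS is reached through a single function name. `compute_rhs_bssn_ss`, the `rhs_path` tags, and every signature here are hypothetical; only the names `cuda_compute_rhs_bssn_ss`, `Step`, `SHStep`, `Compute_Constraint`, `compare_result_gpu`, and `RESULT_CHECK` appear in this log.

```c
#include <assert.h>

/* Hypothetical tags so a caller can tell which shell RHS path ran. */
enum rhs_path { RHS_CPU = 0, RHS_GPU = 1 };

#ifdef RESULT_CHECK
/* Declared and referenced only in RESULT_CHECK builds, so ordinary builds
   never need a definition of compare_result_gpu at link time. */
void compare_result_gpu(int patch);
#endif

/* Hypothetical stand-in for the CPU (Fortran) shell RHS. */
static int compute_rhs_bssn_ss(int patch) {
    assert(patch >= 0);
    return RHS_CPU;
}

/* Hypothetical stand-in for the GPU wrapper. */
static int cuda_compute_rhs_bssn_ss(int patch) {
    assert(patch >= 0);
#ifdef RESULT_CHECK
    compare_result_gpu(patch);
#endif
    return RHS_GPU;
}

/* Redirection is active only between #define and #undef, so only Step and
   SHStep are compiled against the GPU wrapper. */
#define compute_rhs_bssn_ss cuda_compute_rhs_bssn_ss
int Step(int patch)   { return compute_rhs_bssn_ss(patch); }
int SHStep(int patch) { return compute_rhs_bssn_ss(patch); }
#undef compute_rhs_bssn_ss

/* After #undef the name resolves to the CPU path again, so constraint and
   analysis routines avoid per-call GPU allocation. */
int Compute_Constraint(int patch) { return compute_rhs_bssn_ss(patch); }
```

Because the macro is purely textual, any call site placed after `#undef` falls back to the CPU routine; keeping the `#define`/`#undef` pair tight around Step and SHStep is what limits the redirection.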
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 19:25:45 +08:00
bd4ce3fbf3
GPU-accelerate Shell-Patch BSSN evolution
Phase 1: Enable GPU resident state for Cartesian patches in Shell mode.
- Remove WithShell guard from bssn_cuda_use_resident_sync().
- Add GPU-to-CPU state sync before shell CPU consumers (SHStep,
CS_Inter, inline shell RHS blocks).
Phase 2: GPU-accelerate BSSN Shell Patch RHS.
- Create bssn_gpu.h with RHS_SS_PARA macro and gpu_rhs_ss declaration.
- Fix compilation bugs in legacy bssn_gpu_rhs_ss.cu (deprecated
cudaThreadSynchronize, tmp_con2 redeclaration, ijkmin3_h typo,
CUDA_SAFE_CALL, missing compare_result guard).
- Add bssn_gpu_rhs_ss.o to CFILES_CUDA_BSSN with build rule.
- Write a cuda_compute_rhs_bssn_ss() wrapper that bridges Fortran and GPU
parameter conventions, and redirect all shell RHS call sites via #define.
Verified: a 30-step Shell-Patch GPU run completes without errors or NaNs.
Step wall time ~4.4s, including step_fn ~2.0s, RP ~0.68s, and constraint ~0.70s
(the remaining ~1.0s is not itemized).
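A minimal C sketch of what bridging Fortran and GPU parameter conventions can involve, assuming the Fortran caller passes every argument by reference and hands over a contiguous column-major array starting at element (1,1,1). The argument list of `cuda_compute_rhs_bssn_ss` here is hypothetical, `gpu_rhs_ss_stub` is a host-side placeholder for the real `gpu_rhs_ss` launch (it only computes `rhs = 2*f`), and Fortran symbol-name mangling is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* 0-based flat offset with i fastest, matching Fortran column-major order. */
static size_t flat_index(int nx, int ny, int i, int j, int k) {
    return (size_t)i + (size_t)nx * ((size_t)j + (size_t)ny * (size_t)k);
}

/* Hypothetical GPU-side convention: extents by value, 0-based loops.
   Host placeholder for the real kernel launch; computes rhs = 2 * f. */
static int gpu_rhs_ss_stub(int nx, int ny, int nz, const double *f, double *rhs) {
    for (int k = 0; k < nz; ++k)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                size_t p = flat_index(nx, ny, i, j, k);
                rhs[p] = 2.0 * f[p];
            }
    return 0; /* 0 = success, mirroring a Fortran-style status flag */
}

/* Hypothetical Fortran-facing wrapper: every argument arrives by reference,
   so the extents are read through ex[] and the status is written back. */
void cuda_compute_rhs_bssn_ss(const int *ex, const double *f, double *rhs,
                              int *status) {
    assert(ex != NULL && f != NULL && rhs != NULL && status != NULL);
    *status = gpu_rhs_ss_stub(ex[0], ex[1], ex[2], f, rhs);
}
```

Since column-major order makes the first index fastest, Fortran element (i,j,k) sits at the same offset as 0-based (i-1, j-1, k-1) in this flat layout, so the wrapper only dereferences scalar arguments and never reorders data.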
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 18:50:10 +08:00
5eb49949d9
Fix AHF crash under CUDA resident-sync mode
Download BSSN StateList from GPU to CPU before AHFinderDirect_find_horizons
so that AH_Interp_Points reads valid field data instead of stale CPU arrays.
The resident-sync path keeps canonical state on GPU; without this download the
Newton iteration diverges and probes outside the computational domain.
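A minimal C sketch of the ordering this fix enforces, assuming a per-StateList staleness flag. `StateList` here is a hypothetical struct, not the project's class; `memcpy` stands in for a device-to-host `cudaMemcpy`, and `cpu_interp_point` is a hypothetical placeholder for a CPU consumer such as AH_Interp_Points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of resident-sync mode: `device` stands in for the
   canonical GPU copy, `host` for the CPU arrays that CPU consumers read. */
typedef struct {
    double *host;
    double *device;
    size_t  n;
    int     host_stale; /* set whenever the GPU copy advances */
} StateList;

/* Stand-in for a GPU evolution step: only the device copy changes. */
static void gpu_step(StateList *s) {
    for (size_t i = 0; i < s->n; ++i) s->device[i] += 1.0;
    s->host_stale = 1;
}

/* Download before any CPU consumer; real code would use a device-to-host
   cudaMemcpy, simulated here with memcpy. */
static void sync_state_to_host(StateList *s) {
    if (s->host_stale) {
        memcpy(s->host, s->device, s->n * sizeof *s->host);
        s->host_stale = 0;
    }
}

/* Stand-in for a CPU consumer such as horizon-finder interpolation. */
static double cpu_interp_point(const StateList *s, size_t i) {
    assert(i < s->n);
    return s->host[i];
}
```

Skipping `sync_state_to_host` leaves the CPU consumer reading pre-step values, which is the stale-array situation that made the Newton iteration diverge.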
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 16:11:56 +08:00
39450228f5
Accelerate Shell-Patch interpolation fast paths
2026-05-08 13:26:16 +08:00
063f28b3b4
Add Shell-Patch GPU runtime fast paths
2026-05-08 09:26:36 +08:00
1064a68d16
Optimize BSSN-EM 8th-order AMR transfers
2026-05-07 21:38:16 +08:00
dcc83bafcb
Support 2nd and 8th order CUDA AMR paths
2026-05-07 20:31:26 +08:00
c4d8d41b25
Cover Z4C CUDA AMR restrict prolong
2026-05-07 19:49:09 +08:00
0076b3ca18
Optimize 6th-order CUDA AMR stencils
2026-05-07 19:22:37 +08:00
9ff2f065be
Apply BSSN AMR sync default to EScalar
2026-05-07 17:12:33 +08:00
2317e4abde
Fix BSSN GPU resident AMR sync default
2026-05-07 17:11:09 +08:00
fea2dcc0d5
Fix BSSN-EM runtime crash
2026-05-07 16:47:55 +08:00
5525465cad
Support CUDA finite-difference order selection
2026-05-07 16:28:02 +08:00
96829d0441
Optimize Z4C GPU runtime defaults
2026-05-07 15:37:09 +08:00
83afaf19ce
Skip zero EM resident downloads
2026-05-07 13:04:46 +08:00
cb911dec06
Add EM GPU fast paths and defaults
2026-05-07 12:18:56 +08:00
dd0e20d8c7
Fix BSSN-EScalar CUDA boundary and scalar KO
2026-05-06 15:44:35 +08:00
ffa0d801ed
Default Python GPU runner to EScalar fast path
2026-05-06 00:12:46 +08:00
ae64a22178
Complete BSSN-EScalar CUDA resident transfers
2026-05-05 23:57:42 +08:00
85fe29cc2e
Optimize BSSN-EScalar CUDA path
2026-05-05 10:47:46 +08:00
06f62dee36
Switch back to the Intel toolchain as the default
Intel MPI also appears to support CUDA-aware MPI when I_MPI_OFFLOAD is set to 1. In addition, I_MPI_OFFLOAD_IPC=0 is needed to avoid segfaults.
2026-05-01 21:59:13 +08:00
35b6ceff02
Broaden cached CUDA sync paths
2026-05-01 18:03:04 +08:00
51f3819892
Save generated source formatting state
2026-04-30 20:47:44 +08:00
a9a3809148
Default Python launcher to fast GPU path
2026-04-30 20:15:34 +08:00
b1974ef146
Stabilize device AMR restrict across regrid
2026-04-30 20:01:18 +08:00
be9033f449
Add optional CUDA surface interpolation
2026-04-30 19:21:19 +08:00
6835608f92
Add configurable analysis MAP cadence
2026-04-30 19:10:12 +08:00
e0d0673c8e
Enable optimized GPU runs from Python launcher
2026-04-30 18:31:31 +08:00
da4d56ccf7
Optimize BSSN surface interpolation fast path
2026-04-30 18:25:21 +08:00
a6483d013d
Add CUDA AMR restrict diagnostics
2026-04-30 12:20:44 +08:00
8486532920
Add resident BSSN GPU point interpolation
2026-04-30 11:39:15 +08:00
18e9c9cc50
Optimize BSSN CUDA resident AMR prolong path
2026-04-30 10:58:15 +08:00
1ee229a91f
Add keyed BSSN CUDA resident banks
2026-04-29 19:44:19 +08:00
68eab03bac
Add opt-in BSSN CUDA resident AMR path
2026-04-29 19:15:37 +08:00
090d8657ae
Optimize BSSN CUDA state transfers
2026-04-29 18:34:31 +08:00
22c1e7168b
Optimize BSSN CUDA resident state and CUDA-aware MPI
2026-04-29 17:05:10 +08:00
a0dab90bcb
Switch to NVIDIA HPC Toolchain
2026-04-29 08:31:49 +08:00
c689cc8dc9
[WIP] Add CUDA support for Z4C
Rewrite done by Codex.
This still has errors; do not pick this commit for now.
2026-04-27 11:58:43 +08:00
60fee8f1c1
Fix Z4C C++ gauge damping ordering
2026-04-26 15:38:13 +08:00
843b116954
Add C++ Z4C RHS path and port some BSSN optimizations
2026-04-25 10:39:01 +08:00
c768e1220b
Also disable cached sync for Z4C
2026-04-25 10:25:54 +08:00
02f149e2e3
Disable cached sync for BSSN-EScalar
2026-04-25 10:17:47 +08:00
422e8ec4dc
Fallback BSSN-EScalar restrict/prolong path
2026-04-25 10:10:34 +08:00
c4909b9843
Update precision-check script to add image comparison checks
(cherry picked from commit ac82ebd889)
2026-04-25 09:40:12 +08:00
f521a97563
Fix ABE CPU version build error
2026-04-25 09:39:49 +08:00
53c55451b3
Update makefile and scripts for CUDA BSSN configuration and build commands
2026-04-25 09:19:50 +08:00
768345954f
Add optional BSSN kernel profiling switches
(cherry picked from commit 9c31384b2f)
2026-04-25 08:39:43 +08:00
9a6df6438b
Remove dead chi derivative setup in BSSN RHS
(cherry picked from commit e4e741caa1)
2026-04-25 08:38:01 +08:00
8e9463aa90
Localize chi Ricci intermediates in RHS
(cherry picked from commit 65e0f95f40)
2026-04-25 08:37:41 +08:00
7c6f15002e
Elide dead stores in BSSN RHS hot path
(cherry picked from commit f9fbf97e64)
2026-04-25 08:37:40 +08:00