Replace the duplicated z4c_gpu_rhs_ss.cu with a lightweight
gpu_rhs_z4c_ss wrapper inside bssn_gpu_rhs_ss.cu (guarded by
#if ABEtype==2). The wrapper:
1. Builds trKd = trK + 2*TZ on host and passes it to gpu_rhs_ss
2. After BSSN GPU returns, computes TZ_rhs = alpn1*Hcon/2 and
applies kappa1/kappa2 constraint damping on CPU
This avoids duplicate kernel definitions (linker errors) and
keeps all shell GPU code in a single file. The CPU-side Z4C
corrections are O(100K) operations — negligible vs GPU RHS time.
Also remove the separate z4c_gpu_rhs_ss.cu and its build rule.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Create z4c_gpu_rhs_ss.cu (reusing BSSN shell FD/chain-rule kernels):
- Uploads trKd = trK + 2*TZ to GPU so existing BSSN algebraic kernels
compute correct Z4C physical equations without modification
- New kern_z4c_post applies TZ_rhs = alpn1 * Hcon / 2, kappa1/kappa2
constraint damping, TZ advection (lopsided), and dissipation (kodis)
- Adds TZ/TZ_rhs to Meta struct, alloc/upload/download/free lifecycle
Add cuda_compute_rhs_z4c_ss() wrapper in Z4c_class.C matching the
Fortran f_compute_rhs_Z4c_ss signature, with #define redirection for
Step/SHStep call sites and #undef before analysis functions.
Add z4c_gpu_rhs_ss.o to ABE_CUDA_CFILES and build rule in makefile.
Add kappa1_c/kappa2_c constants to gpu_rhsSS_mem.h.
Build verified with USE_CUDA_Z4C=1 + WithShell — compiles and links
cleanly. All three Shell GPU files now coexist: bssn_gpu_rhs_ss.o
(BSSN), z4c_gpu_rhs_ss.o (Z4C), both sharing FD/chain-rule kernels.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split prolongpointstru into search-only (prolongpointstru_search) and
append-only (prolongpointstru_append) functions. The search is read-only
and thread-safe; each thread builds private linked lists via
prolongpointstru_append, merged after the parallel loop.
This eliminates critical-section contention and delivers ~2.2x speedup:
setupintintstuff: 511s -> 252s, total init: 592s -> 267s.
Also add -qopenmp to ShellPatch.o compilation via makefile override rule
and <omp.h> include with _OPENMP guards + fallback stubs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>