AMSS-NCKU

64-BitBrainstorm_2026/AMSS-NCKU


Commit Graph


Author

SHA1

Message

Date

CGH0S7

f16469ea77

Simplify Z4C Shell GPU: CPU-side trKd+TZ_rhs wrapper

Replace the duplicated z4c_gpu_rhs_ss.cu with a lightweight
gpu_rhs_z4c_ss wrapper inside bssn_gpu_rhs_ss.cu (guarded by
#if ABEtype==2). The wrapper:
1. Builds trKd = trK + 2*TZ on host and passes it to gpu_rhs_ss
2. After BSSN GPU returns, computes TZ_rhs = alpn1*Hcon/2 and
   applies kappa1/kappa2 constraint damping on CPU

This avoids duplicate kernel definitions (linker errors) and
keeps all shell GPU code in a single file. The CPU-side Z4C
corrections are O(100K) operations — negligible vs GPU RHS time.

Also remove the separate z4c_gpu_rhs_ss.cu and its build rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-10 16:05:56 +08:00
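
The two host-side steps described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names, the `Z4cDamping` struct, and the flat `std::vector` layout are hypothetical, and the GPU call to `gpu_rhs_ss` that sits between the two steps is omitted because its signature is not shown in the commit. The commit message gives only `TZ_rhs = alpn1*Hcon/2` and says kappa1/kappa2 damping is applied; the damping term used here, `-alpn1*kappa1*(2+kappa2)*TZ`, is an assumption taken from the standard Z4c formulation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for the two damping parameters named in the commit.
struct Z4cDamping {
    double kappa1;
    double kappa2;
};

// Step 1 (from the commit message): trKd = trK + 2*TZ, built on the host
// before the BSSN shell RHS is evaluated on the GPU.
std::vector<double> build_trKd(const std::vector<double>& trK,
                               const std::vector<double>& TZ) {
    std::vector<double> trKd(trK.size());
    for (std::size_t i = 0; i < trK.size(); ++i) {
        trKd[i] = trK[i] + 2.0 * TZ[i];
    }
    return trKd;
}

// Step 2: after the GPU RHS returns, compute TZ_rhs on the host.
// The first term is the one stated in the commit message.
// The damping term is an ASSUMED form (standard Z4c), not confirmed by the source.
std::vector<double> compute_TZ_rhs(const std::vector<double>& alpn1,
                                   const std::vector<double>& Hcon,
                                   const std::vector<double>& TZ,
                                   const Z4cDamping& damp) {
    std::vector<double> TZ_rhs(TZ.size());
    for (std::size_t i = 0; i < TZ.size(); ++i) {
        TZ_rhs[i] = 0.5 * alpn1[i] * Hcon[i];                                // stated in the commit
        TZ_rhs[i] -= alpn1[i] * damp.kappa1 * (2.0 + damp.kappa2) * TZ[i];   // assumed damping form
    }
    return TZ_rhs;
}
```

Both loops are a single pass over the grid points, which is consistent with the commit's remark that the CPU-side corrections are cheap relative to the GPU RHS evaluation.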

CGH0S7

bd4ce3fbf3

GPU-accelerate Shell-Patch BSSN evolution

Phase 1: Enable GPU resident state for Cartesian patches in Shell mode.
- Remove WithShell guard from bssn_cuda_use_resident_sync().
- Add GPU-to-CPU state sync before shell CPU consumers (SHStep,
  CS_Inter, inline shell RHS blocks).

Phase 2: GPU-accelerate BSSN Shell Patch RHS.
- Create bssn_gpu.h with RHS_SS_PARA macro and gpu_rhs_ss declaration.
- Fix compilation bugs in legacy bssn_gpu_rhs_ss.cu (deprecated
  cudaThreadSynchronize, tmp_con2 redeclaration, ijkmin3_h typo,
  CUDA_SAFE_CALL, missing compare_result guard).
- Add bssn_gpu_rhs_ss.o to CFILES_CUDA_BSSN with build rule.
- Write cuda_compute_rhs_bssn_ss() wrapper bridging Fortran and GPU
  parameter conventions, redirect all shell RHS call sites via #define.

Verified: 30-step Shell-Patch GPU run completes without errors/NaN.
Step wall time ~4.4s (step_fn ~2.0s + RP ~0.68s + constraint ~0.70s).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-09 18:50:10 +08:00
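
The "redirect all shell RHS call sites via #define" step in the commit above can be illustrated with a small sketch. Everything here is hypothetical: `compute_rhs_ss_cpu`, `compute_rhs_ss_gpu`, `SHELL_RHS_ON_CPU`, and the toy doubling "RHS" are placeholders chosen for illustration, and the real wrapper `cuda_compute_rhs_bssn_ss()` takes parameters that the commit does not list. The point is only the mechanism: a macro defined before the call sites swaps the callee without editing each call.

```cpp
// Hypothetical CPU routine with a Fortran-style pointer interface.
// Returns 0 so a caller can tell which backend ran.
int compute_rhs_ss_cpu(const double* in, double* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = 2.0 * in[i];
    return 0;
}

// Hypothetical GPU-side wrapper with the same interface.
// In a real build this would launch the kernel; here it computes the
// same toy result on the host and returns 1.
int compute_rhs_ss_gpu(const double* in, double* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = 2.0 * in[i];
    return 1;
}

// Redirect every later use of the CPU name to the GPU wrapper,
// unless the (hypothetical) SHELL_RHS_ON_CPU flag is set at compile time.
#ifndef SHELL_RHS_ON_CPU
#define compute_rhs_ss_cpu compute_rhs_ss_gpu
#endif

// An unchanged call site: it still spells the CPU name,
// but the preprocessor rewrites it to the GPU wrapper.
int shell_call_site(const double* in, double* out, int n) {
    return compute_rhs_ss_cpu(in, out, n);
}
```

A design note under the same assumptions: the macro approach keeps the diff small, but it only works when the wrapper accepts exactly the arguments the existing call sites pass, which is presumably why the commit describes the wrapper as bridging Fortran and GPU parameter conventions.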

CGH0S7

f2fc9af70e

asc26 amss-ncku initialized

2026-01-13 15:01:15 +08:00

3 Commits