New bssn_em_rhs_c.C computes EM field RHS (E,B,Kpsi,Kphi) and stress-energy
tensor, then calls the C BSSN RHS kernel with source terms. Replaces empart.f90
when USE_CXX_EM_KERNEL=1. Supports all ghost_width orders via existing derivative
kernels. Controlled by USE_CXX_EM_KERNEL switch (default 0, experimental).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Split prolongpointstru into search-only (prolongpointstru_search) and
append-only (prolongpointstru_append) functions. Parallelize shell-point
interpolation table construction with #pragma omp parallel for collapse(3)
and per-thread linked lists (merged after the loop to avoid data races).
Add OMP_FLAG = -fopenmp in makefile.inc and ShellPatch.o override rule
in makefile for AOCC OpenMP runtime (-lomp already linked).
Speedup: setupintintstuff ~2.2x faster on multi-core.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace Intel compilers (ifx/icpx/icx) with AOCC (flang/clang++/clang),
Intel MPI (mpiicpx) with AOCC-built OpenMPI (mpicxx), and Intel MKL
with AOCL BLIS/libFLAME. Replace -xHost with -march=znver4, -ipo with
-flto, -fp-model fast=2 with -ffast-math, -qopenmp with -fopenmp.
Remove PGO, TBB allocator, and Intel-specific runtime libraries.
Fix MKL-specific includes in TwoPunctures.C and gaussj.C to use
standard CBLAS/LAPACKE headers from AOCL.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- makefile.inc: add -ipo (interprocedural optimization) and
-align array64byte (64-byte array alignment for vectorization)
- fmisc.f90: remove redundant funcc=0.d0 zeroing from symmetry_bd,
symmetry_tbd, symmetry_stbd (~328+ full-array memsets eliminated
per timestep)
- enforce_algebra.f90: rewrite enforce_ag and enforce_ga as point-wise
loops, replacing 12 stack-allocated 3D temporary arrays with scalar
locals for better cache locality
All changes are mathematically equivalent — no algorithmic modifications.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit transitions the verification approach from post-Newtonian theory
comparison to regression testing against baseline simulations, and optimizes
critical numerical kernels using Intel oneMKL BLAS routines.
Verification Changes:
- Replace PN theory-based RMS calculation with trajectory-based comparison
- Compare optimized results against baseline (GW150914-origin) on XY plane
- Compute RMS independently for BH1 and BH2, report maximum as final metric
- Update documentation to reflect new regression test methodology
Performance Optimizations:
- Replace manual vector operations with oneMKL BLAS routines:
* norm2() and scalarproduct() now use cblas_dnrm2/cblas_ddot (C++)
* L2 norm calculations use DDOT for dot products (Fortran)
* Interpolation weighted sums use DDOT (Fortran)
- Disable OpenMP threading (switch to sequential MKL) for better performance
Build Configuration:
- Switch from lmkl_intel_thread to lmkl_sequential
- Remove -qopenmp flags from compiler options
- Maintain aggressive optimization flags (-O3, -xHost, -fp-model fast=2, -fma)
Other Changes:
- Update .gitignore for GW150914-origin, docs, and temporary files
- Update makefile.inc with Intel oneAPI compiler flags and oneMKL linking
- Configure taskset CPU binding to use nohz_full cores (4-55, 60-111)
- Set build parallelism to 104 jobs for faster compilation
- Update MPI process count to 48 in input configuration