- makefile.inc: add -ipo (interprocedural optimization) and
  -align array64byte (64-byte array alignment for vectorization)
- fmisc.f90: remove redundant funcc=0.d0 zeroing from symmetry_bd,
  symmetry_tbd, and symmetry_stbd (eliminates roughly 328 or more
  full-array memsets per timestep)
- enforce_algebra.f90: rewrite enforce_ag and enforce_ga as point-wise
  loops, replacing 12 stack-allocated 3D temporary arrays with scalar
  locals for better cache locality
All changes are mathematically equivalent; no algorithmic modifications are made.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit transitions the verification approach from post-Newtonian theory
comparison to regression testing against baseline simulations, and optimizes
critical numerical kernels using Intel oneMKL BLAS routines.
Verification Changes:
- Replace PN theory-based RMS calculation with trajectory-based comparison
- Compare optimized results against baseline (GW150914-origin) on XY plane
- Compute RMS independently for BH1 and BH2, report maximum as final metric
- Update documentation to reflect new regression test methodology
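A minimal C++ sketch of the metric described above. The exact formula is not given in this message, so this assumes the RMS is taken over the Euclidean XY distance between time-matched baseline and optimized samples; `Point2D`, `trajectory_rms`, and `regression_metric` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sample type: one puncture position projected on the XY plane.
struct Point2D {
    double x;
    double y;
};

// RMS of the XY distance between two trajectories sampled at the same times.
// Assumption: samples are already time-aligned; extra trailing samples in
// the longer trajectory are ignored.
double trajectory_rms(const std::vector<Point2D>& baseline,
                      const std::vector<Point2D>& optimized) {
    const std::size_t n = std::min(baseline.size(), optimized.size());
    if (n == 0) {
        return 0.0;  // nothing to compare
    }
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = optimized[i].x - baseline[i].x;
        const double dy = optimized[i].y - baseline[i].y;
        sum_sq += dx * dx + dy * dy;
    }
    return std::sqrt(sum_sq / static_cast<double>(n));
}

// Final metric: RMS computed independently for BH1 and BH2, maximum reported.
double regression_metric(const std::vector<Point2D>& bh1_baseline,
                         const std::vector<Point2D>& bh1_optimized,
                         const std::vector<Point2D>& bh2_baseline,
                         const std::vector<Point2D>& bh2_optimized) {
    return std::max(trajectory_rms(bh1_baseline, bh1_optimized),
                    trajectory_rms(bh2_baseline, bh2_optimized));
}
```

Taking the maximum rather than an average means a regression in either black hole's trajectory is enough to fail the check.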
Performance Optimizations:
- Replace manual vector operations with oneMKL BLAS routines:
  * norm2() and scalarproduct() now use cblas_dnrm2/cblas_ddot (C++)
  * L2 norm calculations use DDOT for dot products (Fortran)
  * Interpolation weighted sums use DDOT (Fortran)
- Disable OpenMP threading (switch to sequential MKL) for better performance
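A minimal C++ sketch of the BLAS substitution. The real signatures of norm2() and scalarproduct() are not shown in this message, so `scalar_product` and `l2_norm` are hypothetical stand-ins, and `USE_MKL` is an assumed build macro that lets the sketch compile without oneMKL installed.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

#ifdef USE_MKL  // assumed macro for this sketch, not taken from makefile.inc
#include <mkl.h>
#endif

// Dot product of two equal-length vectors.
// With oneMKL this maps to cblas_ddot; otherwise a standard-library fallback.
double scalar_product(const std::vector<double>& x, const std::vector<double>& y) {
#ifdef USE_MKL
    return cblas_ddot(static_cast<MKL_INT>(x.size()), x.data(), 1, y.data(), 1);
#else
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
#endif
}

// Euclidean (L2) norm of a vector.
// With oneMKL this maps to cblas_dnrm2; the fallback is sqrt(x . x).
double l2_norm(const std::vector<double>& x) {
#ifdef USE_MKL
    return cblas_dnrm2(static_cast<MKL_INT>(x.size()), x.data(), 1);
#else
    return std::sqrt(scalar_product(x, x));
#endif
}
```

The stride arguments of 1 assume contiguous storage. Results from the BLAS path can differ from a hand-written loop in the last bits because the summation order is not the same.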
Build Configuration:
- Switch from -lmkl_intel_thread to -lmkl_sequential
- Remove -qopenmp flags from compiler options
- Maintain aggressive optimization flags (-O3, -xHost, -fp-model fast=2, -fma)
Other Changes:
- Update .gitignore for GW150914-origin, docs, and temporary files
- Update makefile.inc with Intel oneAPI compiler flags and oneMKL linking
- Configure taskset CPU binding to use nohz_full cores (4-55, 60-111)
- Set build parallelism to 104 jobs for faster compilation
- Update MPI process count to 48 in input configuration
Bind all computation processes (ABE, ABEGPU, TwoPunctureABE) to
CPU cores 4-55 and 60-111 using numactl --physcpubind to prevent
interference with system processes on reserved cores.