Apply Intel Advisor optimization recommendations:
- Add FORCEINLINE to polint for better inlining
- Add SIMD VECTORLENGTHFOR and UNROLL directives to fderivs,
fdderivs, symmetry_bd, and kodis functions
The directives target vectorization efficiency in the finite-difference
computations.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Changes:
- polint: Rewrite Neville's algorithm from array-slice operations to a
scalar loop. Mathematically identical; avoids temporary array
allocations for den(1:n-m) slices. (Credit: yx-fmisc branch)
- polin2: Swap interpolation order so inner loop accesses ya(:,j)
(contiguous in Fortran column-major) instead of ya(i,:) (strided).
The tensor-product interpolant does not depend on the order in which
the dimensions are processed (up to floating-point round-off); all call
sites pass identical coordinate arrays for all dimensions.
- polin3: Swap interpolation order to process contiguous first
dimension first: ya(:,j,k) -> yatmp(:,k) -> ymtmp(:).
Same commutativity argument as polin2.
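
The scalar-loop form of Neville's algorithm described for polint can be
sketched as follows. This is a simplified in-place tableau written in C++
for illustration, not the repository's Fortran routine; the function name
neville and its signature are assumptions, and it omits the error estimate
that polint-style routines usually return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: evaluate the polynomial through (xa[i], ya[i]) at x.
// The tableau is updated in place inside scalar loops, so no temporary
// array slices are created. `p` is passed by value and used as the work array.
double neville(const std::vector<double>& xa, std::vector<double> p, double x) {
    assert(xa.size() == p.size() && !xa.empty());
    const std::size_t n = xa.size();
    for (std::size_t m = 1; m < n; ++m) {
        // After this pass, p[i] holds the interpolant through nodes i..i+m.
        for (std::size_t i = 0; i + m < n; ++i) {
            p[i] = ((x - xa[i + m]) * p[i] + (xa[i] - x) * p[i + 1])
                   / (xa[i] - xa[i + m]);
        }
    }
    return p[0];
}
```

For example, neville({0, 1, 2}, {0, 1, 4}, 1.5) reproduces x^2 at 1.5,
because three nodes determine a quadratic exactly.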
Compile-time safety switch:
-DPOLINT_LEGACY_ORDER restores original dimension ordering
Default (no flag): uses optimized contiguous-memory ordering
Usage:
# Production (optimized order):
make clean && make -j ABE
# Fallback if results differ (original order):
Add -DPOLINT_LEGACY_ORDER to f90appflags in makefile.inc
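
A minimal C++ sketch of how such a compile-time switch can select between
the two dimension orderings. Only the macro name POLINT_LEGACY_ORDER comes
from this commit; interp1d, interp2d_contiguous, interp2d_legacy and
interp2d are hypothetical names, and the data is assumed to be stored
column-major as ya(i,j) -> ya[i + n1*j].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D Neville interpolation over values read with a given stride.
double interp1d(const std::vector<double>& xa, const double* ya,
                std::size_t stride, double x) {
    const std::size_t n = xa.size();
    std::vector<double> p(n);
    for (std::size_t i = 0; i < n; ++i) p[i] = ya[i * stride];
    for (std::size_t m = 1; m < n; ++m)
        for (std::size_t i = 0; i + m < n; ++i)
            p[i] = ((x - xa[i + m]) * p[i] + (xa[i] - x) * p[i + 1])
                   / (xa[i] - xa[i + m]);
    return p[0];
}

// Optimized order: interpolate along the contiguous first dimension first.
double interp2d_contiguous(const std::vector<double>& x1a, const std::vector<double>& x2a,
                           const std::vector<double>& ya, double x1, double x2) {
    const std::size_t n1 = x1a.size(), n2 = x2a.size();
    assert(ya.size() == n1 * n2);
    std::vector<double> tmp(n2);
    for (std::size_t j = 0; j < n2; ++j)
        tmp[j] = interp1d(x1a, &ya[j * n1], 1, x1);   // unit-stride column ya(:,j)
    return interp1d(x2a, tmp.data(), 1, x2);
}

// Legacy order: interpolate along the strided second dimension first.
double interp2d_legacy(const std::vector<double>& x1a, const std::vector<double>& x2a,
                       const std::vector<double>& ya, double x1, double x2) {
    const std::size_t n1 = x1a.size(), n2 = x2a.size();
    assert(ya.size() == n1 * n2);
    std::vector<double> tmp(n1);
    for (std::size_t i = 0; i < n1; ++i)
        tmp[i] = interp1d(x2a, &ya[i], n1, x2);       // stride-n1 row ya(i,:)
    return interp1d(x1a, tmp.data(), 1, x1);
}

// Compile with -DPOLINT_LEGACY_ORDER to fall back to the original ordering.
inline double interp2d(const std::vector<double>& x1a, const std::vector<double>& x2a,
                       const std::vector<double>& ya, double x1, double x2) {
#ifdef POLINT_LEGACY_ORDER
    return interp2d_legacy(x1a, x2a, ya, x1, x2);
#else
    return interp2d_contiguous(x1a, x2a, ya, x1, x2);
#endif
}
```

Both orderings give the same interpolant in exact arithmetic; in floating
point they can differ at round-off level, which is the case the fallback
flag is meant to cover.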
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- makefile.inc: add -ipo (interprocedural optimization) and
-align array64byte (64-byte array alignment for vectorization)
- fmisc.f90: remove redundant funcc=0.d0 zeroing from symmetry_bd,
symmetry_tbd, symmetry_stbd (~328+ full-array memsets eliminated
per timestep)
- enforce_algebra.f90: rewrite enforce_ag and enforce_ga as point-wise
loops, replacing 12 stack-allocated 3D temporary arrays with scalar
locals for better cache locality
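
The point-wise rewrite can be illustrated with a toy C++ sketch. The
actual operations in enforce_ag and enforce_ga are not described in this
message, so the example uses a stand-in operation (subtracting the
per-point mean of three fields); all names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Field = std::vector<double>;

// Whole-array style: a full-size temporary is filled, then re-read three times.
void remove_mean_with_temporary(Field& a, Field& b, Field& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    Field mean(a.size());  // full-size temporary array
    for (std::size_t i = 0; i < a.size(); ++i) mean[i] = (a[i] + b[i] + c[i]) / 3.0;
    for (std::size_t i = 0; i < a.size(); ++i) a[i] -= mean[i];
    for (std::size_t i = 0; i < b.size(); ++i) b[i] -= mean[i];
    for (std::size_t i = 0; i < c.size(); ++i) c[i] -= mean[i];
}

// Point-wise style: one pass over the data, with the temporary held in a scalar local.
void remove_mean_pointwise(Field& a, Field& b, Field& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double mean = (a[i] + b[i] + c[i]) / 3.0;  // scalar local
        a[i] -= mean;
        b[i] -= mean;
        c[i] -= mean;
    }
}
```

Both functions perform the same arithmetic per grid point in the same
order, so their results match; the point-wise version touches each element
once and needs no temporary storage.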
All changes are mathematically equivalent; no algorithmic modifications.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit transitions the verification approach from post-Newtonian theory
comparison to regression testing against baseline simulations, and optimizes
critical numerical kernels using Intel oneMKL BLAS routines.
Verification Changes:
- Replace PN theory-based RMS calculation with trajectory-based comparison
- Compare optimized results against the baseline run (GW150914-origin) in the XY plane
- Compute RMS independently for BH1 and BH2, report maximum as final metric
- Update documentation to reflect new regression test methodology
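
One way to read the trajectory-based metric above, as a C++ sketch. The
commit does not state how the deviation is normalized or how the samples
are aligned in time, so this assumes absolute Euclidean distance in the XY
plane between samples at matching indices; Point2, trajectory_rms and
regression_metric are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x; double y; };

// RMS of the XY-plane distance between two trajectories sampled at the same times.
double trajectory_rms(const std::vector<Point2>& baseline,
                      const std::vector<Point2>& optimized) {
    assert(baseline.size() == optimized.size() && !baseline.empty());
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < baseline.size(); ++i) {
        const double dx = optimized[i].x - baseline[i].x;
        const double dy = optimized[i].y - baseline[i].y;
        sum_sq += dx * dx + dy * dy;
    }
    return std::sqrt(sum_sq / static_cast<double>(baseline.size()));
}

// Final metric: the larger of the two per-black-hole RMS values.
double regression_metric(const std::vector<Point2>& bh1_base, const std::vector<Point2>& bh1_opt,
                         const std::vector<Point2>& bh2_base, const std::vector<Point2>& bh2_opt) {
    return std::max(trajectory_rms(bh1_base, bh1_opt),
                    trajectory_rms(bh2_base, bh2_opt));
}
```

Reporting the maximum means a regression in either black hole's trajectory
is enough to fail the comparison.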
Performance Optimizations:
- Replace manual vector operations with oneMKL BLAS routines:
* norm2() and scalarproduct() now use cblas_dnrm2/cblas_ddot (C++)
* L2 norm calculations use DDOT for dot products (Fortran)
* Interpolation weighted sums use DDOT (Fortran)
- Disable OpenMP threading (switch to sequential MKL) for better performance
Build Configuration:
- Switch from -lmkl_intel_thread to -lmkl_sequential
- Remove -qopenmp flags from compiler options
- Maintain aggressive optimization flags (-O3, -xHost, -fp-model fast=2, -fma)
Other Changes:
- Update .gitignore for GW150914-origin, docs, and temporary files