Performance optimization for the TwoPunctures module

* Re-enabled OpenMP.

1. Batch spectral derivatives (Chebyshev & Fourier) via precomputed matrices:
Chebyshev/Fourier transforms and derivatives are precomputed as explicit physical-space operator matrices.
Batch DGEMM now applies to entire tensor fields, mathematically identical to original per-line transforms but vastly faster.

2. Gauss-Seidel relaxation & tridiagonal solver workspace reuse:
Per-thread reusable workspaces replace per-call heap new/delete in all tridiagonal and relaxation routines.

3. Efficient OpenMP multithreading throughout relaxation/deriv:
relax_omp and friends parallelize over grouped lines/planes, maximizing threading efficiency and memory independence.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
This commit is contained in:
2026-02-07 14:46:46 +08:00
parent f5ed23d687
commit f345b0e520
3 changed files with 917 additions and 215 deletions

View File

@@ -15,10 +15,9 @@ LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore
## -xHost: Optimize for the host CPU architecture (Intel/AMD compatible)
## -fp-model fast=2: Aggressive floating-point optimizations
## -fma: Enable fused multiply-add instructions
## Note: OpenMP has been disabled (-qopenmp removed) due to performance issues
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo -qopenmp \
-Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo -qopenmp \
-align array64byte -fpp -I${MKLROOT}/include
f90 = ifx
f77 = ifx