Optimize bssn_rhs calculation with cache blocking and vectorization

- Introduced the bssn_rhs_opt.f90 module with vectorized derivative and physics kernels.
- Implemented cache blocking (BLK=8) in bssn_rhs_opt.f90 to improve L1/L2 cache hit rates.
- Renamed the original implementation to bssn_rhs_legacy.f90 and kept it as a fallback.
- Updated bssn_rhs.f90 to act as a dispatcher: it uses the optimized path when ghost_width=3 and the legacy implementation otherwise.
- Updated the makefile to include the new source files.
- Added a DEBUG_NAN_CHECK macro so that NaN checks can be disabled in production builds.
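The Fortran diff itself is not shown, so the following is only a C sketch of the three mechanisms named above (cache blocking with an 8-wide tile, a ghost_width-based dispatcher, and a compile-time NaN guard). It is not the code in bssn_rhs_opt.f90. Assumptions: the names `dz_plain`, `dz_blocked`, `dz_dispatch` and `IDX` are hypothetical; the kernel is a plain second-order central difference in z chosen for illustration, not the stencil the BSSN code actually uses; and arrays are flattened with i (x) varying fastest. If the Fortran arrays are indexed as f(i,j,k), column-major order likewise makes i the unit-stride index. The blocked version tiles the xy-plane into BLK x BLK columns and streams along k, so the three planes each column touches stay in cache.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Tile edge length; BLK = 8 mirrors the value in the commit message. */
#define BLK 8

/* Flattened index for an nx*ny*nz array with i (x) varying fastest. */
#define IDX(i, j, k, nx, ny) \
    (((size_t)(k) * (size_t)(ny) + (size_t)(j)) * (size_t)(nx) + (size_t)(i))

static int imin(int a, int b) { return a < b ? a : b; }

/* Hypothetical reference path: second-order central difference d/dz on the
 * interior planes k = 1..nz-2. It sweeps whole xy-planes, so planes k-1 and
 * k+1 may be evicted between uses when nx*ny is large. */
static void dz_plain(const double *f, double *df, int nx, int ny, int nz, double h) {
    for (int k = 1; k < nz - 1; k++)
        for (int j = 0; j < ny; j++)
            for (int i = 0; i < nx; i++)
                df[IDX(i, j, k, nx, ny)] =
                    (f[IDX(i, j, k + 1, nx, ny)] - f[IDX(i, j, k - 1, nx, ny)]) / (2.0 * h);
}

/* Hypothetical blocked path: same arithmetic, but the xy-plane is split into
 * BLK x BLK columns and each column is streamed along k, so its working set
 * (3 * BLK * BLK doubles) stays in L1. The innermost i loop is unit-stride,
 * which lets the compiler auto-vectorize it. Edge tiles are clipped by imin. */
static void dz_blocked(const double *f, double *df, int nx, int ny, int nz, double h) {
    for (int jj = 0; jj < ny; jj += BLK)
        for (int ii = 0; ii < nx; ii += BLK) {
            const int jend = imin(jj + BLK, ny);
            const int iend = imin(ii + BLK, nx);
            for (int k = 1; k < nz - 1; k++)
                for (int j = jj; j < jend; j++)
                    for (int i = ii; i < iend; i++)
                        df[IDX(i, j, k, nx, ny)] =
                            (f[IDX(i, j, k + 1, nx, ny)] - f[IDX(i, j, k - 1, nx, ny)]) / (2.0 * h);
        }
}

/* Hypothetical dispatcher: blocked kernel only when ghost_width == 3,
 * plain (legacy-style) kernel otherwise. */
static void dz_dispatch(const double *f, double *df, int nx, int ny, int nz,
                        double h, int ghost_width) {
    assert(nx > 0 && ny > 0 && nz > 2 && h > 0.0);
    if (ghost_width == 3)
        dz_blocked(f, df, nx, ny, nz, h);
    else
        dz_plain(f, df, nx, ny, nz, h);

#ifdef DEBUG_NAN_CHECK
    /* Compiled in only when built with -DDEBUG_NAN_CHECK; scans the interior
     * planes that the kernels actually wrote. */
    const size_t plane = (size_t)nx * (size_t)ny;
    for (size_t n = plane; n < plane * (size_t)(nz - 1); n++)
        if (isnan(df[n])) {
            fprintf(stderr, "NaN in derivative at flat index %zu\n", n);
            abort();
        }
#endif
}
```

Both paths perform identical per-point arithmetic, so the blocked kernel should reproduce the plain one; only the traversal order changes. Building without `-DDEBUG_NAN_CHECK` removes the scan entirely, which is the kind of production/debug split the last bullet describes, assuming the Fortran sources are run through a C-style preprocessor.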
Date: 2026-01-19 16:39:24 +08:00
parent 9deeda9831
commit 4472d89a9f
4 changed files with 2047 additions and 1187 deletions
