AMSS-NCKU

64-BitBrainstorm_2026/AMSS-NCKU

Author	SHA1	Message	Date
CGH0S7	cbb8fb3a87	patched last commit	2026-01-19 17:14:28 +08:00
CGH0S7	4472d89a9f	Optimize bssn_rhs calculation with cache blocking and vectorization. Changes: implemented cache blocking (BLK=8) in bssn_rhs_opt.f90 to improve L1/L2 cache hit rate; introduced bssn_rhs_opt.f90 module with vectorized derivative and physics kernels; renamed original implementation to bssn_rhs_legacy.f90 for fallback; updated bssn_rhs.f90 to act as a dispatcher, using the optimized path for ghost_width=3; updated makefile to include new source files; added DEBUG_NAN_CHECK macro to optionally disable NaN checks in production.	2026-01-19 16:39:24 +08:00
CGH0S7	9deeda9831	Refactor verification method and optimize numerical kernels with oneMKL BLAS. This commit transitions the verification approach from post-Newtonian theory comparison to regression testing against baseline simulations, and optimizes critical numerical kernels using Intel oneMKL BLAS routines. Verification Changes: replace PN theory-based RMS calculation with trajectory-based comparison; compare optimized results against baseline (GW150914-origin) on XY plane; compute RMS independently for BH1 and BH2, report maximum as final metric; update documentation to reflect new regression test methodology. Performance Optimizations: replace manual vector operations with oneMKL BLAS routines (C++: norm2() and scalarproduct() now use cblas_dnrm2/cblas_ddot; Fortran: L2 norm calculations use DDOT for dot products, interpolation weighted sums use DDOT); disable OpenMP threading (switch to sequential MKL) for better performance. Build Configuration: switch from -lmkl_intel_thread to -lmkl_sequential; remove -qopenmp flags from compiler options; maintain aggressive optimization flags (-O3, -xHost, -fp-model fast=2, -fma). Other Changes: update .gitignore for GW150914-origin, docs, and temporary files.	2026-01-18 14:25:21 +08:00
CGH0S7	3a7bce3af2	Update Intel oneAPI configuration and CPU binding settings. Changes: update makefile.inc with Intel oneAPI compiler flags and oneMKL linking; configure taskset CPU binding to use nohz_full cores (4-55, 60-111); set build parallelism to 104 jobs for faster compilation; update MPI process count to 48 in input configuration.	2026-01-17 20:41:02 +08:00
CGH0S7	cb252f5ea2	Optimize numerical algorithms with Intel oneMKL. Changes: FFT.f90: replace hand-written Cooley-Tukey FFT with oneMKL DFTI; ilucg.f90: replace manual dot product loop with BLAS DDOT; gaussj.C: replace Gauss-Jordan elimination with LAPACK dgesv/dgetri; makefile.inc: add MKL include paths and library linking. All optimizations maintain mathematical equivalence and numerical precision.	2026-01-16 10:58:11 +08:00
CGH0S7	57a7376044	Switch compiler toolchain from GCC to Intel oneAPI. Changes: makefile.inc: replace GCC compilers with Intel oneAPI (C/C++: gcc/g++ -> icx/icpx; Fortran: gfortran -> ifx; MPI linker: mpic++ -> mpiicpx) and update LDLIBS and compiler flags accordingly; macrodef.h: fix include path (microdef.fh -> macrodef.fh). Requires: source /home/intel/oneapi/setvars.sh before build.	2026-01-15 16:32:12 +08:00
CGH0S7	cd5ceaa15f	main branch updated	2026-01-14 08:55:53 +08:00
CGH0S7	f2fc9af70e	asc26 amss-ncku initialized	2026-01-13 15:01:15 +08:00

8 Commits
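
Commit 9deeda9831 describes the regression check only in words: compare the optimized run against the GW150914-origin baseline on the XY plane, compute an RMS for BH1 and BH2 independently, and report the maximum. A minimal sketch of that idea follows. It is not the repository's verification script. The function names, the dictionary layout, the assumption that both runs are sampled at the same time steps, and the use of an unnormalized Euclidean distance are all illustrative assumptions; the actual metric may be normalized or weighted differently.

```python
import math


def trajectory_rms(baseline_xy, candidate_xy):
    """RMS of the XY-plane distance between two trajectories.

    Assumption: both trajectories are lists of (x, y) tuples sampled
    at identical time steps, so points can be paired by index.
    """
    if not baseline_xy or len(baseline_xy) != len(candidate_xy):
        raise ValueError("trajectories must be non-empty and equally sampled")
    total = 0.0
    for (xb, yb), (xc, yc) in zip(baseline_xy, candidate_xy):
        # Squared Euclidean distance in the XY plane for this time step.
        total += (xc - xb) ** 2 + (yc - yb) ** 2
    return math.sqrt(total / len(baseline_xy))


def regression_metric(baseline, candidate):
    """Return (rms_bh1, rms_bh2, worst) for two runs.

    Assumption: each run is a dict with hypothetical keys "BH1" and
    "BH2" mapping to that black hole's XY trajectory.
    """
    rms_bh1 = trajectory_rms(baseline["BH1"], candidate["BH1"])
    rms_bh2 = trajectory_rms(baseline["BH2"], candidate["BH2"])
    # The commit message reports the larger of the two as the final metric.
    return rms_bh1, rms_bh2, max(rms_bh1, rms_bh2)


if __name__ == "__main__":
    # Made-up sample data, purely to show the call shape.
    baseline = {"BH1": [(0.0, 0.0), (1.0, 0.0)], "BH2": [(0.0, 0.0), (0.0, 0.0)]}
    candidate = {"BH1": [(0.0, 0.0), (1.0, 0.0)], "BH2": [(3.0, 4.0), (3.0, 4.0)]}
    print(regression_metric(baseline, candidate))  # (0.0, 5.0, 5.0)
```

Taking the maximum rather than the mean of the two values means a regression in either black hole's trajectory is enough to fail the check.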