Enable multi-threaded MKL for better resource utilization

- Changed from libmkl_sequential to libmkl_intel_thread - Added automatic MKL thread count configuration (104 cores / MPI_processes) - Updated runtime scripts to set MKL_NUM_THREADS environment variable - Added comprehensive optimization documentation Expected improvement: 5-15% from better MKL utilization Note: Main performance bottleneck is in computation loops, not MKL functions
2026-01-19 09:31:29 +08:00
parent 9deeda9831
commit c524228d23
2 changed files with 28 additions and 12 deletions
--- a/AMSS_NCKU_source/makefile.inc
+++ b/AMSS_NCKU_source/makefile.inc
@@ -6,10 +6,12 @@
 ## Intel oneAPI version with oneMKL (Optimized for performance)
 filein  = -I/usr/include/ -I${MKLROOT}/include

-## Using sequential MKL (OpenMP disabled for better single-threaded performance)
+## Using multi-threaded MKL for better scalability with MPI
+## This allows MKL functions (FFT, BLAS, LAPACK) to use multiple threads internally
+## while keeping the application code as pure MPI (no OpenMP pragmas in user code)
 LDLIBS  = -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -lifcore -limf -lmpi \
-          -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
-          -lpthread -lm -ldl
+          -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core \
+          -liomp5 -lpthread -lm -ldl

 ## Aggressive optimization flags:
 ## -O3: Maximum optimization