Add OpenMP parallelization to Fortran compute kernels

Add !$omp parallel do collapse(2) directives to all triple-loop
stencil kernels (fderivs, fdderivs, fdx/fdy/fdz, kodis, lopsided,
enforce_ag/enforce_ga) across all ghost_width variants. Add !$omp
parallel workshare to RK4/ICN/Euler whole-array update routines.

Build system: add -qopenmp to compile and link flags, switch MKL
from sequential to threaded (-lmkl_intel_thread -liomp5).

Runtime: set OMP_NUM_THREADS=96, OMP_STACKSIZE=16M, OMP_PROC_BIND=close,
OMP_PLACES=cores for 96-core server target.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-02-10 23:40:17 +08:00
parent caf192b2e4
commit 714c6e90c6
10 changed files with 171 additions and 32 deletions

View File

@@ -68,7 +68,8 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
@@ -233,7 +234,8 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
#if 0
@@ -558,7 +560,8 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
@@ -774,7 +777,8 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction