Compare commits

..

5 Commits

Author SHA1 Message Date
ed89bc029b Fix potential division by zero in reta_val calculation and enable NaN checks
Added a safety check for the denominator in the reta_val calculation to
prevent division by zero when chi approaches zero (e.g., at far-field
boundaries). Also enabled DEBUG_NAN_CHECK macro to catch invalid inputs early.
Initialized output arrays to zero to prevent uninitialized memory access.
2026-01-19 20:29:48 +08:00
19274e93d1 Fix boundary handling in bssn_rhs_opt.f90 to prevent NaNs
Refactored calc_derivs and calc_dderivs to include correct boundary
handling logic matching the legacy code. Implemented fallback to 2nd
order derivatives when near boundaries where 4th order stencils cannot
be used. Added logic to initialize output arrays to zero to avoid
uninitialized memory access.
2026-01-19 20:03:22 +08:00
ae1a474cca Fix compilation errors and complete logic in BSSN RHS optimization 2026-01-19 19:22:52 +08:00
cbb8fb3a87 patched last commit 2026-01-19 17:14:28 +08:00
4472d89a9f Optimize bssn_rhs calculation with cache blocking and vectorization
- Implemented cache blocking (BLK=8) in bssn_rhs_opt.f90 to improve L1/L2 cache hit rate.
- Introduced bssn_rhs_opt.f90 module with vectorized derivative and physics kernels.
- Renamed original implementation to bssn_rhs_legacy.f90 for fallback.
- Updated bssn_rhs.f90 to act as a dispatcher, using the optimized path for ghost_width=3.
- Updated makefile to include new source files.
- Added DEBUG_NAN_CHECK macro to optionally disable NaN checks in production.
2026-01-19 16:39:24 +08:00
7 changed files with 2436 additions and 1191 deletions

View File

@@ -16,7 +16,7 @@ import numpy
File_directory = "GW150914" ## output file directory
Output_directory = "binary_output" ## binary data file directory
## The file directory name should not be too long
MPI_processes = 64 ## number of mpi processes used in the simulation
MPI_processes = 48 ## number of mpi processes used in the simulation
GPU_Calculation = "no" ## Use GPU or not
## (prefer "no" in the current version, because the GPU part may have bugs when integrated in this Python interface)

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -34,7 +34,7 @@ C++FILES_GPU = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o
F90FILES = enforce_algebra.o fmisc.o initial_puncture.o prolongrestrict.o\
prolongrestrict_cell.o prolongrestrict_vertex.o\
rungekutta4_rout.o bssn_rhs.o diff_new.o kodiss.o kodiss_sh.o\
rungekutta4_rout.o bssn_rhs_opt.o bssn_rhs.o bssn_rhs_legacy.o diff_new.o kodiss.o kodiss_sh.o\
lopsidediff.o sommerfeld_rout.o getnp4.o diff_new_sh.o\
shellfunctions.o bssn_rhs_ss.o Set_Rho_ADM.o\
getnp4EScalar.o bssnEScalar_rhs.o bssn_constraint.o ricci_gamma.o\

View File

@@ -7,8 +7,9 @@
filein = -I/usr/include/ -I${MKLROOT}/include
## Using sequential MKL (OpenMP disabled for better single-threaded performance)
## Added -lifcore for Intel Fortran runtime and -limf for Intel math library
LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore -limf -lpthread -lm -ldl
LDLIBS = -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -lifcore -limf -lmpi \
-L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core \
-lpthread -lm -ldl
## Aggressive optimization flags:
## -O3: Maximum optimization

View File

@@ -15,7 +15,7 @@ import subprocess
## taskset ensures all child processes inherit the CPU affinity mask
## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111)
## Format: taskset -c 4-55,60-111 ensures processes only run on these cores
NUMACTL_CPU_BIND = "taskset -c 0-111"
NUMACTL_CPU_BIND = "taskset -c 4-55,60-111"
## Build parallelism configuration
## Use nohz_full cores (4-55, 60-111) for compilation: 52 + 52 = 104 cores