Compare commits

..

20 Commits

Author SHA1 Message Date
jaunatisblue
f147f79ffa 修改block划分,对负载高的rank所在block进行划分,添加到空rank,空rank是平移得到的 2026-02-26 09:40:46 +08:00
jaunatisblue
8abac8dd88 对rank运行时间统计,两个函数分别在不同的计算中被调用,因此我对两个重载的函数分别进行了mpi实际计算时间的统计,对于第一个PatList_Interp_Points 调用 Interp_points,我取排名前三的rank时间,发现每次只有一个rank时间较长,Rank [ 52]: Calc 0.000012 s
Rank [  20]: Calc 0.000003 s

Rank [  35]: Calc 0.000003 s

Rank [  10]: Calc 0.000010 s

Rank [  17]: Calc 0.000005 s

Rank [  32]: Calc 0.000003 s,而且rank不固定,一般就是rank 10 和 rank 52;
但尽管有很多,比前者时间还是少很多
对于第二个Surf_Wave 调用 Interp_points,我发现前四个rank时间最长,比较固定,就是下面四个rank

Rank [  27]: Calc 0.331978 s

Rank [  35]: Calc 0.242219 s

Rank [  28]: Calc 0.242132 s

Rank [  36]: Calc 0.197024 s
因此下面surf_wave是核心
2026-02-24 14:34:24 +08:00
82339f5282 Merge lopsided advection + kodis dissipation to share symmetry_bd buffer
Cherry-picked from 38c2c30.
2026-02-20 13:36:27 +08:00
94f38c57f9 Don't hardcode pgo profile path 2026-02-20 13:36:27 +08:00
85d1e8de87 Add Intel SIMD vectorization directives to hot-spot functions
Apply Intel Advisor optimization recommendations:
- Add FORCEINLINE to polint for better inlining
- Add SIMD VECTORLENGTHFOR and UNROLL directives to fderivs,
  fdderivs, symmetry_bd, and kodis functions

This improves vectorization efficiency of finite difference
computations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 00:43:39 +08:00
5b7e05cd32 PGO updated 2026-02-11 18:26:30 +08:00
85afe00fc5 Merge plotting optimizations from chb-copilot-test
- Implement multiprocessing-based parallel plotting
- Add parallel_plot_helper.py for concurrent plot task execution
- Use matplotlib 'Agg' backend for multiprocessing safety
- Set OMP_NUM_THREADS=1 to prevent BLAS thread explosion
- Use subprocess for binary data plots to avoid thread conflicts
- Add fork bomb protection in main program

This merge only includes plotting improvements and excludes
MPI communication changes to preserve existing optimizations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 16:19:17 +08:00
5c1790277b Replace nested OutBdLow2Hi loops with batch calls in RestrictProlong
The 8 nested while(Ppc){while(Pp){OutBdLow2Hi(single,single,...)}}
loops across RestrictProlong (3 overloads) and ProlongRestrict each
produced N_c × N_f separate transfer() → MPI_Waitall barriers.
Replace with the existing batch OutBdLow2Hi(MyList<Patch>*,...) which
merges all patch pairs into a single transfer() call with 1 MPI_Waitall.

Also add Restrict_cached, OutBdLow2Hi_cached, OutBdLow2Himix_cached
to Parallel (unused for now — kept as infrastructure for future use).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-11 16:09:08 +08:00
e09ae438a2 Cache data_packer lengths in Sync_start to skip redundant buffer-size traversals
The data_packer(NULL, ...) calls that compute send/recv buffer lengths
traverse all grid segments × variables × nprocs on every Sync_start
invocation, even though lengths never change once the cache is built.
Add a lengths_valid flag to SyncCache so these length computations are
done once and reused on subsequent calls (4× per RK4 step).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 21:39:22 +08:00
d06d5b4db8 Add targeted point-to-point Interp_Points overload for surface_integral
Instead of broadcasting all interpolated point data to every MPI rank,
the new overload sends each point only to the one rank that needs it
for integration, reducing communication volume by ~nprocs times.

The consumer rank is computed deterministically using the same Nmin/Nmax
work distribution formula used by surface_integral callers. Two active
call sites (surf_Wave and surf_MassPAng with MPI_COMM_WORLD) now use
the new overload. Other callers (ShellPatch, Comm_here variants, etc.)
remain unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 19:18:56 +08:00
50e2a845f8 Replace MPI_Allreduce with owner-rank MPI_Bcast in Patch::Interp_Points
The two MPI_Allreduce calls (data + weight) were the #1 hotspot at 38.5%
CPU time. Since all ranks traverse the same block list and agree on point
ownership, we replace the global reduction with targeted MPI_Bcast from
each owner rank. This also eliminates the weight array/Allreduce entirely,
removes redundant heap allocations (shellf, weight, DH, llb, uub), and
writes interpolation results directly into the output buffer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 22:39:18 +08:00
738498cb28 Optimize MPI communication in RestrictProlong and surface_integral
Cache Sync in RestrictProlong: replace 11 basic Parallel::Sync() calls
with Parallel::Sync_cached() across RestrictProlong, RestrictProlong_aux,
and ProlongRestrict to avoid rebuilding grid segment lists every call.

Merge paired MPI_Allreduce in surface_integral: combine 9 pairs of
consecutive RP/IP Allreduce calls into single calls with count=2*NN.

Merge scalar MPI_Allreduce in surf_MassPAng: combine 3 groups of 7
scalar Allreduce calls (mass + angular/linear momentum) into single
calls with count=7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 22:07:12 +08:00
42b9cf1ad9 Optimize MPI Sync with merged transfers, caching, and async overlap
Phase 1: Merge N+1 transfer() calls into a single transfer() per
Sync(PatchList), reducing N+1 MPI_Waitall barriers to 1 via new
Sync_merged() that collects all intra-patch and inter-patch grid
segment lists into combined per-rank arrays.

Phase 2: Cache grid segment lists and reuse grow-only communication
buffers across RK4 substeps via SyncCache struct. Caches are per-level
and per-variable-list (predictor/corrector), invalidated on regrid.
Eliminates redundant build_ghost_gsl/build_owned_gsl0/build_gstl
rebuilds and malloc/free cycles between regrids.

Phase 3: Split Sync into async Sync_start/Sync_finish to overlap
Cartesian ghost zone exchange (MPI_Isend/Irecv) with Shell patch
synchronization. Uses MPI tag 2 to avoid conflicts with SH->Synch()
which uses transfer() with tag 1.

Also updates makefile.inc paths and flags for local build environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 21:03:37 +08:00
e9d321fd00 Convert MPI_Allreduce error checks to non-blocking MPI_Iallreduce overlapped with Sync
Replace all 8 blocking MPI_Allreduce error-check calls with MPI_Iallreduce,
overlapping the reduction with subsequent Parallel::Sync/SH->Synch operations.
MPI_Wait is called after Sync completes to retrieve the error result.

This hides the Allreduce latency (46.5% of CPU time) behind the ghost zone
exchange communication that must happen anyway. Safe because Sync only copies
existing data to ghost zones and the error check + abort happens before any
further computation uses the synced data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 12:39:29 +08:00
ed1d86ade9 Merge paired MPI_Allreduce error checks to reduce global sync barriers
In the two Step() functions that handle both Patch and Shell Patch,
defer the Patch error check until after Shell Patch computation completes,
then perform a single combined MPI_Allreduce instead of two separate ones.
This eliminates 4 MPI_Allreduce calls per timestep (2 per Step function,
Predictor + Corrector phases each). The optimization is mathematically
equivalent: in normal execution (no NaN) behavior is identical; on error,
both Patch and Shell data are dumped before MPI_Abort.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 12:12:16 +08:00
471baa5065 PGO supported 2026-02-09 10:59:26 +08:00
4bb6c03013 makefile setting updated 2026-02-08 16:14:43 +08:00
b8e41b2b39 Only enable OpenMP for TwoPunctures 2026-02-08 13:00:37 +08:00
133e4f13a2 Use OpenMP's parallel for with schedule(dynamic,1) 2026-02-07 19:48:24 +08:00
914c4f4791 Optimize memory allocation in JFD_times_dv
This should reduce the pressure on the memory allocator, indirectly improving caching behavior.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
2026-02-07 15:55:45 +08:00
28 changed files with 16689 additions and 14145 deletions

View File

@@ -1,447 +0,0 @@
##################################################################
##
## AMSS-NCKU ABE Test Program (Skip TwoPuncture if data exists)
## Modified from AMSS_NCKU_Program.py
## Author: Xiaoqu
## Modified: 2026/02/01
##
##################################################################
##################################################################
## Print program introduction
import print_information
print_information.print_program_introduction()
##################################################################
import AMSS_NCKU_Input as input_data
##################################################################
## Create directories to store program run data
import os
import shutil
import sys
import time
## Set the output directory according to the input file
File_directory = os.path.join(input_data.File_directory)
## Check if output directory exists and if TwoPuncture data is available
#skip_twopuncture = False
skip_twopuncture = True
output_directory = os.path.join(File_directory, "AMSS_NCKU_output")
binary_results_directory = os.path.join(output_directory, input_data.Output_directory)
if os.path.exists(File_directory):
print( " Output directory already exists." )
print()
'''
# Check if TwoPuncture initial data files exist
if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture"):
twopuncture_output = os.path.join(output_directory, "TwoPunctureABE")
input_par = os.path.join(output_directory, "input.par")
if os.path.exists(twopuncture_output) and os.path.exists(input_par):
print( " Found existing TwoPuncture initial data." )
print( " Do you want to skip TwoPuncture phase and reuse existing data?" )
print( " Input 'skip' to skip TwoPuncture and start ABE directly" )
print( " Input 'regenerate' to regenerate everything from scratch" )
print()
while True:
try:
inputvalue = input()
if ( inputvalue == "skip" ):
print( " Skipping TwoPuncture phase, will reuse existing initial data." )
print()
skip_twopuncture = True
break
elif ( inputvalue == "regenerate" ):
print( " Regenerating everything from scratch." )
print()
skip_twopuncture = False
break
else:
print( " Please input 'skip' or 'regenerate'." )
except ValueError:
print( " Please input 'skip' or 'regenerate'." )
else:
print( " TwoPuncture initial data not found, will regenerate everything." )
print()
'''
# If not skipping, remove and recreate directory
if not skip_twopuncture:
shutil.rmtree(File_directory, ignore_errors=True)
os.mkdir(File_directory)
os.mkdir(output_directory)
os.mkdir(binary_results_directory)
figure_directory = os.path.join(File_directory, "figure")
os.mkdir(figure_directory)
shutil.copy("AMSS_NCKU_Input.py", File_directory)
print( " Output directory has been regenerated." )
print()
else:
# Create fresh directory structure
os.mkdir(File_directory)
shutil.copy("AMSS_NCKU_Input.py", File_directory)
os.mkdir(output_directory)
os.mkdir(binary_results_directory)
figure_directory = os.path.join(File_directory, "figure")
os.mkdir(figure_directory)
print( " Output directory has been generated." )
print()
# Ensure figure directory exists
figure_directory = os.path.join(File_directory, "figure")
if not os.path.exists(figure_directory):
os.mkdir(figure_directory)
##################################################################
## Output related parameter information
import setup
## Print and save input parameter information
setup.print_input_data( File_directory )
if not skip_twopuncture:
setup.generate_AMSSNCKU_input()
setup.print_puncture_information()
##################################################################
## Generate AMSS-NCKU program input files based on the configured parameters
if not skip_twopuncture:
print()
print( " Generating the AMSS-NCKU input parfile for the ABE executable." )
print()
## Generate cgh-related input files from the grid information
import numerical_grid
numerical_grid.append_AMSSNCKU_cgh_input()
print()
print( " The input parfile for AMSS-NCKU C++ executable file ABE has been generated." )
print( " However, the input relevant to TwoPuncture need to be appended later." )
print()
##################################################################
## Plot the initial grid configuration
if not skip_twopuncture:
print()
print( " Schematically plot the numerical grid structure." )
print()
import numerical_grid
numerical_grid.plot_initial_grid()
##################################################################
## Generate AMSS-NCKU macro files according to the numerical scheme and parameters
if not skip_twopuncture:
print()
print( " Automatically generating the macro file for AMSS-NCKU C++ executable file ABE " )
print( " (Based on the finite-difference numerical scheme) " )
print()
import generate_macrodef
generate_macrodef.generate_macrodef_h()
print( " AMSS-NCKU macro file macrodef.h has been generated. " )
generate_macrodef.generate_macrodef_fh()
print( " AMSS-NCKU macro file macrodef.fh has been generated. " )
##################################################################
# Compile the AMSS-NCKU program according to user requirements
# NOTE: ABE compilation is always performed, even when skipping TwoPuncture
print()
print( " Preparing to compile and run the AMSS-NCKU code as requested " )
print( " Compiling the AMSS-NCKU code based on the generated macro files " )
print()
AMSS_NCKU_source_path = "AMSS_NCKU_source"
AMSS_NCKU_source_copy = os.path.join(File_directory, "AMSS_NCKU_source_copy")
## If AMSS_NCKU source folder is missing, create it and prompt the user
if not os.path.exists(AMSS_NCKU_source_path):
os.makedirs(AMSS_NCKU_source_path)
print( " The AMSS-NCKU source files are incomplete; copy all source files into ./AMSS_NCKU_source. " )
print( " Press Enter to continue. " )
inputvalue = input()
# Copy AMSS-NCKU source files to prepare for compilation
# If skipping TwoPuncture and source_copy already exists, remove it first
if skip_twopuncture and os.path.exists(AMSS_NCKU_source_copy):
shutil.rmtree(AMSS_NCKU_source_copy)
shutil.copytree(AMSS_NCKU_source_path, AMSS_NCKU_source_copy)
# Copy the generated macro files into the AMSS_NCKU source folder
if not skip_twopuncture:
macrodef_h_path = os.path.join(File_directory, "macrodef.h")
macrodef_fh_path = os.path.join(File_directory, "macrodef.fh")
else:
# When skipping TwoPuncture, use existing macro files from previous run
macrodef_h_path = os.path.join(File_directory, "macrodef.h")
macrodef_fh_path = os.path.join(File_directory, "macrodef.fh")
shutil.copy2(macrodef_h_path, AMSS_NCKU_source_copy)
shutil.copy2(macrodef_fh_path, AMSS_NCKU_source_copy)
# Compile related programs
import makefile_and_run
## Change working directory to the target source copy
os.chdir(AMSS_NCKU_source_copy)
## Build the main AMSS-NCKU executable (ABE or ABEGPU)
makefile_and_run.makefile_ABE()
## If the initial-data method is Ansorg-TwoPuncture, build the TwoPunctureABE executable
## Only build TwoPunctureABE if not skipping TwoPuncture phase
if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ) and not skip_twopuncture:
makefile_and_run.makefile_TwoPunctureABE()
## Change current working directory back up two levels
os.chdir('..')
os.chdir('..')
print()
##################################################################
## Copy the AMSS-NCKU executable (ABE/ABEGPU) to the run directory
if (input_data.GPU_Calculation == "no"):
ABE_file = os.path.join(AMSS_NCKU_source_copy, "ABE")
elif (input_data.GPU_Calculation == "yes"):
ABE_file = os.path.join(AMSS_NCKU_source_copy, "ABEGPU")
if not os.path.exists( ABE_file ):
print()
print( " Lack of AMSS-NCKU executable file ABE/ABEGPU; recompile AMSS_NCKU_source manually. " )
print( " When recompilation is finished, press Enter to continue. " )
inputvalue = input()
## Copy the executable ABE (or ABEGPU) into the run directory
shutil.copy2(ABE_file, output_directory)
## If the initial-data method is TwoPuncture, copy the TwoPunctureABE executable to the run directory
## Only copy TwoPunctureABE if not skipping TwoPuncture phase
if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ) and not skip_twopuncture:
TwoPuncture_file = os.path.join(AMSS_NCKU_source_copy, "TwoPunctureABE")
if not os.path.exists( TwoPuncture_file ):
print()
print( " Lack of AMSS-NCKU executable file TwoPunctureABE; recompile TwoPunctureABE in AMSS_NCKU_source. " )
print( " When recompilation is finished, press Enter to continue. " )
inputvalue = input()
## Copy the TwoPunctureABE executable into the run directory
shutil.copy2(TwoPuncture_file, output_directory)
##################################################################
## If the initial-data method is TwoPuncture, generate the TwoPuncture input files
if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ) and not skip_twopuncture:
print()
print( " Initial data is chosen as Ansorg-TwoPuncture" )
print()
print()
print( " Automatically generating the input parfile for the TwoPunctureABE executable " )
print()
import generate_TwoPuncture_input
generate_TwoPuncture_input.generate_AMSSNCKU_TwoPuncture_input()
print()
print( " The input parfile for the TwoPunctureABE executable has been generated. " )
print()
## Generated AMSS-NCKU TwoPuncture input filename
AMSS_NCKU_TwoPuncture_inputfile = 'AMSS-NCKU-TwoPuncture.input'
AMSS_NCKU_TwoPuncture_inputfile_path = os.path.join( File_directory, AMSS_NCKU_TwoPuncture_inputfile )
## Copy and rename the file
shutil.copy2( AMSS_NCKU_TwoPuncture_inputfile_path, os.path.join(output_directory, 'TwoPunctureinput.par') )
## Run TwoPuncture to generate initial-data files
start_time = time.time() # Record start time
print()
print()
## Change to the output (run) directory
os.chdir(output_directory)
## Run the TwoPuncture executable
import makefile_and_run
makefile_and_run.run_TwoPunctureABE()
## Change current working directory back up two levels
os.chdir('..')
os.chdir('..')
elif (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ) and skip_twopuncture:
print()
print( " Skipping TwoPuncture execution, using existing initial data." )
print()
start_time = time.time() # Record start time for ABE only
else:
start_time = time.time() # Record start time
##################################################################
## Update puncture data based on TwoPuncture run results
if not skip_twopuncture:
import renew_puncture_parameter
renew_puncture_parameter.append_AMSSNCKU_BSSN_input(File_directory, output_directory)
## Generated AMSS-NCKU input filename
AMSS_NCKU_inputfile = 'AMSS-NCKU.input'
AMSS_NCKU_inputfile_path = os.path.join(File_directory, AMSS_NCKU_inputfile)
## Copy and rename the file
shutil.copy2( AMSS_NCKU_inputfile_path, os.path.join(output_directory, 'input.par') )
print()
print( " Successfully copy all AMSS-NCKU input parfile to target dictionary. " )
print()
else:
print()
print( " Using existing input.par file from previous run." )
print()
##################################################################
## Launch the AMSS-NCKU program
print()
print()
## Change to the run directory
os.chdir( output_directory )
import makefile_and_run
makefile_and_run.run_ABE()
## Change current working directory back up two levels
os.chdir('..')
os.chdir('..')
end_time = time.time()
elapsed_time = end_time - start_time
##################################################################
## Copy some basic input and log files out to facilitate debugging
## Path to the file that stores calculation settings
AMSS_NCKU_error_file_path = os.path.join(binary_results_directory, "setting.par")
## Copy and rename the file for easier inspection
shutil.copy( AMSS_NCKU_error_file_path, os.path.join(output_directory, "AMSSNCKU_setting_parameter") )
## Path to the error log file
AMSS_NCKU_error_file_path = os.path.join(binary_results_directory, "Error.log")
## Copy and rename the error log
shutil.copy( AMSS_NCKU_error_file_path, os.path.join(output_directory, "Error.log") )
## Primary program outputs
AMSS_NCKU_BH_data = os.path.join(binary_results_directory, "bssn_BH.dat" )
AMSS_NCKU_ADM_data = os.path.join(binary_results_directory, "bssn_ADMQs.dat" )
AMSS_NCKU_psi4_data = os.path.join(binary_results_directory, "bssn_psi4.dat" )
AMSS_NCKU_constraint_data = os.path.join(binary_results_directory, "bssn_constraint.dat")
## copy and rename the file
shutil.copy( AMSS_NCKU_BH_data, os.path.join(output_directory, "bssn_BH.dat" ) )
shutil.copy( AMSS_NCKU_ADM_data, os.path.join(output_directory, "bssn_ADMQs.dat" ) )
shutil.copy( AMSS_NCKU_psi4_data, os.path.join(output_directory, "bssn_psi4.dat" ) )
shutil.copy( AMSS_NCKU_constraint_data, os.path.join(output_directory, "bssn_constraint.dat") )
## Additional program outputs
if (input_data.Equation_Class == "BSSN-EM"):
AMSS_NCKU_phi1_data = os.path.join(binary_results_directory, "bssn_phi1.dat" )
AMSS_NCKU_phi2_data = os.path.join(binary_results_directory, "bssn_phi2.dat" )
shutil.copy( AMSS_NCKU_phi1_data, os.path.join(output_directory, "bssn_phi1.dat" ) )
shutil.copy( AMSS_NCKU_phi2_data, os.path.join(output_directory, "bssn_phi2.dat" ) )
elif (input_data.Equation_Class == "BSSN-EScalar"):
AMSS_NCKU_maxs_data = os.path.join(binary_results_directory, "bssn_maxs.dat" )
shutil.copy( AMSS_NCKU_maxs_data, os.path.join(output_directory, "bssn_maxs.dat" ) )
##################################################################
## Plot the AMSS-NCKU program results
print()
print( " Plotting the txt and binary results data from the AMSS-NCKU simulation " )
print()
import plot_xiaoqu
import plot_GW_strain_amplitude_xiaoqu
## Plot black hole trajectory
plot_xiaoqu.generate_puncture_orbit_plot( binary_results_directory, figure_directory )
plot_xiaoqu.generate_puncture_orbit_plot3D( binary_results_directory, figure_directory )
## Plot black hole separation vs. time
plot_xiaoqu.generate_puncture_distence_plot( binary_results_directory, figure_directory )
## Plot gravitational waveforms (psi4 and strain amplitude)
for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_gravitational_wave_psi4_plot( binary_results_directory, figure_directory, i )
plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot( binary_results_directory, figure_directory, i )
## Plot ADM mass evolution
for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_ADMmass_plot( binary_results_directory, figure_directory, i )
## Plot Hamiltonian constraint violation over time
for i in range(input_data.grid_level):
plot_xiaoqu.generate_constraint_check_plot( binary_results_directory, figure_directory, i )
## Plot stored binary data
plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory )
print()
print( f" This Program Cost = {elapsed_time} Seconds " )
print()
##################################################################
print()
print( " The AMSS-NCKU-Python simulation is successfully finished, thanks for using !!! " )
print()
##################################################################

View File

@@ -8,6 +8,14 @@
## ##
################################################################## ##################################################################
## Guard against re-execution by multiprocessing child processes.
## Without this, using 'spawn' or 'forkserver' context would cause every
## worker to re-run the entire script, spawning exponentially more
## workers (fork bomb).
if __name__ != '__main__':
import sys as _sys
_sys.exit(0)
################################################################## ##################################################################
@@ -58,7 +66,8 @@ if os.path.exists(File_directory):
## Prompt whether to overwrite the existing directory ## Prompt whether to overwrite the existing directory
while True: while True:
try: try:
inputvalue = input() ## inputvalue = input()
inputvalue = "continue"
## If the user agrees to overwrite, proceed and remove the existing directory ## If the user agrees to overwrite, proceed and remove the existing directory
if ( inputvalue == "continue" ): if ( inputvalue == "continue" ):
print( " Continue the calculation !!! " ) print( " Continue the calculation !!! " )
@@ -424,26 +433,31 @@ print(
import plot_xiaoqu import plot_xiaoqu
import plot_GW_strain_amplitude_xiaoqu import plot_GW_strain_amplitude_xiaoqu
from parallel_plot_helper import run_plot_tasks_parallel
plot_tasks = []
## Plot black hole trajectory ## Plot black hole trajectory
plot_xiaoqu.generate_puncture_orbit_plot( binary_results_directory, figure_directory ) plot_tasks.append( ( plot_xiaoqu.generate_puncture_orbit_plot, (binary_results_directory, figure_directory) ) )
plot_xiaoqu.generate_puncture_orbit_plot3D( binary_results_directory, figure_directory ) plot_tasks.append( ( plot_xiaoqu.generate_puncture_orbit_plot3D, (binary_results_directory, figure_directory) ) )
## Plot black hole separation vs. time ## Plot black hole separation vs. time
plot_xiaoqu.generate_puncture_distence_plot( binary_results_directory, figure_directory ) plot_tasks.append( ( plot_xiaoqu.generate_puncture_distence_plot, (binary_results_directory, figure_directory) ) )
## Plot gravitational waveforms (psi4 and strain amplitude) ## Plot gravitational waveforms (psi4 and strain amplitude)
for i in range(input_data.Detector_Number): for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_gravitational_wave_psi4_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_xiaoqu.generate_gravitational_wave_psi4_plot, (binary_results_directory, figure_directory, i) ) )
plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot, (binary_results_directory, figure_directory, i) ) )
## Plot ADM mass evolution ## Plot ADM mass evolution
for i in range(input_data.Detector_Number): for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_ADMmass_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_xiaoqu.generate_ADMmass_plot, (binary_results_directory, figure_directory, i) ) )
## Plot Hamiltonian constraint violation over time ## Plot Hamiltonian constraint violation over time
for i in range(input_data.grid_level): for i in range(input_data.grid_level):
plot_xiaoqu.generate_constraint_check_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_xiaoqu.generate_constraint_check_plot, (binary_results_directory, figure_directory, i) ) )
run_plot_tasks_parallel(plot_tasks)
## Plot stored binary data ## Plot stored binary data
plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory ) plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory )

File diff suppressed because it is too large Load Diff

View File

@@ -39,6 +39,10 @@ public:
bool Find_Point(double *XX); bool Find_Point(double *XX);
void Interp_Points(MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry,
int Nmin_consumer, int Nmax_consumer);
void Interp_Points(MyList<var> *VarList, void Interp_Points(MyList<var> *VarList,
int NN, double **XX, int NN, double **XX,
double *Shellf, int Symmetry, MPI_Comm Comm_here); double *Shellf, int Symmetry, MPI_Comm Comm_here);

View File

@@ -24,6 +24,7 @@ using namespace std;
#endif #endif
#include <mpi.h> #include <mpi.h>
#include <memory.h>
#include "MyList.h" #include "MyList.h"
#include "Block.h" #include "Block.h"
#include "Parallel.h" #include "Parallel.h"

File diff suppressed because it is too large Load Diff

View File

@@ -1,167 +1,235 @@
#ifndef PARALLEL_H #ifndef PARALLEL_H
#define PARALLEL_H #define PARALLEL_H
#include <iostream> #include <iostream>
#include <iomanip> #include <iomanip>
#include <fstream> #include <fstream>
#include <cstdlib> #include <cstdlib>
#include <cstdio> #include <cstdio>
#include <string> #include <string>
#include <cmath> #include <cmath>
#include <new> #include <new>
using namespace std; using namespace std;
#include <memory.h>
#include "Parallel_bam.h" #include "Parallel_bam.h"
#include "var.h" #include "var.h"
#include "MPatch.h" #include "MPatch.h"
#include "Block.h" #include "Block.h"
#include "MyList.h" #include "MyList.h"
#include "macrodef.h" //need dim; ghost_width; CONTRACT #include "macrodef.h" //need dim; ghost_width; CONTRACT
namespace Parallel namespace Parallel
{ {
struct gridseg struct gridseg
{ {
double llb[dim]; double llb[dim];
double uub[dim]; double uub[dim];
int shape[dim]; int shape[dim];
double illb[dim], iuub[dim]; // only use for OutBdLow2Hi double illb[dim], iuub[dim]; // only use for OutBdLow2Hi
Block *Bg; Block *Bg;
}; };
int partition1(int &nx, int split_size, int min_width, int cpusize, int shape); // special for 1 diemnsion int partition1(int &nx, int split_size, int min_width, int cpusize, int shape); // special for 1 diemnsion
int partition2(int *nxy, int split_size, int *min_width, int cpusize, int *shape); // special for 2 diemnsions int partition2(int *nxy, int split_size, int *min_width, int cpusize, int *shape); // special for 2 diemnsions
int partition3(int *nxyz, int split_size, int *min_width, int cpusize, int *shape); int partition3(int *nxyz, int split_size, int *min_width, int cpusize, int *shape);
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfs, bool periodic, int nodes = 0); // produce corresponding Blocks MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfs, bool periodic, int nodes = 0); // produce corresponding Blocks
void KillBlocks(MyList<Patch> *PatchLIST); MyList<Block> *distribute_hard(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfs, bool periodic, int nodes = 0); // produce corresponding Blocks
Block* splitHotspotBlock(MyList<Block>* &BlL, int _dim,
void setfunction(MyList<Block> *BlL, var *vn, double func(double x, double y, double z)); int ib0_orig, int ib3_orig,
void setfunction(int rank, MyList<Block> *BlL, var *vn, double func(double x, double y, double z)); int jb1_orig, int jb4_orig,
void writefile(double time, int nx, int ny, int nz, double xmin, double xmax, double ymin, double ymax, int kb2_orig, int kb5_orig,
double zmin, double zmax, char *filename, double *data_out); Patch* PP, int r_left, int r_right,
void writefile(double time, int nx, int ny, double xmin, double xmax, double ymin, double ymax, int ingfsi, int fngfsi, bool periodic,
char *filename, double *datain); Block* &split_first_block, Block* &split_last_block);
void getarrayindex(int DIM, int *shape, int *index, int n); Block* createMappedBlock(MyList<Block>* &BlL, int _dim, int* shape, double* bbox,
int getarraylocation(int DIM, int *shape, int *index); int block_id, int ingfsi, int fngfsi, int lev);
void copy(int DIM, double *llbout, double *uubout, int *Dshape, double *DD, double *llbin, double *uubin, void KillBlocks(MyList<Patch> *PatchLIST);
int *shape, double *datain, double *llb, double *uub);
void Dump_CPU_Data(MyList<Block> *BlL, MyList<var> *DumpList, char *tag, double time, double dT); void setfunction(MyList<Block> *BlL, var *vn, double func(double x, double y, double z));
void Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT); void setfunction(int rank, MyList<Block> *BlL, var *vn, double func(double x, double y, double z));
void Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd); void writefile(double time, int nx, int ny, int nz, double xmin, double xmax, double ymin, double ymax,
double *Collect_Data(Patch *PP, var *VP); double zmin, double zmax, char *filename, double *data_out);
void d2Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT); void writefile(double time, int nx, int ny, double xmin, double xmax, double ymin, double ymax,
void d2Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd); char *filename, double *datain);
void Dump_Data0(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT); void getarrayindex(int DIM, int *shape, int *index, int n);
double global_interp(int DIM, int *ext, double **CoX, double *datain, int getarraylocation(int DIM, int *shape, int *index);
double *poX, int ordn, double *SoA, int Symmetry); void copy(int DIM, double *llbout, double *uubout, int *Dshape, double *DD, double *llbin, double *uubin,
double global_interp(int DIM, int *ext, double **CoX, double *datain, int *shape, double *datain, double *llb, double *uub);
double *poX, int ordn); void Dump_CPU_Data(MyList<Block> *BlL, MyList<var> *DumpList, char *tag, double time, double dT);
double Lagrangian_Int(double x, int npts, double *xpts, double *funcvals); void Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT);
double LagrangePoly(double x, int pt, int npts, double *xpts); void Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd);
MyList<gridseg> *build_complete_gsl(Patch *Pat); double *Collect_Data(Patch *PP, var *VP);
MyList<gridseg> *build_complete_gsl(MyList<Patch> *PatL); void d2Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT);
MyList<gridseg> *build_complete_gsl_virtual(MyList<Patch> *PatL); void d2Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd);
MyList<gridseg> *build_complete_gsl_virtual2(MyList<Patch> *PatL); // - buffer void Dump_Data0(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT);
MyList<gridseg> *build_owned_gsl0(Patch *Pat, int rank_in); // - ghost without extension, special for Sync usage double global_interp(int DIM, int *ext, double **CoX, double *datain,
MyList<gridseg> *build_owned_gsl1(Patch *Pat, int rank_in); // - ghost, similar to build_owned_gsl0 but extend one point on left side for vertex grid double *poX, int ordn, double *SoA, int Symmetry);
MyList<gridseg> *build_owned_gsl2(Patch *Pat, int rank_in); // - buffer - ghost double global_interp(int DIM, int *ext, double **CoX, double *datain,
MyList<gridseg> *build_owned_gsl3(Patch *Pat, int rank_in, int Symmetry); // - ghost - BD ghost double *poX, int ordn);
MyList<gridseg> *build_owned_gsl4(Patch *Pat, int rank_in, int Symmetry); // - buffer - ghost - BD ghost double Lagrangian_Int(double x, int npts, double *xpts, double *funcvals);
MyList<gridseg> *build_owned_gsl5(Patch *Pat, int rank_in); // similar to build_owned_gsl2 but no extension double LagrangePoly(double x, int pt, int npts, double *xpts);
MyList<gridseg> *build_owned_gsl(MyList<Patch> *PatL, int rank_in, int type, int Symmetry); MyList<gridseg> *build_complete_gsl(Patch *Pat);
void build_gstl(MyList<gridseg> *srci, MyList<gridseg> *dsti, MyList<gridseg> **out_src, MyList<gridseg> **out_dst); MyList<gridseg> *build_complete_gsl(MyList<Patch> *PatL);
int data_packer(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir, MyList<gridseg> *build_complete_gsl_virtual(MyList<Patch> *PatL);
MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry); MyList<gridseg> *build_complete_gsl_virtual2(MyList<Patch> *PatL); // - buffer
void transfer(MyList<gridseg> **src, MyList<gridseg> **dst, MyList<gridseg> *build_owned_gsl0(Patch *Pat, int rank_in); // - ghost without extension, special for Sync usage
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */, MyList<gridseg> *build_owned_gsl1(Patch *Pat, int rank_in); // - ghost, similar to build_owned_gsl0 but extend one point on left side for vertex grid
int Symmetry); MyList<gridseg> *build_owned_gsl2(Patch *Pat, int rank_in); // - buffer - ghost
int data_packermix(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir, MyList<gridseg> *build_owned_gsl3(Patch *Pat, int rank_in, int Symmetry); // - ghost - BD ghost
MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry); MyList<gridseg> *build_owned_gsl4(Patch *Pat, int rank_in, int Symmetry); // - buffer - ghost - BD ghost
void transfermix(MyList<gridseg> **src, MyList<gridseg> **dst, MyList<gridseg> *build_owned_gsl5(Patch *Pat, int rank_in); // similar to build_owned_gsl2 but no extension
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */, MyList<gridseg> *build_owned_gsl(MyList<Patch> *PatL, int rank_in, int type, int Symmetry);
int Symmetry); void build_gstl(MyList<gridseg> *srci, MyList<gridseg> *dsti, MyList<gridseg> **out_src, MyList<gridseg> **out_dst);
void Sync(Patch *Pat, MyList<var> *VarList, int Symmetry); int data_packer(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir,
void Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry); MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry);
void OutBdLow2Hi(Patch *Patc, Patch *Patf, void transfer(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */,
int Symmetry); int Symmetry);
void OutBdLow2Hi(MyList<Patch> *PatcL, MyList<Patch> *PatfL, int data_packermix(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry);
int Symmetry); void transfermix(MyList<gridseg> **src, MyList<gridseg> **dst,
void OutBdLow2Himix(Patch *Patc, Patch *Patf, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, int Symmetry);
int Symmetry); void Sync(Patch *Pat, MyList<var> *VarList, int Symmetry);
void OutBdLow2Himix(MyList<Patch> *PatcL, MyList<Patch> *PatfL, void Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, void Sync_merged(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
int Symmetry);
void Prolong(Patch *Patc, Patch *Patf, struct SyncCache {
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, bool valid;
int Symmetry); int cpusize;
void Prolongint(Patch *Patc, Patch *Patf, MyList<gridseg> **combined_src;
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, MyList<gridseg> **combined_dst;
int Symmetry); int *send_lengths;
void Restrict(MyList<Patch> *PatcL, MyList<Patch> *PatfL, int *recv_lengths;
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, double **send_bufs;
int Symmetry); double **recv_bufs;
void Restrict_after(MyList<Patch> *PatcL, MyList<Patch> *PatfL, int *send_buf_caps;
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, int *recv_buf_caps;
int Symmetry); // for -ghost - BDghost MPI_Request *reqs;
MyList<Parallel::gridseg> *build_PhysBD_gsl(Patch *Pat); MPI_Status *stats;
MyList<Parallel::gridseg> *build_ghost_gsl(MyList<Patch> *PatL); int max_reqs;
MyList<Parallel::gridseg> *build_ghost_gsl(Patch *Pat); bool lengths_valid;
MyList<Parallel::gridseg> *build_buffer_gsl(Patch *Pat); SyncCache();
MyList<Parallel::gridseg> *build_buffer_gsl(MyList<Patch> *PatL); void invalidate();
MyList<Parallel::gridseg> *gsl_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B); void destroy();
MyList<Parallel::gridseg> *gs_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B); };
MyList<Parallel::gridseg> *gsl_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B); void Sync_cached(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry, SyncCache &cache);
MyList<Parallel::gridseg> *clone_gsl(MyList<Parallel::gridseg> *p, bool first_only); void transfer_cached(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<Parallel::gridseg> *build_bulk_gsl(Patch *Pat); // similar to build_owned_gsl0 but does not care rank issue MyList<var> *VarList1, MyList<var> *VarList2,
MyList<Parallel::gridseg> *build_bulk_gsl(Block *bp, Patch *Pat); int Symmetry, SyncCache &cache);
void build_PhysBD_gstl(Patch *Pat, MyList<Parallel::gridseg> *srci, MyList<Parallel::gridseg> *dsti,
MyList<Parallel::gridseg> **out_src, MyList<Parallel::gridseg> **out_dst); struct AsyncSyncState {
void PeriodicBD(Patch *Pat, MyList<var> *VarList, int Symmetry); int req_no;
double L2Norm(Patch *Pat, var *vf); bool active;
void checkgsl(MyList<Parallel::gridseg> *pp, bool first_only); AsyncSyncState() : req_no(0), active(false) {}
void checkvarl(MyList<var> *pp, bool first_only); };
MyList<Parallel::gridseg> *divide_gsl(MyList<Parallel::gridseg> *p, Patch *Pat);
MyList<Parallel::gridseg> *divide_gs(MyList<Parallel::gridseg> *p, Patch *Pat); void Sync_start(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry,
void prepare_inter_time_level(Patch *Pat, SyncCache &cache, AsyncSyncState &state);
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */, void Sync_finish(SyncCache &cache, AsyncSyncState &state,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex); MyList<var> *VarList, int Symmetry);
void prepare_inter_time_level(Patch *Pat, void OutBdLow2Hi(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex); int Symmetry);
void prepare_inter_time_level(MyList<Patch> *PatL, void OutBdLow2Hi(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex); int Symmetry);
void prepare_inter_time_level(MyList<Patch> *Pat, void OutBdLow2Himix(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex); int Symmetry);
void merge_gsl(MyList<gridseg> *&A, const double ratio); void OutBdLow2Himix(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
bool merge_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C, const double ratio); MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
// Add ghost region to tangent plane int Symmetry);
// we assume the grids have the same resolution void Restrict_cached(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
void add_ghost_touch(MyList<gridseg> *&A); MyList<var> *VarList1, MyList<var> *VarList2,
void cut_gsl(MyList<gridseg> *&A); int Symmetry, SyncCache &cache);
bool cut_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C); void OutBdLow2Hi_cached(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<Parallel::gridseg> *gs_subtract_virtual(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B); MyList<var> *VarList1, MyList<var> *VarList2,
void fill_level_data(MyList<Patch> *PatLd, MyList<Patch> *PatLs, MyList<Patch> *PatcL, int Symmetry, SyncCache &cache);
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *FutureList, void OutBdLow2Himix_cached(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *tmList, int Symmetry, bool BB, bool CC); MyList<var> *VarList1, MyList<var> *VarList2,
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry, SyncCache &cache);
int NN, double **XX, void Prolong(Patch *Patc, Patch *Patf,
double *Shellf, int Symmetry); MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
void aligncheck(double *bbox0, double *bboxl, int lev, double *DH0, int *shape); int Symmetry);
bool point_locat_gsl(double *pox, MyList<Parallel::gridseg> *gsl); void Prolongint(Patch *Patc, Patch *Patf,
void checkpatchlist(MyList<Patch> *PatL, bool buflog); MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
double L2Norm(Patch *Pat, var *vf, MPI_Comm Comm_here); void Restrict(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int NN, double **XX, int Symmetry);
double *Shellf, int Symmetry, MPI_Comm Comm_here); void Restrict_after(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
#if (PSTR == 1 || PSTR == 2 || PSTR == 3) MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfsi, int Symmetry); // for -ghost - BDghost
bool periodic, int start_rank, int end_rank, int nodes = 0); MyList<Parallel::gridseg> *build_PhysBD_gsl(Patch *Pat);
#endif MyList<Parallel::gridseg> *build_ghost_gsl(MyList<Patch> *PatL);
} MyList<Parallel::gridseg> *build_ghost_gsl(Patch *Pat);
#endif /*PARALLEL_H */ MyList<Parallel::gridseg> *build_buffer_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_buffer_gsl(MyList<Patch> *PatL);
MyList<Parallel::gridseg> *gsl_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gsl_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *clone_gsl(MyList<Parallel::gridseg> *p, bool first_only);
MyList<Parallel::gridseg> *build_bulk_gsl(Patch *Pat); // similar to build_owned_gsl0 but does not care rank issue
MyList<Parallel::gridseg> *build_bulk_gsl(Block *bp, Patch *Pat);
void build_PhysBD_gstl(Patch *Pat, MyList<Parallel::gridseg> *srci, MyList<Parallel::gridseg> *dsti,
MyList<Parallel::gridseg> **out_src, MyList<Parallel::gridseg> **out_dst);
void PeriodicBD(Patch *Pat, MyList<var> *VarList, int Symmetry);
double L2Norm(Patch *Pat, var *vf);
void checkgsl(MyList<Parallel::gridseg> *pp, bool first_only);
void checkvarl(MyList<var> *pp, bool first_only);
MyList<Parallel::gridseg> *divide_gsl(MyList<Parallel::gridseg> *p, Patch *Pat);
MyList<Parallel::gridseg> *divide_gs(MyList<Parallel::gridseg> *p, Patch *Pat);
void prepare_inter_time_level(Patch *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(Patch *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(MyList<Patch> *PatL,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(MyList<Patch> *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex);
void merge_gsl(MyList<gridseg> *&A, const double ratio);
bool merge_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C, const double ratio);
// Add ghost region to tangent plane
// we assume the grids have the same resolution
void add_ghost_touch(MyList<gridseg> *&A);
void cut_gsl(MyList<gridseg> *&A);
bool cut_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C);
MyList<Parallel::gridseg> *gs_subtract_virtual(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
void fill_level_data(MyList<Patch> *PatLd, MyList<Patch> *PatLs, MyList<Patch> *PatcL,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *FutureList,
MyList<var> *tmList, int Symmetry, bool BB, bool CC);
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry);
void aligncheck(double *bbox0, double *bboxl, int lev, double *DH0, int *shape);
bool point_locat_gsl(double *pox, MyList<Parallel::gridseg> *gsl);
void checkpatchlist(MyList<Patch> *PatL, bool buflog);
double L2Norm(Patch *Pat, var *vf, MPI_Comm Comm_here);
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry, MPI_Comm Comm_here);
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfsi,
bool periodic, int start_rank, int end_rank, int nodes = 0);
// Redistribute blocks with time statistics for load balancing
MyList<Block> *distribute(MyList<Patch> *PatchLIST, MyList<Block> *OldBlockL,
int cpusize, int ingfsi, int fngfsi,
bool periodic, int start_rank, int end_rank, int nodes = 0);
#endif
// Dynamic load balancing: split blocks for heavy ranks
void split_heavy_blocks(MyList<Patch> *PatL, int *heavy_ranks, int num_heavy,
int split_factor, int cpusize, int ingfsi, int fngfsi);
// Check if load balancing is needed based on interpolation times
bool check_load_balance_need(double *rank_times, int nprocs, int &num_heavy, int *heavy_ranks);
}
#endif /*PARALLEL_H */

View File

@@ -730,6 +730,12 @@ void bssn_class::Initialize()
PhysTime = StartTime; PhysTime = StartTime;
Setup_Black_Hole_position(); Setup_Black_Hole_position();
} }
// Initialize sync caches (per-level, for predictor and corrector)
sync_cache_pre = new Parallel::SyncCache[GH->levels];
sync_cache_cor = new Parallel::SyncCache[GH->levels];
sync_cache_rp_coarse = new Parallel::SyncCache[GH->levels];
sync_cache_rp_fine = new Parallel::SyncCache[GH->levels];
} }
//================================================================================================ //================================================================================================
@@ -981,6 +987,32 @@ bssn_class::~bssn_class()
delete Azzz; delete Azzz;
#endif #endif
// Destroy sync caches before GH
if (sync_cache_pre)
{
for (int i = 0; i < GH->levels; i++)
sync_cache_pre[i].destroy();
delete[] sync_cache_pre;
}
if (sync_cache_cor)
{
for (int i = 0; i < GH->levels; i++)
sync_cache_cor[i].destroy();
delete[] sync_cache_cor;
}
if (sync_cache_rp_coarse)
{
for (int i = 0; i < GH->levels; i++)
sync_cache_rp_coarse[i].destroy();
delete[] sync_cache_rp_coarse;
}
if (sync_cache_rp_fine)
{
for (int i = 0; i < GH->levels; i++)
sync_cache_rp_fine[i].destroy();
delete[] sync_cache_rp_fine;
}
delete GH; delete GH;
#ifdef WithShell #ifdef WithShell
delete SH; delete SH;
@@ -2181,6 +2213,7 @@ void bssn_class::Evolve(int Steps)
GH->Regrid(Symmetry, BH_num, Porgbr, Porg0, GH->Regrid(Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_mon, StartTime, dT_mon / 2), ErrorMonitor); fgt(PhysTime - dT_mon, StartTime, dT_mon / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
#endif #endif
#if (REGLEV == 0 && (PSTR == 1 || PSTR == 2)) #if (REGLEV == 0 && (PSTR == 1 || PSTR == 2))
@@ -2396,6 +2429,7 @@ void bssn_class::RecursiveStep(int lev)
GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0, GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor); fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
#endif #endif
} }
@@ -2574,6 +2608,7 @@ void bssn_class::ParallelStep()
GH->Regrid_Onelevel(GH->mylev, Symmetry, BH_num, Porgbr, Porg0, GH->Regrid_Onelevel(GH->mylev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor); fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
#endif #endif
} }
@@ -2740,6 +2775,7 @@ void bssn_class::ParallelStep()
GH->Regrid_Onelevel(lev + 1, Symmetry, BH_num, Porgbr, Porg0, GH->Regrid_Onelevel(lev + 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_levp1, StartTime, dT_levp1 / 2), ErrorMonitor); fgt(PhysTime - dT_levp1, StartTime, dT_levp1 / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2754,6 +2790,7 @@ void bssn_class::ParallelStep()
GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0, GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor); fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2772,6 +2809,7 @@ void bssn_class::ParallelStep()
GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0, GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor); fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2787,6 +2825,7 @@ void bssn_class::ParallelStep()
GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0, GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor); fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -3158,21 +3197,7 @@ void bssn_class::Step(int lev, int YN)
} }
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // NOTE: error check deferred to after Shell Patch computation to reduce MPI_Allreduce calls
{
int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
}
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables at t = " << PhysTime << ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#ifdef WithShell #ifdef WithShell
// evolve Shell Patches // evolve Shell Patches
@@ -3190,9 +3215,9 @@ void bssn_class::Step(int lev, int YN)
{ {
#if (AGM == 0) #if (AGM == 0)
f_enforce_ga(cg->shape, f_enforce_ga(cg->shape,
cg->fgfs[gxx0->sgfn], cg->fgfs[gxy0->sgfn], cg->fgfs[gxz0->sgfn], cg->fgfs[gxx0->sgfn], cg->fgfs[gxy0->sgfn], cg->fgfs[gxz0->sgfn],
cg->fgfs[gyy0->sgfn], cg->fgfs[gyz0->sgfn], cg->fgfs[gzz0->sgfn], cg->fgfs[gyy0->sgfn], cg->fgfs[gyz0->sgfn], cg->fgfs[gzz0->sgfn],
cg->fgfs[Axx0->sgfn], cg->fgfs[Axy0->sgfn], cg->fgfs[Axz0->sgfn], cg->fgfs[Axx0->sgfn], cg->fgfs[Axy0->sgfn], cg->fgfs[Axz0->sgfn],
cg->fgfs[Ayy0->sgfn], cg->fgfs[Ayz0->sgfn], cg->fgfs[Azz0->sgfn]); cg->fgfs[Ayy0->sgfn], cg->fgfs[Ayz0->sgfn], cg->fgfs[Azz0->sgfn]);
#endif #endif
@@ -3316,25 +3341,16 @@ void bssn_class::Step(int lev, int YN)
#endif #endif
} }
// check error information // Non-blocking error reduction overlapped with Sync to hide Allreduce latency
MPI_Request err_req;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &err_req);
}
if (ERROR)
{
SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables on Shell Patches at t = " << PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry); Parallel::AsyncSyncState async_pre;
Parallel::Sync_start(GH->PatL[lev], SynchList_pre, Symmetry, sync_cache_pre[lev], async_pre);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -3347,12 +3363,29 @@ void bssn_class::Step(int lev, int YN)
{ {
prev_clock = curr_clock; prev_clock = curr_clock;
curr_clock = clock(); curr_clock = clock();
cout << " Shell stuff synchronization used " cout << " Shell stuff synchronization used "
<< (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC) << (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC)
<< " seconds! " << endl; << " seconds! " << endl;
} }
} }
#endif #endif
Parallel::Sync_finish(sync_cache_pre[lev], async_pre, SynchList_pre, Symmetry);
#ifdef WithShell
// Complete non-blocking error reduction and check
MPI_Wait(&err_req, MPI_STATUS_IGNORE);
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables at t = " << PhysTime << ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#endif
#if (MAPBH == 0) #if (MAPBH == 0)
// for black hole position // for black hole position
@@ -3528,24 +3561,7 @@ void bssn_class::Step(int lev, int YN)
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // NOTE: error check deferred to after Shell Patch computation to reduce MPI_Allreduce calls
{
int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
}
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime
<< ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#ifdef WithShell #ifdef WithShell
// evolve Shell Patches // evolve Shell Patches
@@ -3563,9 +3579,9 @@ void bssn_class::Step(int lev, int YN)
{ {
#if (AGM == 0) #if (AGM == 0)
f_enforce_ga(cg->shape, f_enforce_ga(cg->shape,
cg->fgfs[gxx->sgfn], cg->fgfs[gxy->sgfn], cg->fgfs[gxz->sgfn], cg->fgfs[gxx->sgfn], cg->fgfs[gxy->sgfn], cg->fgfs[gxz->sgfn],
cg->fgfs[gyy->sgfn], cg->fgfs[gyz->sgfn], cg->fgfs[gzz->sgfn], cg->fgfs[gyy->sgfn], cg->fgfs[gyz->sgfn], cg->fgfs[gzz->sgfn],
cg->fgfs[Axx->sgfn], cg->fgfs[Axy->sgfn], cg->fgfs[Axz->sgfn], cg->fgfs[Axx->sgfn], cg->fgfs[Axy->sgfn], cg->fgfs[Axz->sgfn],
cg->fgfs[Ayy->sgfn], cg->fgfs[Ayz->sgfn], cg->fgfs[Azz->sgfn]); cg->fgfs[Ayy->sgfn], cg->fgfs[Ayz->sgfn], cg->fgfs[Azz->sgfn]);
#elif (AGM == 1) #elif (AGM == 1)
if (iter_count == 3) if (iter_count == 3)
@@ -3685,26 +3701,16 @@ void bssn_class::Step(int lev, int YN)
sPp = sPp->next; sPp = sPp->next;
} }
} }
// check error information // Non-blocking error reduction overlapped with Sync to hide Allreduce latency
MPI_Request err_req_cor;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &err_req_cor);
}
if (ERROR)
{
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN on Shell Patches in RK4 substep#"
<< iter_count << " variables at t = "
<< PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); Parallel::AsyncSyncState async_cor;
Parallel::Sync_start(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_cor[lev], async_cor);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -3717,12 +3723,31 @@ void bssn_class::Step(int lev, int YN)
{ {
prev_clock = curr_clock; prev_clock = curr_clock;
curr_clock = clock(); curr_clock = clock();
cout << " Shell stuff synchronization used " cout << " Shell stuff synchronization used "
<< (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC) << (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC)
<< " seconds! " << endl; << " seconds! " << endl;
} }
} }
#endif #endif
Parallel::Sync_finish(sync_cache_cor[lev], async_cor, SynchList_cor, Symmetry);
#ifdef WithShell
// Complete non-blocking error reduction and check
MPI_Wait(&err_req_cor, MPI_STATUS_IGNORE);
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime
<< ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#endif
#if (MAPBH == 0) #if (MAPBH == 0)
// for black hole position // for black hole position
@@ -4034,22 +4059,7 @@ void bssn_class::Step(int lev, int YN)
} }
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // NOTE: error check deferred to after Shell Patch computation to reduce MPI_Allreduce calls
{
int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
}
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables at t = " << PhysTime
<< ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#ifdef WithShell #ifdef WithShell
// evolve Shell Patches // evolve Shell Patches
@@ -4067,15 +4077,15 @@ void bssn_class::Step(int lev, int YN)
{ {
#if (AGM == 0) #if (AGM == 0)
f_enforce_ga(cg->shape, f_enforce_ga(cg->shape,
cg->fgfs[gxx0->sgfn], cg->fgfs[gxy0->sgfn], cg->fgfs[gxz0->sgfn], cg->fgfs[gxx0->sgfn], cg->fgfs[gxy0->sgfn], cg->fgfs[gxz0->sgfn],
cg->fgfs[gyy0->sgfn], cg->fgfs[gyz0->sgfn], cg->fgfs[gzz0->sgfn], cg->fgfs[gyy0->sgfn], cg->fgfs[gyz0->sgfn], cg->fgfs[gzz0->sgfn],
cg->fgfs[Axx0->sgfn], cg->fgfs[Axy0->sgfn], cg->fgfs[Axz0->sgfn], cg->fgfs[Axx0->sgfn], cg->fgfs[Axy0->sgfn], cg->fgfs[Axz0->sgfn],
cg->fgfs[Ayy0->sgfn], cg->fgfs[Ayz0->sgfn], cg->fgfs[Azz0->sgfn]); cg->fgfs[Ayy0->sgfn], cg->fgfs[Ayz0->sgfn], cg->fgfs[Azz0->sgfn]);
#endif #endif
if (f_compute_rhs_bssn_ss(cg->shape, TRK4, cg->X[0], cg->X[1], cg->X[2], if (f_compute_rhs_bssn_ss(cg->shape, TRK4, cg->X[0], cg->X[1], cg->X[2],
cg->fgfs[fngfs + ShellPatch::gx], cg->fgfs[fngfs + ShellPatch::gx],
cg->fgfs[fngfs + ShellPatch::gy], cg->fgfs[fngfs + ShellPatch::gy],
cg->fgfs[fngfs + ShellPatch::gz], cg->fgfs[fngfs + ShellPatch::gz],
cg->fgfs[fngfs + ShellPatch::drhodx], cg->fgfs[fngfs + ShellPatch::drhodx],
cg->fgfs[fngfs + ShellPatch::drhody], cg->fgfs[fngfs + ShellPatch::drhody],
@@ -4190,25 +4200,16 @@ void bssn_class::Step(int lev, int YN)
} }
#endif #endif
} }
// check error information // Non-blocking error reduction overlapped with Sync to hide Allreduce latency
MPI_Request err_req;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &err_req);
}
if (ERROR)
{
SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables on Shell Patches at t = "
<< PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry); Parallel::AsyncSyncState async_pre;
Parallel::Sync_start(GH->PatL[lev], SynchList_pre, Symmetry, sync_cache_pre[lev], async_pre);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -4221,9 +4222,27 @@ void bssn_class::Step(int lev, int YN)
{ {
prev_clock = curr_clock; prev_clock = curr_clock;
curr_clock = clock(); curr_clock = clock();
cout << " Shell stuff synchronization used " cout << " Shell stuff synchronization used "
<< (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC) << (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC)
<< " seconds! " << endl; << " seconds! " << endl;
}
}
#endif
Parallel::Sync_finish(sync_cache_pre[lev], async_pre, SynchList_pre, Symmetry);
#ifdef WithShell
// Complete non-blocking error reduction and check
MPI_Wait(&err_req, MPI_STATUS_IGNORE);
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables at t = " << PhysTime
<< ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
} }
} }
#endif #endif
@@ -4386,23 +4405,7 @@ void bssn_class::Step(int lev, int YN)
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // NOTE: error check deferred to after Shell Patch computation to reduce MPI_Allreduce calls
{
int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
}
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime
<< ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#ifdef WithShell #ifdef WithShell
// evolve Shell Patches // evolve Shell Patches
@@ -4420,9 +4423,9 @@ void bssn_class::Step(int lev, int YN)
{ {
#if (AGM == 0) #if (AGM == 0)
f_enforce_ga(cg->shape, f_enforce_ga(cg->shape,
cg->fgfs[gxx->sgfn], cg->fgfs[gxy->sgfn], cg->fgfs[gxz->sgfn], cg->fgfs[gxx->sgfn], cg->fgfs[gxy->sgfn], cg->fgfs[gxz->sgfn],
cg->fgfs[gyy->sgfn], cg->fgfs[gyz->sgfn], cg->fgfs[gzz->sgfn], cg->fgfs[gyy->sgfn], cg->fgfs[gyz->sgfn], cg->fgfs[gzz->sgfn],
cg->fgfs[Axx->sgfn], cg->fgfs[Axy->sgfn], cg->fgfs[Axz->sgfn], cg->fgfs[Axx->sgfn], cg->fgfs[Axy->sgfn], cg->fgfs[Axz->sgfn],
cg->fgfs[Ayy->sgfn], cg->fgfs[Ayz->sgfn], cg->fgfs[Azz->sgfn]); cg->fgfs[Ayy->sgfn], cg->fgfs[Ayz->sgfn], cg->fgfs[Azz->sgfn]);
#elif (AGM == 1) #elif (AGM == 1)
if (iter_count == 3) if (iter_count == 3)
@@ -4542,25 +4545,16 @@ void bssn_class::Step(int lev, int YN)
sPp = sPp->next; sPp = sPp->next;
} }
} }
// check error information // Non-blocking error reduction overlapped with Sync to hide Allreduce latency
MPI_Request err_req_cor;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &err_req_cor);
}
if (ERROR)
{
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN on Shell Patches in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); Parallel::AsyncSyncState async_cor;
Parallel::Sync_start(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_cor[lev], async_cor);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -4573,11 +4567,30 @@ void bssn_class::Step(int lev, int YN)
{ {
prev_clock = curr_clock; prev_clock = curr_clock;
curr_clock = clock(); curr_clock = clock();
cout << " Shell stuff synchronization used " cout << " Shell stuff synchronization used "
<< (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC) << (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC)
<< " seconds! " << endl; << " seconds! " << endl;
} }
} }
#endif
Parallel::Sync_finish(sync_cache_cor[lev], async_cor, SynchList_cor, Symmetry);
#ifdef WithShell
// Complete non-blocking error reduction and check
MPI_Wait(&err_req_cor, MPI_STATUS_IGNORE);
if (ERROR)
{
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime
<< ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
#endif #endif
// for black hole position // for black hole position
if (BH_num > 0 && lev == GH->levels - 1) if (BH_num > 0 && lev == GH->levels - 1)
@@ -4943,11 +4956,19 @@ void bssn_class::Step(int lev, int YN)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Predictor rhs calculation"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Predictor rhs calculation");
// check error information // Non-blocking error reduction overlapped with Sync to hide Allreduce latency
MPI_Request err_req;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev]); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev], &err_req);
} }
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Predictor sync");
Parallel::Sync_cached(GH->PatL[lev], SynchList_pre, Symmetry, sync_cache_pre[lev]);
// Complete non-blocking error reduction and check
MPI_Wait(&err_req, MPI_STATUS_IGNORE);
if (ERROR) if (ERROR)
{ {
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
@@ -4959,10 +4980,6 @@ void bssn_class::Step(int lev, int YN)
} }
} }
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Predictor sync");
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry);
#if (MAPBH == 0) #if (MAPBH == 0)
// for black hole position // for black hole position
if (BH_num > 0 && lev == GH->levels - 1) if (BH_num > 0 && lev == GH->levels - 1)
@@ -5140,30 +5157,34 @@ void bssn_class::Step(int lev, int YN)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector error check"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector error check");
// check error information // Non-blocking error reduction overlapped with Sync to hide Allreduce latency
MPI_Request err_req_cor;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev]); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev], &err_req_cor);
} }
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector sync");
Parallel::Sync_cached(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_cor[lev]);
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Corrector sync");
// Complete non-blocking error reduction and check
MPI_Wait(&err_req_cor, MPI_STATUS_IGNORE);
if (ERROR) if (ERROR)
{ {
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
if (ErrorMonitor->outfile) if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in RK4 substep#" << iter_count ErrorMonitor->outfile << "find NaN in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime << " variables at t = " << PhysTime
<< ", lev = " << lev << endl; << ", lev = " << lev << endl;
MPI_Abort(MPI_COMM_WORLD, 1); MPI_Abort(MPI_COMM_WORLD, 1);
} }
} }
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector sync");
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry);
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Corrector sync");
#if (MAPBH == 0) #if (MAPBH == 0)
// for black hole position // for black hole position
if (BH_num > 0 && lev == GH->levels - 1) if (BH_num > 0 && lev == GH->levels - 1)
@@ -5447,21 +5468,11 @@ void bssn_class::SHStep()
#if (PSTR == 1 || PSTR == 2) #if (PSTR == 1 || PSTR == 2)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Predictor's error check"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Predictor's error check");
#endif #endif
// check error information // Non-blocking error reduction overlapped with Synch to hide Allreduce latency
MPI_Request err_req;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &err_req);
}
if (ERROR)
{
SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables on Shell Patches at t = " << PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
} }
{ {
@@ -5473,12 +5484,25 @@ void bssn_class::SHStep()
{ {
prev_clock = curr_clock; prev_clock = curr_clock;
curr_clock = clock(); curr_clock = clock();
cout << " Shell stuff synchronization used " cout << " Shell stuff synchronization used "
<< (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC) << (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC)
<< " seconds! " << endl; << " seconds! " << endl;
} }
} }
// Complete non-blocking error reduction and check
MPI_Wait(&err_req, MPI_STATUS_IGNORE);
if (ERROR)
{
SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN in state variables on Shell Patches at t = " << PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
// corrector // corrector
for (iter_count = 1; iter_count < 4; iter_count++) for (iter_count = 1; iter_count < 4; iter_count++)
{ {
@@ -5621,21 +5645,11 @@ void bssn_class::SHStep()
sPp = sPp->next; sPp = sPp->next;
} }
} }
// check error information // Non-blocking error reduction overlapped with Synch to hide Allreduce latency
MPI_Request err_req_cor;
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Iallreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &err_req_cor);
}
if (ERROR)
{
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN on Shell Patches in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
} }
{ {
@@ -5647,12 +5661,26 @@ void bssn_class::SHStep()
{ {
prev_clock = curr_clock; prev_clock = curr_clock;
curr_clock = clock(); curr_clock = clock();
cout << " Shell stuff synchronization used " cout << " Shell stuff synchronization used "
<< (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC) << (double)(curr_clock - prev_clock) / ((double)CLOCKS_PER_SEC)
<< " seconds! " << endl; << " seconds! " << endl;
} }
} }
// Complete non-blocking error reduction and check
MPI_Wait(&err_req_cor, MPI_STATUS_IGNORE);
if (ERROR)
{
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0)
{
if (ErrorMonitor->outfile)
ErrorMonitor->outfile << "find NaN on Shell Patches in RK4 substep#" << iter_count
<< " variables at t = " << PhysTime << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
}
sPp = SH->PatL; sPp = SH->PatL;
while (sPp) while (sPp)
{ {
@@ -5781,7 +5809,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
// misc::tillherecheck(GH->Commlev[GH->mylev],GH->start_rank[GH->mylev],a_stream.str()); // misc::tillherecheck(GH->Commlev[GH->mylev],GH->start_rank[GH->mylev],a_stream.str());
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], SynchList_pre, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], SynchList_pre, Symmetry, sync_cache_rp_coarse[lev]);
#if (PSTR == 1 || PSTR == 2) #if (PSTR == 1 || PSTR == 2)
// a_stream.clear(); // a_stream.clear();
@@ -5791,21 +5819,11 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SL,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, GH->bdsul[lev], Symmetry);
@@ -5842,7 +5860,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
// misc::tillherecheck(GH->Commlev[GH->mylev],GH->start_rank[GH->mylev],a_stream.str()); // misc::tillherecheck(GH->Commlev[GH->mylev],GH->start_rank[GH->mylev],a_stream.str());
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], SL, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], SL, Symmetry, sync_cache_rp_coarse[lev]);
#if (PSTR == 1 || PSTR == 2) #if (PSTR == 1 || PSTR == 2)
// a_stream.clear(); // a_stream.clear();
@@ -5852,21 +5870,11 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SL, SL, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SL, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->bdsul[lev], Symmetry);
@@ -5880,7 +5888,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
} }
Parallel::Sync(GH->PatL[lev], SL, Symmetry); Parallel::Sync_cached(GH->PatL[lev], SL, Symmetry, sync_cache_rp_fine[lev]);
#if (PSTR == 1 || PSTR == 2) #if (PSTR == 1 || PSTR == 2)
// a_stream.clear(); // a_stream.clear();
@@ -5938,24 +5946,14 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry);
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], SynchList_pre, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], SynchList_pre, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SL,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, GH->bdsul[lev], Symmetry);
@@ -5970,31 +5968,21 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry);
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], SL, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], SL, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SL, SL, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SL, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->bdsul[lev], Symmetry);
#endif #endif
} }
Parallel::Sync(GH->PatL[lev], SL, Symmetry); Parallel::Sync_cached(GH->PatL[lev], SL, Symmetry, sync_cache_rp_fine[lev]);
} }
} }
@@ -6045,24 +6033,14 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, GH->rsul[lev], Symmetry);
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], SynchList_pre, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], SynchList_pre, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SynchList_cor,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, GH->bdsul[lev], Symmetry);
@@ -6079,31 +6057,21 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, GH->rsul[lev], Symmetry);
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], StateList, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], StateList, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],StateList,SynchList_cor,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],StateList,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, GH->bdsul[lev], Symmetry);
#endif #endif
} }
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); Parallel::Sync_cached(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_rp_fine[lev]);
} }
} }
@@ -6133,21 +6101,11 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
} }
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SynchList_cor,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, GH->bdsul[lev], Symmetry);
@@ -6156,21 +6114,11 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
else // no time refinement levels and for all same time levels else // no time refinement levels and for all same time levels
{ {
#if (RPB == 0) #if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif #endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],StateList,SynchList_cor,Symmetry); // Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],StateList,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, GH->bdsul[lev], Symmetry); Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, GH->bdsul[lev], Symmetry);
@@ -6186,10 +6134,10 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
#else #else
Parallel::Restrict_after(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, Symmetry); Parallel::Restrict_after(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, Symmetry);
#endif #endif
Parallel::Sync(GH->PatL[lev - 1], StateList, Symmetry); Parallel::Sync_cached(GH->PatL[lev - 1], StateList, Symmetry, sync_cache_rp_coarse[lev]);
} }
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); Parallel::Sync_cached(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_rp_fine[lev]);
} }
} }
#undef MIXOUTB #undef MIXOUTB

View File

@@ -126,6 +126,11 @@ public:
MyList<var> *OldStateList, *DumpList; MyList<var> *OldStateList, *DumpList;
MyList<var> *ConstraintList; MyList<var> *ConstraintList;
Parallel::SyncCache *sync_cache_pre; // per-level cache for predictor sync
Parallel::SyncCache *sync_cache_cor; // per-level cache for corrector sync
Parallel::SyncCache *sync_cache_rp_coarse; // RestrictProlong sync on PatL[lev-1]
Parallel::SyncCache *sync_cache_rp_fine; // RestrictProlong sync on PatL[lev]
monitor *ErrorMonitor, *Psi4Monitor, *BHMonitor, *MAPMonitor; monitor *ErrorMonitor, *Psi4Monitor, *BHMonitor, *MAPMonitor;
monitor *ConVMonitor; monitor *ConVMonitor;
surface_integral *Waveshell; surface_integral *Waveshell;

View File

@@ -161,8 +161,36 @@
chi_rhs = F2o3 *chin1*( alpn1 * trK - div_beta ) !rhs for chi chi_rhs = F2o3 *chin1*( alpn1 * trK - div_beta ) !rhs for chi
call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev)
call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev)
call fderivs(ex,dyy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev)
call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
gxx_rhs = - TWO * alpn1 * Axx - F2o3 * gxx * div_beta + &
TWO *( gxx * betaxx + gxy * betayx + gxz * betazx)
gyy_rhs = - TWO * alpn1 * Ayy - F2o3 * gyy * div_beta + &
TWO *( gxy * betaxy + gyy * betayy + gyz * betazy)
gzz_rhs = - TWO * alpn1 * Azz - F2o3 * gzz * div_beta + &
TWO *( gxz * betaxz + gyz * betayz + gzz * betazz)
gxy_rhs = - TWO * alpn1 * Axy + F1o3 * gxy * div_beta + &
gxx * betaxy + gxz * betazy + &
gyy * betayx + gyz * betazx &
- gxy * betazz
gyz_rhs = - TWO * alpn1 * Ayz + F1o3 * gyz * div_beta + &
gxy * betaxz + gyy * betayz + &
gxz * betaxy + gzz * betazy &
- gyz * betaxx
gxz_rhs = - TWO * alpn1 * Axz + F1o3 * gxz * div_beta + &
gxx * betaxz + gxy * betayz + &
gyz * betayx + gzz * betazx &
- gxz * betayy !rhs for gij
! invert tilted metric ! invert tilted metric
gupzz = gxx * gyy * gzz + gxy * gyz * gxz + gxz * gxy * gyz - & gupzz = gxx * gyy * gzz + gxy * gyz * gxz + gxz * gxy * gyz - &
@@ -173,12 +201,7 @@
gupyy = ( gxx * gzz - gxz * gxz ) / gupzz gupyy = ( gxx * gzz - gxz * gxz ) / gupzz
gupyz = - ( gxx * gyz - gxy * gxz ) / gupzz gupyz = - ( gxx * gyz - gxy * gxz ) / gupzz
gupzz = ( gxx * gyy - gxy * gxy ) / gupzz gupzz = ( gxx * gyy - gxy * gxy ) / gupzz
call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev)
call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev)
call fderivs(ex,dyy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev)
call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
if(co == 0)then if(co == 0)then
! Gam^i_Res = Gam^i + gup^ij_,j ! Gam^i_Res = Gam^i + gup^ij_,j
Gmx_Res = Gamx - (gupxx*(gupxx*gxxx+gupxy*gxyx+gupxz*gxzx)& Gmx_Res = Gamx - (gupxx*(gupxx*gxxx+gupxy*gxyx+gupxz*gxzx)&
@@ -922,103 +945,60 @@
SSA(2)=SYM SSA(2)=SYM
SSA(3)=ANTI SSA(3)=ANTI
!!!!!!!!!advection term part !!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency)
! lopsided_kodis shares the symmetry_bd buffer between advection and
! dissipation, eliminating redundant full-grid copies. For metric variables
! gxx/gyy/gzz (=dxx/dyy/dzz+1): kodis stencil coefficients sum to zero,
! so the constant offset has no effect on dissipation.
gxx_rhs = - TWO * alpn1 * Axx - F2o3 * gxx * div_beta + & call lopsided_kodis(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
TWO *( gxx * betaxx + gxy * betayx + gxz * betazx) call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided_kodis(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided_kodis(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
gyy_rhs = - TWO * alpn1 * Ayy - F2o3 * gyy * div_beta + & call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
TWO *( gxy * betaxy + gyy * betayy + gyz * betazy) call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided_kodis(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided_kodis(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided_kodis(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
gzz_rhs = - TWO * alpn1 * Azz - F2o3 * gzz * div_beta + & call lopsided_kodis(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,eps)
TWO *( gxz * betaxz + gyz * betayz + gzz * betazz) call lopsided_kodis(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS,eps)
gxy_rhs = - TWO * alpn1 * Axy + F1o3 * gxy * div_beta + & call lopsided_kodis(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
gxx * betaxy + gxz * betazy + & call lopsided_kodis(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
gyy * betayx + gyz * betazx & call lopsided_kodis(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
- gxy * betazz
gyz_rhs = - TWO * alpn1 * Ayz + F1o3 * gyz * div_beta + & #if 1
gxy * betaxz + gyy * betayz + & !! bam does not apply dissipation on gauge variables
gxz * betaxy + gzz * betazy & call lopsided_kodis(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS,eps)
- gyz * betaxx #if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided_kodis(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS,eps)
gxz_rhs = - TWO * alpn1 * Axz + F1o3 * gxz * div_beta + & call lopsided_kodis(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS,eps)
gxx * betaxz + gxy * betayz + & call lopsided_kodis(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
gyz * betayx + gzz * betazx & #endif
- gxz * betayy !rhs for gij #if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided_kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided_kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided_kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
#endif
#else
if(eps>0)then ! No dissipation on gauge variables (advection only)
! usual Kreiss-Oliger dissipation
call merge_lopsided_kodis(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call merge_lopsided_kodis(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call merge_lopsided_kodis(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
call merge_lopsided_kodis(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
call merge_lopsided_kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call merge_lopsided_kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
else
call lopsided(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS)
call lopsided(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA)
call lopsided(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA)
call lopsided(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS)
call lopsided(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA)
call lopsided(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA)
call lopsided(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA)
call lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS)
#if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA)
#endif
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA)
#endif
#endif
endif
if(co == 0)then if(co == 0)then
! ham_Res = trR + 2/3 * K^2 - A_ij * A^ij - 16 * PI * rho ! ham_Res = trR + 2/3 * K^2 - A_ij * A^ij - 16 * PI * rho
@@ -1163,265 +1143,3 @@ endif
return return
end function compute_rhs_bssn end function compute_rhs_bssn
subroutine merge_lopsided_kodis(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA,eps)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
!~~~~~~> local variables:
! note index -2,-1,0, so we have 3 extra points
real*8,dimension(-2:ex(1),-2:ex(2),-2:ex(3)) :: fh
integer :: imin_lopsided,jmin_lopsided,kmin_lopsided,imin_kodis,jmin_kodis,kmin_kodis,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F3=3.d0
real*8, parameter :: TWO=2.d0,F6=6.0d0,F18=1.8d1
real*8, parameter :: F12=1.2d1, F10=1.d1,EIT=8.d0
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8, parameter :: SIX=6.d0,FIT=1.5d1,TWT=2.d1
real*8,parameter::cof=6.4d1 ! 2^6
real*8,intent(in) :: eps
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin_lopsided = 1
jmin_lopsided = 1
kmin_lopsided = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin_lopsided = -2
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin_lopsided = -2
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin_lopsided = -2
imin_kodis = 1
jmin_kodis = 1
kmin_kodis = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin_kodis = -2
if(Symmetry == OCTANT .and. dabs(X(1)) < dX) imin_kodis = -2
if(Symmetry == OCTANT .and. dabs(Y(1)) < dY) jmin_kodis = -2
call symmetry_bd(3,ex,f,fh,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
!! new code, 2012dec27, based on bam
! x direction
if(Sfx(i,j,k) > ZEO)then
if(i+3 <= imax)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
elseif(i+2 <= imax)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax)then
! v
! D f = ------[ 3f + 10f - 18f + 6f - f ]
! i 12dx i+v i i-v i-2v i-3v
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
! set imax and imin_lopsided 0
endif
elseif(Sfx(i,j,k) < ZEO)then
if(i-3 >= imin_lopsided)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
elseif(i-2 >= imin_lopsided)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i-1 >= imin_lopsided)then
! v
! D f = ------[ 3f + 10f - 18f + 6f - f ]
! i 12dx i+v i i-v i-2v i-3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
! set imax and imin_lopsided 0
endif
endif
! y direction
if(Sfy(i,j,k) > ZEO)then
if(j+3 <= jmax)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
elseif(j+2 <= jmax)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax)then
! v
! D f = ------[ 3f + 10f - 18f + 6f - f ]
! i 12dx i+v i i-v i-2v i-3v
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
! set imax and imin_lopsided 0
endif
elseif(Sfy(i,j,k) < ZEO)then
if(j-3 >= jmin_lopsided)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
elseif(j-2 >= jmin_lopsided)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j-1 >= jmin_lopsided)then
! v
! D f = ------[ 3f + 10f - 18f + 6f - f ]
! i 12dx i+v i i-v i-2v i-3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
! set jmax and jmin_lopsided 0
endif
endif
! z direction
if(Sfz(i,j,k) > ZEO)then
if(k+3 <= kmax)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
elseif(k+2 <= kmax)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax)then
! v
! D f = ------[ 3f + 10f - 18f + 6f - f ]
! i 12dx i+v i i-v i-2v i-3v
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
! set imax and imin_lopsided 0
endif
elseif(Sfz(i,j,k) < ZEO)then
if(k-3 >= kmin_lopsided)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
elseif(k-2 >= kmin_lopsided)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k-1 >= kmin_lopsided)then
! v
! D f = ------[ 3f + 10f - 18f + 6f - f ]
! i 12dx i+v i i-v i-2v i-3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
! set kmax and kmin_lopsided 0
endif
endif
if(i-3 >= imin_kodis .and. i+3 <= imax .and. &
j-3 >= jmin_kodis .and. j+3 <= jmax .and. &
k-3 >= kmin_kodis .and. k+3 <= kmax) then
! calculation order if important ?
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )/dX + &
( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )/dY + &
( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )/dZ )
endif
enddo
enddo
enddo
return
end subroutine merge_lopsided_kodis

File diff suppressed because it is too large Load Diff

View File

@@ -1,92 +1,107 @@
#ifndef CGH_H #ifndef CGH_H
#define CGH_H #define CGH_H
#include <mpi.h> #include <mpi.h>
#include "MyList.h" #include "MyList.h"
#include "MPatch.h" #include "MPatch.h"
#include "macrodef.h" #include "macrodef.h"
#include "monitor.h" #include "monitor.h"
#include "Parallel.h" #include "Parallel.h"
class cgh class cgh
{ {
public: public:
int levels, movls, BH_num_in; int levels, movls, BH_num_in;
// information of boxes // information of boxes
int *grids; int *grids;
double ***bbox; double ***bbox;
int ***shape; int ***shape;
double ***handle; double ***handle;
double ***Porgls; double ***Porgls;
double *Lt; double *Lt;
// information of Patch list // information of Patch list
MyList<Patch> **PatL; MyList<Patch> **PatL;
// information of OutBdLow2Hi point list and Restrict point list // information of OutBdLow2Hi point list and Restrict point list
#if (RPB == 1) #if (RPB == 1)
MyList<Parallel::pointstru_bam> **bdsul, **rsul; MyList<Parallel::pointstru_bam> **bdsul, **rsul;
#endif #endif
#if (PSTR == 1 || PSTR == 2 || PSTR == 3) #if (PSTR == 1 || PSTR == 2 || PSTR == 3)
int mylev; int mylev;
int *start_rank, *end_rank; int *start_rank, *end_rank;
MPI_Comm *Commlev; MPI_Comm *Commlev;
#endif #endif
protected: protected:
int ingfs, fngfs; int ingfs, fngfs;
static constexpr double ratio = 0.75; static constexpr double ratio = 0.75;
int trfls; int trfls;
public: public:
cgh(int ingfsi, int fngfsi, int Symmetry, char *filename, int checkrun, monitor *ErrorMonitor); cgh(int ingfsi, int fngfsi, int Symmetry, char *filename, int checkrun, monitor *ErrorMonitor);
~cgh(); ~cgh();
void compose_cgh(int nprocs); void compose_cgh(int nprocs);
void sethandle(monitor *ErrorMonitor); void sethandle(monitor *ErrorMonitor);
void checkPatchList(MyList<Patch> *PatL, bool buflog); void checkPatchList(MyList<Patch> *PatL, bool buflog);
void Regrid(int Symmetry, int BH_num, double **Porgbr, double **Porg0, void Regrid(int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB, MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor); monitor *ErrorMonitor);
void Regrid_fake(int Symmetry, int BH_num, double **Porgbr, double **Porg0, void Regrid_fake(int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB, MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor); monitor *ErrorMonitor);
void recompose_cgh(int nprocs, bool *lev_flag, void recompose_cgh(int nprocs, bool *lev_flag,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB); int Symmetry, bool BB);
void recompose_cgh_fake(int nprocs, bool *lev_flag, void recompose_cgh_fake(int nprocs, bool *lev_flag,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB); int Symmetry, bool BB);
void read_bbox(int Symmetry, char *filename); void read_bbox(int Symmetry, char *filename);
MyList<Patch> *construct_patchlist(int lev, int Symmetry); MyList<Patch> *construct_patchlist(int lev, int Symmetry);
bool Interp_One_Point(MyList<var> *VarList, bool Interp_One_Point(MyList<var> *VarList,
double *XX, /*input global Cartesian coordinate*/ double *XX, /*input global Cartesian coordinate*/
double *Shellf, int Symmetry); double *Shellf, int Symmetry);
void recompose_cgh_Onelevel(int nprocs, int lev, void recompose_cgh_Onelevel(int nprocs, int lev,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB); int Symmetry, bool BB);
void Regrid_Onelevel(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0, void Regrid_Onelevel(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB, MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor); monitor *ErrorMonitor);
void Regrid_Onelevel_aux(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0, void Regrid_Onelevel_aux(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB, MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor); monitor *ErrorMonitor);
void settrfls(const int lev); void settrfls(const int lev);
#if (PSTR == 1 || PSTR == 2 || PSTR == 3) #if (PSTR == 1 || PSTR == 2 || PSTR == 3)
void construct_mylev(int nprocs); void construct_mylev(int nprocs);
#endif #endif
};
// Load balancing support
#endif /* CGH_H */ bool enable_load_balance; // Enable load balancing
int load_balance_check_interval; // Check interval (in time steps)
int current_time_step; // Current time step counter
double *rank_interp_times; // Store interpolation times for each rank
int *heavy_ranks; // Store heavy rank numbers
int num_heavy_ranks; // Number of heavy ranks
void init_load_balance(int nprocs);
void update_interp_time(int rank, double time);
bool check_and_rebalance(int nprocs, int lev,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
};
#endif /* CGH_H */

View File

@@ -69,10 +69,12 @@
fy = ZEO fy = ZEO
fz = ZEO fz = ZEO
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
!DIR$ UNROLL PARTIAL(4)
do k=1,ex(3)-1 do k=1,ex(3)-1
do j=1,ex(2)-1 do j=1,ex(2)-1
do i=1,ex(1)-1 do i=1,ex(1)-1
! x direction ! x direction
if(i+1 <= imax .and. i-1 >= imin)then if(i+1 <= imax .and. i-1 >= imin)then
! !
! - f(i-1) + f(i+1) ! - f(i-1) + f(i+1)
@@ -371,6 +373,8 @@
fxz = ZEO fxz = ZEO
fyz = ZEO fyz = ZEO
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
!DIR$ UNROLL PARTIAL(4)
do k=1,ex(3)-1 do k=1,ex(3)-1
do j=1,ex(2)-1 do j=1,ex(2)-1
do i=1,ex(1)-1 do i=1,ex(1)-1
@@ -1000,7 +1004,86 @@
do k=1,ex(3)-1 do k=1,ex(3)-1
do j=1,ex(2)-1 do j=1,ex(2)-1
do i=1,ex(1)-1 do i=1,ex(1)-1
#if 0
! x direction
if(i+2 <= imax .and. i-2 >= imin)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
fx(i,j,k)=d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax .and. i-1 >= imin)then
!
! - f(i-1) + f(i+1)
! fx(i) = --------------------------------
! 2 dx
fx(i,j,k)=d2dx*(-fh(i-1,j,k)+fh(i+1,j,k))
! set imax and imin 0
endif
! y direction
if(j+2 <= jmax .and. j-2 >= jmin)then
fy(i,j,k)=d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax .and. j-1 >= jmin)then
fy(i,j,k)=d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
! set jmax and jmin 0
endif
! z direction
if(k+2 <= kmax .and. k-2 >= kmin)then
fz(i,j,k)=d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax .and. k-1 >= kmin)then
fz(i,j,k)=d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
! set kmax and kmin 0
endif
#elif 0
! x direction
if(i+2 <= imax .and. i-2 >= imin)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
fx(i,j,k)=d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+3 <= imax .and. i-1 >= imin)then
fx(i,j,k)=d12dx*(-3.d0*fh(i-1,j,k)-1.d1*fh(i,j,k)+1.8d1*fh(i+1,j,k)-6.d0*fh(i+2,j,k)+fh(i+3,j,k))
elseif(i+1 <= imax .and. i-3 >= imin)then
fx(i,j,k)=d12dx*( 3.d0*fh(i+1,j,k)+1.d1*fh(i,j,k)-1.8d1*fh(i-1,j,k)+6.d0*fh(i-2,j,k)-fh(i-3,j,k))
! set imax and imin 0
endif
! y direction
if(j+2 <= jmax .and. j-2 >= jmin)then
fy(i,j,k)=d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+3 <= jmax .and. j-1 >= jmin)then
fy(i,j,k)=d12dy*(-3.d0*fh(i,j-1,k)-1.d1*fh(i,j,k)+1.8d1*fh(i,j+1,k)-6.d0*fh(i,j+2,k)+fh(i,j+3,k))
elseif(j+1 <= jmax .and. j-3 >= jmin)then
fy(i,j,k)=d12dy*( 3.d0*fh(i,j+1,k)+1.d1*fh(i,j,k)-1.8d1*fh(i,j-1,k)+6.d0*fh(i,j-2,k)-fh(i,j-3,k))
! set jmax and jmin 0
endif
! z direction
if(k+2 <= kmax .and. k-2 >= kmin)then
fz(i,j,k)=d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+3 <= kmax .and. k-1 >= kmin)then
fz(i,j,k)=d12dz*(-3.d0*fh(i,j,k-1)-1.d1*fh(i,j,k)+1.8d1*fh(i,j,k+1)-6.d0*fh(i,j,k+2)+fh(i,j,k+3))
elseif(k+1 <= kmax .and. k-3 >= kmin)then
fz(i,j,k)=d12dz*( 3.d0*fh(i,j,k+1)+1.d1*fh(i,j,k)-1.8d1*fh(i,j,k-1)+6.d0*fh(i,j,k-2)-fh(i,j,k-3))
! set kmax and kmin 0
endif
#else
! for bam comparison ! for bam comparison
if(i+2 <= imax .and. i-2 >= imin .and. & if(i+2 <= imax .and. i-2 >= imin .and. &
j+2 <= jmax .and. j-2 >= jmin .and. & j+2 <= jmax .and. j-2 >= jmin .and. &
@@ -1015,7 +1098,7 @@
fy(i,j,k)=d2dy*(-fh(i,j-1,k)+fh(i,j+1,k)) fy(i,j,k)=d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
fz(i,j,k)=d2dz*(-fh(i,j,k-1)+fh(i,j,k+1)) fz(i,j,k)=d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
endif endif
#endif
enddo enddo
enddo enddo
enddo enddo
@@ -1325,7 +1408,85 @@
do k=1,ex(3)-1 do k=1,ex(3)-1
do j=1,ex(2)-1 do j=1,ex(2)-1
do i=1,ex(1)-1 do i=1,ex(1)-1
#if 0
!~~~~~~ fxx
if(i+2 <= imax .and. i-2 >= imin)then
!
! - f(i-2) + 16 f(i-1) - 30 f(i) + 16 f(i+1) - f(i+2)
! fxx(i) = ----------------------------------------------------------
! 12 dx^2
fxx(i,j,k) = Fdxdx*(-fh(i-2,j,k)+F16*fh(i-1,j,k)-F30*fh(i,j,k) &
-fh(i+2,j,k)+F16*fh(i+1,j,k) )
elseif(i+1 <= imax .and. i-1 >= imin)then
!
! f(i-1) - 2 f(i) + f(i+1)
! fxx(i) = --------------------------------
! dx^2
fxx(i,j,k) = Sdxdx*(fh(i-1,j,k)-TWO*fh(i,j,k) &
+fh(i+1,j,k) )
endif
!~~~~~~ fyy
if(j+2 <= jmax .and. j-2 >= jmin)then
fyy(i,j,k) = Fdydy*(-fh(i,j-2,k)+F16*fh(i,j-1,k)-F30*fh(i,j,k) &
-fh(i,j+2,k)+F16*fh(i,j+1,k) )
elseif(j+1 <= jmax .and. j-1 >= jmin)then
fyy(i,j,k) = Sdydy*(fh(i,j-1,k)-TWO*fh(i,j,k) &
+fh(i,j+1,k) )
endif
!~~~~~~ fzz
if(k+2 <= kmax .and. k-2 >= kmin)then
fzz(i,j,k) = Fdzdz*(-fh(i,j,k-2)+F16*fh(i,j,k-1)-F30*fh(i,j,k) &
-fh(i,j,k+2)+F16*fh(i,j,k+1) )
elseif(k+1 <= kmax .and. k-1 >= kmin)then
fzz(i,j,k) = Sdzdz*(fh(i,j,k-1)-TWO*fh(i,j,k) &
+fh(i,j,k+1) )
endif
!~~~~~~ fxy
if(i+2 <= imax .and. i-2 >= imin .and. j+2 <= jmax .and. j-2 >= jmin)then
!
! ( f(i-2,j-2) - 8 f(i-1,j-2) + 8 f(i+1,j-2) - f(i+2,j-2) )
! - 8 ( f(i-2,j-1) - 8 f(i-1,j-1) + 8 f(i+1,j-1) - f(i+2,j-1) )
! + 8 ( f(i-2,j+1) - 8 f(i-1,j+1) + 8 f(i+1,j+1) - f(i+2,j+1) )
! - ( f(i-2,j+2) - 8 f(i-1,j+2) + 8 f(i+1,j+2) - f(i+2,j+2) )
! fxy(i,j) = ----------------------------------------------------------------
! 144 dx dy
fxy(i,j,k) = Fdxdy*( (fh(i-2,j-2,k)-F8*fh(i-1,j-2,k)+F8*fh(i+1,j-2,k)-fh(i+2,j-2,k)) &
-F8 *(fh(i-2,j-1,k)-F8*fh(i-1,j-1,k)+F8*fh(i+1,j-1,k)-fh(i+2,j-1,k)) &
+F8 *(fh(i-2,j+1,k)-F8*fh(i-1,j+1,k)+F8*fh(i+1,j+1,k)-fh(i+2,j+1,k)) &
- (fh(i-2,j+2,k)-F8*fh(i-1,j+2,k)+F8*fh(i+1,j+2,k)-fh(i+2,j+2,k)))
elseif(i+1 <= imax .and. i-1 >= imin .and. j+1 <= jmax .and. j-1 >= jmin)then
! f(i-1,j-1) - f(i+1,j-1) - f(i-1,j+1) + f(i+1,j+1)
! fxy(i,j) = -----------------------------------------------------------
! 4 dx dy
fxy(i,j,k) = Sdxdy*(fh(i-1,j-1,k)-fh(i+1,j-1,k)-fh(i-1,j+1,k)+fh(i+1,j+1,k))
endif
!~~~~~~ fxz
if(i+2 <= imax .and. i-2 >= imin .and. k+2 <= kmax .and. k-2 >= kmin)then
fxz(i,j,k) = Fdxdz*( (fh(i-2,j,k-2)-F8*fh(i-1,j,k-2)+F8*fh(i+1,j,k-2)-fh(i+2,j,k-2)) &
-F8 *(fh(i-2,j,k-1)-F8*fh(i-1,j,k-1)+F8*fh(i+1,j,k-1)-fh(i+2,j,k-1)) &
+F8 *(fh(i-2,j,k+1)-F8*fh(i-1,j,k+1)+F8*fh(i+1,j,k+1)-fh(i+2,j,k+1)) &
- (fh(i-2,j,k+2)-F8*fh(i-1,j,k+2)+F8*fh(i+1,j,k+2)-fh(i+2,j,k+2)))
elseif(i+1 <= imax .and. i-1 >= imin .and. k+1 <= kmax .and. k-1 >= kmin)then
fxz(i,j,k) = Sdxdz*(fh(i-1,j,k-1)-fh(i+1,j,k-1)-fh(i-1,j,k+1)+fh(i+1,j,k+1))
endif
!~~~~~~ fyz
if(j+2 <= jmax .and. j-2 >= jmin .and. k+2 <= kmax .and. k-2 >= kmin)then
fyz(i,j,k) = Fdydz*( (fh(i,j-2,k-2)-F8*fh(i,j-1,k-2)+F8*fh(i,j+1,k-2)-fh(i,j+2,k-2)) &
-F8 *(fh(i,j-2,k-1)-F8*fh(i,j-1,k-1)+F8*fh(i,j+1,k-1)-fh(i,j+2,k-1)) &
+F8 *(fh(i,j-2,k+1)-F8*fh(i,j-1,k+1)+F8*fh(i,j+1,k+1)-fh(i,j+2,k+1)) &
- (fh(i,j-2,k+2)-F8*fh(i,j-1,k+2)+F8*fh(i,j+1,k+2)-fh(i,j+2,k+2)))
elseif(j+1 <= jmax .and. j-1 >= jmin .and. k+1 <= kmax .and. k-1 >= kmin)then
fyz(i,j,k) = Sdydz*(fh(i,j-1,k-1)-fh(i,j+1,k-1)-fh(i,j-1,k+1)+fh(i,j+1,k+1))
endif
#else
! for bam comparison ! for bam comparison
if(i+2 <= imax .and. i-2 >= imin .and. & if(i+2 <= imax .and. i-2 >= imin .and. &
j+2 <= jmax .and. j-2 >= jmin .and. & j+2 <= jmax .and. j-2 >= jmin .and. &
@@ -1361,7 +1522,7 @@
fxz(i,j,k) = Sdxdz*(fh(i-1,j,k-1)-fh(i+1,j,k-1)-fh(i-1,j,k+1)+fh(i+1,j,k+1)) fxz(i,j,k) = Sdxdz*(fh(i-1,j,k-1)-fh(i+1,j,k-1)-fh(i-1,j,k+1)+fh(i+1,j,k+1))
fyz(i,j,k) = Sdydz*(fh(i,j-1,k-1)-fh(i,j+1,k-1)-fh(i,j-1,k+1)+fh(i,j+1,k+1)) fyz(i,j,k) = Sdydz*(fh(i,j-1,k-1)-fh(i,j+1,k-1)-fh(i,j-1,k+1)+fh(i,j+1,k+1))
endif endif
#endif
enddo enddo
enddo enddo
enddo enddo

View File

@@ -326,8 +326,7 @@ subroutine symmetry_bd(ord,extc,func,funcc,SoA)
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1)
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1)
enddo enddo
do i=0,ord-1 do i=0,ord-1
funcc(:,-i,1:extc(3)) = funcc(:,i+2,1:extc(3))*SoA(2) funcc(:,-i,1:extc(3)) = funcc(:,i+2,1:extc(3))*SoA(2)
@@ -884,13 +883,17 @@ subroutine symmetry_bd(ord,extc,func,funcc,SoA)
integer::i integer::i
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1)
enddo enddo
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
do i=0,ord-1 do i=0,ord-1
funcc(:,-i,1:extc(3)) = funcc(:,i+1,1:extc(3))*SoA(2) funcc(:,-i,1:extc(3)) = funcc(:,i+1,1:extc(3))*SoA(2)
enddo enddo
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
do i=0,ord-1 do i=0,ord-1
funcc(:,:,-i) = funcc(:,:,i+1)*SoA(3) funcc(:,:,-i) = funcc(:,:,i+1)*SoA(3)
enddo enddo
@@ -1113,6 +1116,7 @@ end subroutine d2dump
! Lagrangian polynomial interpolation ! Lagrangian polynomial interpolation
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
!DIR$ ATTRIBUTES FORCEINLINE :: polint
subroutine polint(xa, ya, x, y, dy, ordn) subroutine polint(xa, ya, x, y, dy, ordn)
implicit none implicit none

View File

@@ -6,6 +6,103 @@
! Vertex or Cell is distinguished in routine symmetry_bd which locates in ! Vertex or Cell is distinguished in routine symmetry_bd which locates in
! file "fmisc.f90" ! file "fmisc.f90"
#if (ghost_width == 2)
! second order code
!------------------------------------------------------------------------------------------------------------------------------
!usual type Kreiss-Oliger type numerical dissipation
!We support cell center only
! (D_+D_-)^2 =
! f(i-2) - 4 f(i-1) + 6 f(i) - 4 f(i+1) + f(i+2)
! ------------------------------------------------------
! dx^4
!------------------------------------------------------------------------------------------------------------------------------
! do not add dissipation near boundary
subroutine kodis(ex,X,Y,Z,f,f_rhs,SoA,Symmetry,eps)
implicit none
! argument variables
integer,intent(in) :: Symmetry
integer,dimension(3),intent(in)::ex
real*8, dimension(1:3), intent(in) :: SoA
double precision,intent(in),dimension(ex(1))::X
double precision,intent(in),dimension(ex(2))::Y
double precision,intent(in),dimension(ex(3))::Z
double precision,intent(in),dimension(ex(1),ex(2),ex(3))::f
double precision,intent(inout),dimension(ex(1),ex(2),ex(3))::f_rhs
real*8,intent(in) :: eps
!~~~~~~ other variables
real*8 :: dX,dY,dZ
real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8,parameter :: cof = 1.6d1 ! 2^4
real*8, parameter :: F4=4.d0,F6=6.d0
integer::i,j,k
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
call symmetry_bd(2,ex,f,fh,SoA)
! f(i-2) - 4 f(i-1) + 6 f(i) - 4 f(i+1) + f(i+2)
! ------------------------------------------------------
! dx^4
! note the sign (-1)^r-1, now r=2
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
!DIR$ UNROLL PARTIAL(4)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i-2 >= imin .and. i+2 <= imax .and. &
j-2 >= jmin .and. j+2 <= jmax .and. &
k-2 >= kmin .and. k+2 <= kmax) then
! x direction
f_rhs(i,j,k) = f_rhs(i,j,k) - eps/dX/cof * ( &
(fh(i-2,j,k)+fh(i+2,j,k)) &
- F4 * (fh(i-1,j,k)+fh(i+1,j,k)) &
+ F6 * fh(i,j,k) )
! y direction
f_rhs(i,j,k) = f_rhs(i,j,k) - eps/dY/cof * ( &
(fh(i,j-2,k)+fh(i,j+2,k)) &
- F4 * (fh(i,j-1,k)+fh(i,j+1,k)) &
+ F6 * fh(i,j,k) )
! z direction
f_rhs(i,j,k) = f_rhs(i,j,k) - eps/dZ/cof * ( &
(fh(i,j,k-2)+fh(i,j,k+2)) &
- F4 * (fh(i,j,k-1)+fh(i,j,k+1)) &
+ F6 * fh(i,j,k) )
endif
enddo
enddo
enddo
return
end subroutine kodis
#elif (ghost_width == 3)
! fourth order code ! fourth order code
!--------------------------------------------------------------------------------------------- !---------------------------------------------------------------------------------------------
@@ -61,7 +158,7 @@ integer, parameter :: NO_SYMM=0, OCTANT=2
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2 if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2
if(Symmetry == OCTANT .and. dabs(X(1)) < dX) imin = -2 if(Symmetry == OCTANT .and. dabs(X(1)) < dX) imin = -2
if(Symmetry == OCTANT .and. dabs(Y(1)) < dY) jmin = -2 if(Symmetry == OCTANT .and. dabs(Y(1)) < dY) jmin = -2
!print*,'imin,jmin,kmin=',imin,jmin,kmin
call symmetry_bd(3,ex,f,fh,SoA) call symmetry_bd(3,ex,f,fh,SoA)
do k=1,ex(3) do k=1,ex(3)
@@ -71,7 +168,28 @@ integer, parameter :: NO_SYMM=0, OCTANT=2
if(i-3 >= imin .and. i+3 <= imax .and. & if(i-3 >= imin .and. i+3 <= imax .and. &
j-3 >= jmin .and. j+3 <= jmax .and. & j-3 >= jmin .and. j+3 <= jmax .and. &
k-3 >= kmin .and. k+3 <= kmax) then k-3 >= kmin .and. k+3 <= kmax) then
#if 0
! x direction
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dX/cof * ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )
! y direction
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dY/cof * ( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )
! z direction
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dZ/cof * ( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )
#else
! calculation order if important ? ! calculation order if important ?
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( & f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - & (fh(i-3,j,k)+fh(i+3,j,k)) - &
@@ -88,7 +206,7 @@ integer, parameter :: NO_SYMM=0, OCTANT=2
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + & SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - & FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )/dZ ) TWT* fh(i,j,k) )/dZ )
#endif
endif endif
enddo enddo
@@ -99,6 +217,218 @@ integer, parameter :: NO_SYMM=0, OCTANT=2
end subroutine kodis end subroutine kodis
#elif (ghost_width == 4)
! sixth order code
!------------------------------------------------------------------------------------------------------------------------------
!usual type Kreiss-Oliger type numerical dissipation
!We support cell center only
! (D_+D_-)^4 =
! f(i-4) - 8 f(i-3) + 28 f(i-2) - 56 f(i-1) + 70 f(i) - 56 f(i+1) + 28 f(i+2) - 8 f(i+3) + f(i+4)
! ----------------------------------------------------------------------------------------------------------
! dx^8
!------------------------------------------------------------------------------------------------------------------------------
! do not add dissipation near boundary
subroutine kodis(ex,X,Y,Z,f,f_rhs,SoA,Symmetry,eps)
implicit none
! argument variables
integer,intent(in) :: Symmetry
integer,dimension(3),intent(in)::ex
real*8, dimension(1:3), intent(in) :: SoA
double precision,intent(in),dimension(ex(1))::X
double precision,intent(in),dimension(ex(2))::Y
double precision,intent(in),dimension(ex(3))::Z
double precision,intent(in),dimension(ex(1),ex(2),ex(3))::f
double precision,intent(inout),dimension(ex(1),ex(2),ex(3))::f_rhs
real*8,intent(in) :: eps
!~~~~~~ other variables
real*8 :: dX,dY,dZ
real*8,dimension(-3:ex(1),-3:ex(2),-3:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8,parameter :: cof = 2.56d2 ! 2^8
real*8, parameter :: F8=8.d0,F28=2.8d1,F56=5.6d1,F70=7.d1
integer::i,j,k
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -3
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -3
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -3
call symmetry_bd(4,ex,f,fh,SoA)
! f(i-4) - 8 f(i-3) + 28 f(i-2) - 56 f(i-1) + 70 f(i) - 56 f(i+1) + 28 f(i+2) - 8 f(i+3) + f(i+4)
! ----------------------------------------------------------------------------------------------------------
! dx^8
! note the sign (-1)^r-1, now r=4
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i>imin+3 .and. i < imax-3 .and. &
j>jmin+3 .and. j < jmax-3 .and. &
k>kmin+3 .and. k < kmax-3) then
! x direction
f_rhs(i,j,k) = f_rhs(i,j,k) - eps/dX/cof * ( &
(fh(i-4,j,k)+fh(i+4,j,k)) &
- F8 * (fh(i-3,j,k)+fh(i+3,j,k)) &
+F28 * (fh(i-2,j,k)+fh(i+2,j,k)) &
-F56 * (fh(i-1,j,k)+fh(i+1,j,k)) &
+F70 * fh(i,j,k) )
! y direction
f_rhs(i,j,k) = f_rhs(i,j,k) - eps/dY/cof * ( &
(fh(i,j-4,k)+fh(i,j+4,k)) &
- F8 * (fh(i,j-3,k)+fh(i,j+3,k)) &
+F28 * (fh(i,j-2,k)+fh(i,j+2,k)) &
-F56 * (fh(i,j-1,k)+fh(i,j+1,k)) &
+F70 * fh(i,j,k) )
! z direction
f_rhs(i,j,k) = f_rhs(i,j,k) - eps/dZ/cof * ( &
(fh(i,j,k-4)+fh(i,j,k+4)) &
- F8 * (fh(i,j,k-3)+fh(i,j,k+3)) &
+F28 * (fh(i,j,k-2)+fh(i,j,k+2)) &
-F56 * (fh(i,j,k-1)+fh(i,j,k+1)) &
+F70 * fh(i,j,k) )
endif
enddo
enddo
enddo
return
end subroutine kodis
#elif (ghost_width == 5)
! eighth order code
!------------------------------------------------------------------------------------------------------------------------------
!usual type Kreiss-Oliger type numerical dissipation
!We support cell center only
! Note the notation D_+ and D_- [P240 of B. Gustafsson, H.-O. Kreiss, and J. Oliger, Time
! Dependent Problems and Difference Methods (Wiley, New York, 1995).]
! D_+ = (f(i+1) - f(i))/h
! D_- = (f(i) - f(i-1))/h
! then we have D_+D_- = D_-D_+ = (f(i+1) - 2f(i) + f(i-1))/h^2
! for nth order accurate finite difference code, we need r =n/2+1
! D_+^rD_-^r = (D_+D_-)^r
! following the tradiation of PRD 77, 024027 (BB's calibration paper, Eq.(64),
! correct some typo according to above book) :
! + eps*(-1)^(r-1)*h^(2r-1)/2^(2r)*(D_+D_-)^r
!
!
! this is for 8th order accurate finite difference scheme
! (D_+D_-)^5 =
! f(i-5) - 10 f(i-4) + 45 f(i-3) - 120 f(i-2) + 210 f(i-1) - 252 f(i) + 210 f(i+1) - 120 f(i+2) + 45 f(i+3) - 10 f(i+4) + f(i+5)
! -------------------------------------------------------------------------------------------------------------------------------
! dx^10
!---------------------------------------------------------------------------------------------------------------------------------
! do not add dissipation near boundary
subroutine kodis(ex,X,Y,Z,f,f_rhs,SoA,Symmetry,eps)
implicit none
! argument variables
integer,intent(in) :: Symmetry
integer,dimension(3),intent(in)::ex
real*8, dimension(1:3), intent(in) :: SoA
double precision,intent(in),dimension(ex(1))::X
double precision,intent(in),dimension(ex(2))::Y
double precision,intent(in),dimension(ex(3))::Z
double precision,intent(in),dimension(ex(1),ex(2),ex(3))::f
double precision,intent(inout),dimension(ex(1),ex(2),ex(3))::f_rhs
real*8,intent(in) :: eps
!~~~~~~ other variables
real*8 :: dX,dY,dZ
real*8,dimension(-4:ex(1),-4:ex(2),-4:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8,parameter :: cof = 1.024d3 ! 2^2r = 2^10
real*8, parameter :: F10=1.d1,F45=4.5d1,F120=1.2d2,F210=2.1d2,F252=2.52d2
integer::i,j,k
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -4
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -4
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -4
call symmetry_bd(5,ex,f,fh,SoA)
! f(i-5) - 10 f(i-4) + 45 f(i-3) - 120 f(i-2) + 210 f(i-1) - 252 f(i) + 210 f(i+1) - 120 f(i+2) + 45 f(i+3) - 10 f(i+4) + f(i+5)
! -------------------------------------------------------------------------------------------------------------------------------
! dx^10
! note the sign (-1)^r-1, now r=5
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i>imin+4 .and. i < imax-4 .and. &
j>jmin+4 .and. j < jmax-4 .and. &
k>kmin+4 .and. k < kmax-4) then
! x direction
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dX/cof * ( &
(fh(i-5,j,k)+fh(i+5,j,k)) &
- F10 * (fh(i-4,j,k)+fh(i+4,j,k)) &
+ F45 * (fh(i-3,j,k)+fh(i+3,j,k)) &
- F120* (fh(i-2,j,k)+fh(i+2,j,k)) &
+ F210* (fh(i-1,j,k)+fh(i+1,j,k)) &
- F252 * fh(i,j,k) )
! y direction
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dY/cof * ( &
(fh(i,j-5,k)+fh(i,j+5,k)) &
- F10 * (fh(i,j-4,k)+fh(i,j+4,k)) &
+ F45 * (fh(i,j-3,k)+fh(i,j+3,k)) &
- F120* (fh(i,j-2,k)+fh(i,j+2,k)) &
+ F210* (fh(i,j-1,k)+fh(i,j+1,k)) &
- F252 * fh(i,j,k) )
! z direction
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dZ/cof * ( &
(fh(i,j,k-5)+fh(i,j,k+5)) &
- F10 * (fh(i,j,k-4)+fh(i,j,k+4)) &
+ F45 * (fh(i,j,k-3)+fh(i,j,k+3)) &
- F120* (fh(i,j,k-2)+fh(i,j,k+2)) &
+ F210* (fh(i,j,k-1)+fh(i,j,k+1)) &
- F252 * fh(i,j,k) )
endif
enddo
enddo
enddo
return
end subroutine kodis
#endif

View File

@@ -7,7 +7,163 @@
! Vertex or Cell is distinguished in routine symmetry_bd which locates in ! Vertex or Cell is distinguished in routine symmetry_bd which locates in
! file "fmisc.f90" ! file "fmisc.f90"
#if (ghost_width == 2)
! second order code
!-----------------------------------------------------------------------------
! v
! D f = ------[ - 3 f + 4 f - f ]
! i 2dx i i+v i+2v
!
! where
!
! i
! |B |
! v = -----
! i
! B
!
!-----------------------------------------------------------------------------
subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
!~~~~~~> local variables:
! note index -1,0, so we have 2 extra points
real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0,TWO=2.d0,THR=3.d0,FOUR=4.d0
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
call symmetry_bd(2,ex,f,fh,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
if(Sfx(i,j,k) >= ZEO)then
if( i+2 <= imax .and. i >= imin)then
! v
! D f = ------[ - 3 f + 4 f - f ]
! i 2dx i i+v i+2v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d2dx*(-THR*fh(i,j,k)+FOUR*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax .and. i >= imin)then
! v
! D f = ------[ - f + f ]
! i dx i i+v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d2dx*(-fh(i,j,k)+fh(i+1,j,k))
endif
elseif(Sfx(i,j,k) <= ZEO)then
if( i-2 >= imin .and. i <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d2dx*(-THR*fh(i,j,k)+FOUR*fh(i-1,j,k)-fh(i-2,j,k))
elseif(i-1 >= imin .and. i <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d2dx*(-fh(i,j,k)+fh(i-1,j,k))
endif
! set imax and imin 0
endif
! y direction
if(Sfy(i,j,k) >= ZEO)then
if( j+2 <= jmax .and. j >= jmin)then
! v
! D f = ------[ - 3 f + 4 f - f ]
! i 2dx i i+v i+2v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d2dy*(-THR*fh(i,j,k)+FOUR*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax .and. j >= jmin)then
! v
! D f = ------[ - f + f ]
! i dx i i+v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d2dy*(-fh(i,j,k)+fh(i,j+1,k))
endif
elseif(Sfy(i,j,k) <= ZEO)then
if( j-2 >= jmin .and. j <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d2dy*(-THR*fh(i,j,k)+FOUR*fh(i,j-1,k)-fh(i,j-2,k))
elseif(j-1 >= jmin .and. j <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d2dy*(-fh(i,j,k)+fh(i,j-1,k))
endif
! set jmin and jmax 0
endif
!! z direction
if(Sfz(i,j,k) >= ZEO)then
if( k+2 <= kmax .and. k >= kmin)then
! v
! D f = ------[ - 3 f + 4 f - f ]
! i 2dx i i+v i+2v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d2dz*(-THR*fh(i,j,k)+FOUR*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax .and. k >= kmin)then
! v
! D f = ------[ - f + f ]
! i dx i i+v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d2dz*(-fh(i,j,k)+fh(i,j,k+1))
endif
elseif(Sfz(i,j,k) <= ZEO)then
if( k-2 >= kmin .and. k <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d2dz*(-THR*fh(i,j,k)+FOUR*fh(i,j,k-1)-fh(i,j,k-2))
elseif(k-1 >= kmin .and. k <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d2dz*(-fh(i,j,k)+fh(i,j,k-1))
endif
! set kmin and kmax 0
endif
enddo
enddo
enddo
return
end subroutine lopsided
#elif (ghost_width == 3)
! fourth order code ! fourth order code
!----------------------------------------------------------------------------- !-----------------------------------------------------------------------------
@@ -80,7 +236,89 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
do k=1,ex(3)-1 do k=1,ex(3)-1
do j=1,ex(2)-1 do j=1,ex(2)-1
do i=1,ex(1)-1 do i=1,ex(1)-1
#if 0
!! old code
! x direction
if(Sfx(i,j,k) >= ZEO .and. i+3 <= imax .and. i-1 >= imin)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
elseif(Sfx(i,j,k) <= ZEO .and. i-3 >= imin .and. i+1 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
elseif(i+2 <= imax .and. i-2 >= imin)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax .and. i-1 >= imin)then
!
! - f(i-1) + f(i+1)
! fx(i) = --------------------------------
! 2 dx
f_rhs(i,j,k)=f_rhs(i,j,k) + Sfx(i,j,k)*d2dx*(-fh(i-1,j,k)+fh(i+1,j,k))
! set imax and imin 0
endif
! y direction
if(Sfy(i,j,k) >= ZEO .and. j+3 <= jmax .and. j-1 >= jmin)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
elseif(Sfy(i,j,k) <= ZEO .and. j-3 >= jmin .and. j+1 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
elseif(j+2 <= jmax .and. j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax .and. j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k) + Sfy(i,j,k)*d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
! set jmin and jmax 0
endif
!! z direction
if(Sfz(i,j,k) >= ZEO .and. k+3 <= kmax .and. k-1 >= kmin)then
! v
! D f = ------[ - 3f - 10f + 18f - 6f + f ]
! i 12dx i-v i i+v i+2v i+3v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
elseif(Sfz(i,j,k) <= ZEO .and. k-3 >= kmin .and. k+1 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
elseif(k+2 <= kmax .and. k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax .and. k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+Sfz(i,j,k)*d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
! set kmin and kmax 0
endif
#else
!! new code, 2012dec27, based on bam !! new code, 2012dec27, based on bam
! x direction ! x direction
if(Sfx(i,j,k) > ZEO)then if(Sfx(i,j,k) > ZEO)then
@@ -240,6 +478,7 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! set kmax and kmin 0 ! set kmax and kmin 0
endif endif
endif endif
#endif
enddo enddo
enddo enddo
enddo enddo
@@ -247,3 +486,612 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
return return
end subroutine lopsided end subroutine lopsided
!-----------------------------------------------------------------------------
! Combined advection (lopsided) + Kreiss-Oliger dissipation (kodis)
! Shares the symmetry_bd buffer fh, eliminating one full-grid copy per call.
! Mathematically identical to calling lopsided then kodis separately.
!-----------------------------------------------------------------------------
subroutine lopsided_kodis(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA,eps)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
real*8,intent(in) :: eps
!~~~~~~> local variables:
! note index -2,-1,0, so we have 3 extra points
real*8,dimension(-2:ex(1),-2:ex(2),-2:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F3=3.d0
real*8, parameter :: TWO=2.d0,F6=6.0d0,F18=1.8d1
real*8, parameter :: F12=1.2d1, F10=1.d1,EIT=8.d0
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
! kodis parameters
real*8, parameter :: SIX=6.d0,FIT=1.5d1,TWT=2.d1
real*8, parameter :: cof=6.4d1 ! 2^6
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -2
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -2
! Single symmetry_bd call shared by both advection and dissipation
call symmetry_bd(3,ex,f,fh,SoA)
! ---- Advection (lopsided) loop ----
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
if(Sfx(i,j,k) > ZEO)then
if(i+3 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
elseif(i+2 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
endif
elseif(Sfx(i,j,k) < ZEO)then
if(i-3 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
elseif(i-2 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i-1 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
endif
endif
! y direction
if(Sfy(i,j,k) > ZEO)then
if(j+3 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
elseif(j+2 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
endif
elseif(Sfy(i,j,k) < ZEO)then
if(j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
elseif(j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
endif
endif
! z direction
if(Sfz(i,j,k) > ZEO)then
if(k+3 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
elseif(k+2 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
endif
elseif(Sfz(i,j,k) < ZEO)then
if(k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
elseif(k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
endif
endif
enddo
enddo
enddo
! ---- Dissipation (kodis) loop ----
if(eps > ZEO) then
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i-3 >= imin .and. i+3 <= imax .and. &
j-3 >= jmin .and. j+3 <= jmax .and. &
k-3 >= kmin .and. k+3 <= kmax) then
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )/dX + &
( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )/dY + &
( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )/dZ )
endif
enddo
enddo
enddo
endif
return
end subroutine lopsided_kodis
#elif (ghost_width == 4)
! sixth order code
! Compute advection terms in right hand sides of field equations
! v
! D f = ------[ 2f - 24f - 35f + 80f - 30f + 8f - f ]
! i 60dx i-2v i-v i i+v i+2v i+3v i+4v
!
! where
!
! i
! |B |
! v = -----
! i
! B
!
!-----------------------------------------------------------------------------
subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
!~~~~~~> local variables:
real*8,dimension(-3:ex(1),-3:ex(2),-3:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d60dx,d60dy,d60dz,d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F60=6.d1
real*8, parameter :: TWO=2.d0,F24=2.4d1,F35=3.5d1,F80=8.d1,F30=3.d1,EIT=8.d0
real*8, parameter :: F9=9.d0,F45=4.5d1,F12=1.2d1
real*8, parameter :: F10=1.d1,F77=7.7d1,F150=1.5d2,F100=1.d2,F50=5.d1,F15=1.5d1
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d60dx = ONE/F60/dX
d60dy = ONE/F60/dY
d60dz = ONE/F60/dZ
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -3
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -3
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -3
call symmetry_bd(4,ex,f,fh,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
if(Sfx(i,j,k) >= ZEO .and. i+4 <= imax .and. i-2 >= imin)then
! v
! D f = ------[ 2f - 24f - 35f + 80f - 30f + 8f - f ]
! i 60dx i-2v i-v i i+v i+2v i+3v i+4v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d60dx*(TWO*fh(i-2,j,k)-F24*fh(i-1,j,k)-F35*fh(i,j,k)+F80*fh(i+1,j,k) &
-F30*fh(i+2,j,k)+EIT*fh(i+3,j,k)- fh(i+4,j,k))
elseif(Sfx(i,j,k) >= ZEO .and. i+5 <= imax .and. i-1 >= imin)then
! v
! D f = ------[-10f - 77f + 150f - 100f + 50f -15f + 2f ]
! i 60dx i-v i i+v i+2v i+3v i+4v i+5v
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d60dx*(-F10*fh(i-1,j,k)-F77*fh(i ,j,k)+F150*fh(i+1,j,k)-F100*fh(i+2,j,k) &
+F50*fh(i+3,j,k)-F15*fh(i+4,j,k)+ TWO*fh(i+5,j,k))
elseif(Sfx(i,j,k) <= ZEO .and. i-4 >= imin .and. i+2 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d60dx*(TWO*fh(i+2,j,k)-F24*fh(i+1,j,k)-F35*fh(i,j,k)+F80*fh(i-1,j,k) &
-F30*fh(i-2,j,k)+EIT*fh(i-3,j,k)- fh(i-4,j,k))
elseif(Sfx(i,j,k) <= ZEO .and. i-5 >= imin .and. i+1 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d60dx*(-F10*fh(i+1,j,k)-F77*fh(i ,j,k)+F150*fh(i-1,j,k)-F100*fh(i-2,j,k) &
+F50*fh(i-3,j,k)-F15*fh(i-4,j,k)+ TWO*fh(i-5,j,k))
elseif(i+3 <= imax .and. i-3 >= imin)then
! - f(i-3) + 9 f(i-2) - 45 f(i-1) + 45 f(i+1) - 9 f(i+2) + f(i+3)
! fx(i) = -----------------------------------------------------------------
! 60 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d60dx*(-fh(i-3,j,k)+F9*fh(i-2,j,k)-F45*fh(i-1,j,k)+F45*fh(i+1,j,k)-F9*fh(i+2,j,k)+fh(i+3,j,k))
elseif(i+2 <= imax .and. i-2 >= imin)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax .and. i-1 >= imin)then
!
! - f(i-1) + f(i+1)
! fx(i) = --------------------------------
! 2 dx
f_rhs(i,j,k)=f_rhs(i,j,k) + Sfx(i,j,k)*d2dx*(-fh(i-1,j,k)+fh(i+1,j,k))
! set imax and imin 0
endif
! y direction
if(Sfy(i,j,k) >= ZEO .and. j+4 <= jmax .and. j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d60dy*(TWO*fh(i,j-2,k)-F24*fh(i,j-1,k)-F35*fh(i,j,k)+F80*fh(i,j+1,k) &
-F30*fh(i,j+2,k)+EIT*fh(i,j+3,k)- fh(i,j+4,k))
elseif(Sfy(i,j,k) >= ZEO .and. j+5 <= jmax .and. j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d60dy*(-F10*fh(i,j-1,k)-F77*fh(i,j ,k)+F150*fh(i,j+1,k)-F100*fh(i,j+2,k) &
+F50*fh(i,j+3,k)-F15*fh(i,j+4,k)+ TWO*fh(i,j+5,k))
elseif(Sfy(i,j,k) <= ZEO .and. j-4 >= jmin .and. j+2 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d60dy*(TWO*fh(i,j+2,k)-F24*fh(i,j+1,k)-F35*fh(i,j,k)+F80*fh(i,j-1,k) &
-F30*fh(i,j-2,k)+EIT*fh(i,j-3,k)- fh(i,j-4,k))
elseif(Sfy(i,j,k) <= ZEO .and. j-5 >= jmin .and. j+1 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d60dy*(-F10*fh(i,j+1,k)-F77*fh(i,j ,k)+F150*fh(i,j-1,k)-F100*fh(i,j-2,k) &
+F50*fh(i,j-3,k)-F15*fh(i,j-4,k)+ TWO*fh(i,j-5,k))
elseif(j+3 <= jmax .and. j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d60dy*(-fh(i,j-3,k)+F9*fh(i,j-2,k)-F45*fh(i,j-1,k)+F45*fh(i,j+1,k)-F9*fh(i,j+2,k)+fh(i,j+3,k))
elseif(j+2 <= jmax .and. j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax .and. j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k) + Sfy(i,j,k)*d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
! set jmin and jmax 0
endif
!! z direction
if(Sfz(i,j,k) >= ZEO .and. k+4 <= kmax .and. k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d60dz*(TWO*fh(i,j,k-2)-F24*fh(i,j,k-1)-F35*fh(i,j,k)+F80*fh(i,j,k+1) &
-F30*fh(i,j,k+2)+EIT*fh(i,j,k+3)- fh(i,j,k+4))
elseif(Sfz(i,j,k) >= ZEO .and. k+5 <= kmax .and. k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d60dz*(-F10*fh(i,j,k-1)-F77*fh(i,j,k )+F150*fh(i,j,k+1)-F100*fh(i,j,k+2) &
+F50*fh(i,j,k+3)-F15*fh(i,j,k+4)+ TWO*fh(i,j,k+5))
elseif(Sfz(i,j,k) <= ZEO .and. k-4 >= kmin .and. k+2 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d60dz*(TWO*fh(i,j,k+2)-F24*fh(i,j,k+1)-F35*fh(i,j,k)+F80*fh(i,j,k-1) &
-F30*fh(i,j,k-2)+EIT*fh(i,j,k-3)- fh(i,j,k-4))
elseif(Sfz(i,j,k) <= ZEO .and. k-5 >= kmin .and. k+1 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d60dz*(-F10*fh(i,j,k+1)-F77*fh(i,j,k )+F150*fh(i,j,k-1)-F100*fh(i,j,k-2) &
+F50*fh(i,j,k-3)-F15*fh(i,j,k-4)+ TWO*fh(i,j,k-5))
elseif(k+3 <= kmax .and. k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d60dz*(-fh(i,j,k-3)+F9*fh(i,j,k-2)-F45*fh(i,j,k-1)+F45*fh(i,j,k+1)-F9*fh(i,j,k+2)+fh(i,j,k+3))
elseif(k+2 <= kmax .and. k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax .and. k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+Sfz(i,j,k)*d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
! set kmin and kmax 0
endif
enddo
enddo
enddo
return
end subroutine lopsided
#elif (ghost_width == 5)
! eighth order code
!-----------------------------------------------------------------------------
! PRD 77, 024034 (2008)
! Compute advection terms in right hand sides of field equations
! v [ - 5 f(i-3v) + 60 f(i-2v) - 420 f(i-v) - 378 f(i) + 1050 f(i+v) - 420 f(i+2v) + 140 f(i+3v) - 30 f(i+4v) + 3 f(i+5v)]
! D f = --------------------------------------------------------------------------------------------------------------------------
! i 840 dx
!
! where
!
! i
! |B |
! v = -----
! i
! B
!
!-----------------------------------------------------------------------------
subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
!~~~~~~> local variables:
real*8,dimension(-4:ex(1),-4:ex(2),-4:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d840dx,d840dy,d840dz,d60dx,d60dy,d60dz,d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F60=6.d1
real*8, parameter :: TWO=2.d0,F30=3.d1,EIT=8.d0
real*8, parameter :: F9=9.d0,F45=4.5d1,F12=1.2d1,F140=1.4d2,THR=3.d0
real*8, parameter :: F840=8.4d2,F5=5.d0,F420=4.2d2,F378=3.78d2,F1050=1.05d3
real*8, parameter :: F32=3.2d1,F168=1.68d2,F672=6.72d2
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d840dx = ONE/F840/dX
d840dy = ONE/F840/dY
d840dz = ONE/F840/dZ
d60dx = ONE/F60/dX
d60dy = ONE/F60/dY
d60dz = ONE/F60/dZ
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -4
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -4
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -4
call symmetry_bd(5,ex,f,fh,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
if(Sfx(i,j,k) >= ZEO .and. i+5 <= imax .and. i-3 >= imin)then
! v [ - 5 f(i-3v) + 60 f(i-2v) - 420 f(i-v) - 378 f(i) + 1050 f(i+v) - 420 f(i+2v) + 140 f(i+3v) - 30 f(i+4v) + 3 f(i+5v)]
! D f = --------------------------------------------------------------------------------------------------------------------------
! i 840 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d840dx*(-F5*fh(i-3,j,k)+F60 *fh(i-2,j,k)-F420*fh(i-1,j,k)-F378*fh(i ,j,k) &
+F1050*fh(i+1,j,k)-F420*fh(i+2,j,k)+F140*fh(i+3,j,k)-F30 *fh(i+4,j,k)+THR*fh(i+5,j,k))
elseif(Sfx(i,j,k) <= ZEO .and. i-5 >= imin .and. i+3 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d840dx*(-F5*fh(i+3,j,k)+F60 *fh(i+2,j,k)-F420*fh(i+1,j,k)-F378*fh(i ,j,k) &
+F1050*fh(i-1,j,k)-F420*fh(i-2,j,k)+F140*fh(i-3,j,k)- F30*fh(i-4,j,k)+THR*fh(i-5,j,k))
elseif(i+4 <= imax .and. i-4 >= imin)then
! 3 f(i-4) - 32 f(i-3) + 168 f(i-2) - 672 f(i-1) + 672 f(i+1) - 168 f(i+2) + 32 f(i+3) - 3 f(i+4)
! fx(i) = -------------------------------------------------------------------------------------------------
! 840 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d840dx*( THR*fh(i-4,j,k)-F32 *fh(i-3,j,k)+F168*fh(i-2,j,k)-F672*fh(i-1,j,k)+ &
F672*fh(i+1,j,k)-F168*fh(i+2,j,k)+F32 *fh(i+3,j,k)-THR *fh(i+4,j,k))
elseif(i+3 <= imax .and. i-3 >= imin)then
! - f(i-3) + 9 f(i-2) - 45 f(i-1) + 45 f(i+1) - 9 f(i+2) + f(i+3)
! fx(i) = -----------------------------------------------------------------
! 60 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d60dx*(-fh(i-3,j,k)+F9*fh(i-2,j,k)-F45*fh(i-1,j,k)+F45*fh(i+1,j,k)-F9*fh(i+2,j,k)+fh(i+3,j,k))
elseif(i+2 <= imax .and. i-2 >= imin)then
!
! f(i-2) - 8 f(i-1) + 8 f(i+1) - f(i+2)
! fx(i) = ---------------------------------------------
! 12 dx
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax .and. i-1 >= imin)then
!
! - f(i-1) + f(i+1)
! fx(i) = --------------------------------
! 2 dx
f_rhs(i,j,k)=f_rhs(i,j,k) + Sfx(i,j,k)*d2dx*(-fh(i-1,j,k)+fh(i+1,j,k))
! set imax and imin 0
endif
! y direction
if(Sfy(i,j,k) >= ZEO .and. j+5 <= jmax .and. j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d840dy*(-F5*fh(i,j-3,k)+F60 *fh(i,j-2,k)-F420*fh(i,j-1,k)-F378*fh(i,j ,k) &
+F1050*fh(i,j+1,k)-F420*fh(i,j+2,k)+F140*fh(i,j+3,k)-F30 *fh(i,j+4,k)+THR*fh(i,j+5,k))
elseif(Sfy(i,j,k) <= ZEO .and. j-5 >= jmin .and. j+3 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d840dy*(-F5*fh(i,j+3,k)+F60 *fh(i,j+2,k)-F420*fh(i,j+1,k)-F378*fh(i,j ,k) &
+F1050*fh(i,j-1,k)-F420*fh(i,j-2,k)+F140*fh(i,j-3,k)- F30*fh(i,j-4,k)+THR*fh(i,j-5,k))
elseif(j+4 <= jmax .and. j-4 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d840dy*( THR*fh(i,j-4,k)-F32 *fh(i,j-3,k)+F168*fh(i,j-2,k)-F672*fh(i,j-1,k)+ &
F672*fh(i,j+1,k)-F168*fh(i,j+2,k)+F32 *fh(i,j+3,k)-THR *fh(i,j+4,k))
elseif(j+3 <= jmax .and. j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d60dy*(-fh(i,j-3,k)+F9*fh(i,j-2,k)-F45*fh(i,j-1,k)+F45*fh(i,j+1,k)-F9*fh(i,j+2,k)+fh(i,j+3,k))
elseif(j+2 <= jmax .and. j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax .and. j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k) + Sfy(i,j,k)*d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
! set jmin and jmax 0
endif
!! z direction
if(Sfz(i,j,k) >= ZEO .and. k+5 <= kmax .and. k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d840dz*(-F5*fh(i,j,k-3)+F60 *fh(i,j,k-2)-F420*fh(i,j,k-1)-F378*fh(i,j,k ) &
+F1050*fh(i,j,k+1)-F420*fh(i,j,k+2)+F140*fh(i,j,k+3)-F30 *fh(i,j,k+4)+THR*fh(i,j,k+5))
elseif(Sfz(i,j,k) <= ZEO .and. k-5 >= kmin .and. k+3 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d840dz*(-F5*fh(i,j,k+3)+F60 *fh(i,j,k+2)-F420*fh(i,j,k+1)-F378*fh(i,j,k ) &
+F1050*fh(i,j,k-1)-F420*fh(i,j,k-2)+F140*fh(i,j,k-3)- F30*fh(i,j,k-4)+THR*fh(i,j,k-5))
elseif(k+4 <= kmax .and. k-4 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d840dz*( THR*fh(i,j,k-4)-F32 *fh(i,j,k-3)+F168*fh(i,j,k-2)-F672*fh(i,j,k-1)+ &
F672*fh(i,j,k+1)-F168*fh(i,j,k+2)+F32 *fh(i,j,k+3)-THR *fh(i,j,k+4))
elseif(k+3 <= kmax .and. k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d60dz*(-fh(i,j,k-3)+F9*fh(i,j,k-2)-F45*fh(i,j,k-1)+F45*fh(i,j,k+1)-F9*fh(i,j,k+2)+fh(i,j,k+3))
elseif(k+2 <= kmax .and. k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax .and. k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+Sfz(i,j,k)*d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
! set kmin and kmax 0
endif
enddo
enddo
enddo
return
end subroutine lopsided
#endif

View File

@@ -10,14 +10,15 @@ filein = -I/usr/include/ -I${MKLROOT}/include
## Added -lifcore for Intel Fortran runtime and -limf for Intel math library ## Added -lifcore for Intel Fortran runtime and -limf for Intel math library
LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore -limf -lpthread -lm -ldl LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore -limf -lpthread -lm -ldl
## Aggressive optimization flags: ## Aggressive optimization flags + PGO Phase 2 (profile-guided optimization)
## -O3: Maximum optimization ## -fprofile-instr-use: use collected profile data to guide optimization decisions
## -xHost: Optimize for the host CPU architecture (Intel/AMD compatible) ## (branch prediction, basic block layout, inlining, loop unrolling)
## -fp-model fast=2: Aggressive floating-point optimizations PROFDATA = ../../pgo_profile/default.profdata
## -fma: Enable fused multiply-add instructions
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \ CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=$(PROFDATA) \
-Dfortran3 -Dnewc -I${MKLROOT}/include -Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \ f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=$(PROFDATA) \
-align array64byte -fpp -I${MKLROOT}/include -align array64byte -fpp -I${MKLROOT}/include
f90 = ifx f90 = ifx
f77 = ifx f77 = ifx

File diff suppressed because it is too large Load Diff

View File

@@ -15,12 +15,13 @@ import time
## taskset ensures all child processes inherit the CPU affinity mask ## taskset ensures all child processes inherit the CPU affinity mask
## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111) ## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111)
## Format: taskset -c 4-55,60-111 ensures processes only run on these cores ## Format: taskset -c 4-55,60-111 ensures processes only run on these cores
NUMACTL_CPU_BIND = "taskset -c 0-111" #NUMACTL_CPU_BIND = "taskset -c 0-111"
NUMACTL_CPU_BIND = "taskset -c 16-47,64-95"
## Build parallelism configuration ## Build parallelism configuration
## Use nohz_full cores (4-55, 60-111) for compilation: 52 + 52 = 104 cores ## Use nohz_full cores (4-55, 60-111) for compilation: 52 + 52 = 104 cores
## Set make -j to utilize available cores for faster builds ## Set make -j to utilize available cores for faster builds
BUILD_JOBS = 104 BUILD_JOBS = 96
################################################################## ##################################################################
@@ -117,6 +118,7 @@ def run_ABE():
if (input_data.GPU_Calculation == "no"): if (input_data.GPU_Calculation == "no"):
mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABE" mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
#mpi_command = " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
mpi_command_outfile = "ABE_out.log" mpi_command_outfile = "ABE_out.log"
elif (input_data.GPU_Calculation == "yes"): elif (input_data.GPU_Calculation == "yes"):
mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABEGPU" mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABEGPU"
@@ -158,7 +160,8 @@ def run_TwoPunctureABE():
print( ) print( )
## Define the command to run ## Define the command to run
TwoPuncture_command = NUMACTL_CPU_BIND + " ./TwoPunctureABE" #TwoPuncture_command = NUMACTL_CPU_BIND + " ./TwoPunctureABE"
TwoPuncture_command = " ./TwoPunctureABE"
TwoPuncture_command_outfile = "TwoPunctureABE_out.log" TwoPuncture_command_outfile = "TwoPunctureABE_out.log"
## Execute the command with subprocess.Popen and stream output ## Execute the command with subprocess.Popen and stream output

29
parallel_plot_helper.py Normal file
View File

@@ -0,0 +1,29 @@
import multiprocessing
def run_plot_task(task):
"""Execute a single plotting task.
Parameters
----------
task : tuple
A tuple of (function, args_tuple) where function is a callable
plotting function and args_tuple contains its arguments.
"""
func, args = task
return func(*args)
def run_plot_tasks_parallel(plot_tasks):
"""Execute a list of independent plotting tasks in parallel.
Uses the 'fork' context to create worker processes so that the main
script is NOT re-imported/re-executed in child processes.
Parameters
----------
plot_tasks : list of tuples
Each element is (function, args_tuple).
"""
ctx = multiprocessing.get_context('fork')
with ctx.Pool() as pool:
pool.map(run_plot_task, plot_tasks)

View File

@@ -0,0 +1,97 @@
# AMSS-NCKU PGO Profile Analysis Report
## 1. Profiling Environment
| Item | Value |
|------|-------|
| Compiler | Intel oneAPI DPC++/C++ 2025.3.0 (icpx/ifx) |
| Instrumentation Flag | `-fprofile-instr-generate` |
| Optimization Level (instrumented) | `-O2 -xHost -fma` |
| MPI Processes | 1 (single process to avoid MPI+instrumentation deadlock) |
| Profile File | `default_9725750769337483397_0.profraw` (327 KB) |
| Merged Profile | `default.profdata` (394 KB) |
| llvm-profdata | `/home/intel/oneapi/compiler/2025.3/bin/compiler/llvm-profdata` |
## 2. Reduced Simulation Parameters (for profiling run)
| Parameter | Production Value | Profiling Value |
|-----------|-----------------|-----------------|
| MPI_processes | 64 | 1 |
| grid_level | 9 | 4 |
| static_grid_level | 5 | 3 |
| static_grid_number | 96 | 24 |
| moving_grid_number | 48 | 16 |
| largest_box_xyz_max | 320^3 | 160^3 |
| Final_Evolution_Time | 1000.0 | 10.0 |
| Evolution_Step_Number | 10,000,000 | 1,000 |
| Detector_Number | 12 | 2 |
## 3. Profile Summary
| Metric | Value |
|--------|-------|
| Total instrumented functions | 1,392 |
| Functions with non-zero counts | 117 (8.4%) |
| Functions with zero counts | 1,275 (91.6%) |
| Maximum function entry count | 386,459,248 |
| Maximum internal block count | 370,477,680 |
| Total block count | 4,198,023,118 |
## 4. Top 20 Hotspot Functions
| Rank | Total Count | Max Block Count | Function | Category |
|------|------------|-----------------|----------|----------|
| 1 | 1,241,601,732 | 370,477,680 | `polint_` | Interpolation |
| 2 | 755,994,435 | 230,156,640 | `prolong3_` | Grid prolongation |
| 3 | 667,964,095 | 3,697,792 | `compute_rhs_bssn_` | BSSN RHS evolution |
| 4 | 539,736,051 | 386,459,248 | `symmetry_bd_` | Symmetry boundary |
| 5 | 277,310,808 | 53,170,728 | `lopsided_` | Lopsided FD stencil |
| 6 | 155,534,488 | 94,535,040 | `decide3d_` | 3D grid decision |
| 7 | 119,267,712 | 19,266,048 | `rungekutta4_rout_` | RK4 time integrator |
| 8 | 91,574,616 | 48,824,160 | `kodis_` | Kreiss-Oliger dissipation |
| 9 | 67,555,389 | 43,243,680 | `fderivs_` | Finite differences |
| 10 | 55,296,000 | 42,246,144 | `misc::fact(int)` | Factorial utility |
| 11 | 43,191,071 | 27,663,328 | `fdderivs_` | 2nd-order FD derivatives |
| 12 | 36,233,965 | 22,429,440 | `restrict3_` | Grid restriction |
| 13 | 24,698,512 | 17,231,520 | `polin3_` | Polynomial interpolation |
| 14 | 22,962,942 | 20,968,768 | `copy_` | Data copy |
| 15 | 20,135,696 | 17,259,168 | `Ansorg::barycentric(...)` | Spectral interpolation |
| 16 | 14,650,224 | 7,224,768 | `Ansorg::barycentric_omega(...)` | Spectral weights |
| 17 | 13,242,296 | 2,871,920 | `global_interp_` | Global interpolation |
| 18 | 12,672,000 | 7,734,528 | `sommerfeld_rout_` | Sommerfeld boundary |
| 19 | 6,872,832 | 1,880,064 | `sommerfeld_routbam_` | Sommerfeld boundary (BAM) |
| 20 | 5,709,900 | 2,809,632 | `l2normhelper_` | L2 norm computation |
## 5. Hotspot Category Breakdown
Top 20 functions account for ~98% of total execution counts:
| Category | Functions | Combined Count | Share |
|----------|-----------|---------------|-------|
| Interpolation / Prolongation / Restriction | polint_, prolong3_, restrict3_, polin3_, global_interp_, Ansorg::* | ~2,093M | ~50% |
| BSSN RHS + FD stencils | compute_rhs_bssn_, lopsided_, fderivs_, fdderivs_ | ~1,056M | ~25% |
| Boundary conditions | symmetry_bd_, sommerfeld_rout_, sommerfeld_routbam_ | ~559M | ~13% |
| Time integration | rungekutta4_rout_ | ~119M | ~3% |
| Dissipation | kodis_ | ~92M | ~2% |
| Utilities | misc::fact, decide3d_, copy_, l2normhelper_ | ~256M | ~6% |
## 6. Conclusions
1. **Profile data is valid**: 1,392 functions instrumented, 117 exercised with ~4.2 billion total counts.
2. **Hotspot concentration is high**: Top 5 functions alone account for ~76% of all counts, which is ideal for PGO — the compiler has strong branch/layout optimization targets.
3. **Fortran numerical kernels dominate**: `polint_`, `prolong3_`, `compute_rhs_bssn_`, `symmetry_bd_`, `lopsided_` are all Fortran routines in the inner evolution loop. PGO will optimize their branch prediction and basic block layout.
4. **91.6% of functions have zero counts**: These are code paths for unused features (GPU, BSSN-EScalar, BSSN-EM, Z4C, etc.). PGO will deprioritize them, improving instruction cache utilization.
5. **Profile is representative**: Despite the reduced grid size, the code path coverage matches production — the same kernels (RHS, prolongation, restriction, boundary) are exercised. PGO branch probabilities from this profile will transfer well to full-scale runs.
## 7. PGO Phase 2 Usage
To apply the profile, use the following flags in `makefile.inc`:
```makefile
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=/home/amss/AMSS-NCKU/pgo_profile/default.profdata \
-Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=/home/amss/AMSS-NCKU/pgo_profile/default.profdata \
-align array64byte -fpp -I${MKLROOT}/include
```

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -11,6 +11,8 @@
import numpy ## numpy for array operations import numpy ## numpy for array operations
import scipy ## scipy for interpolation and signal processing import scipy ## scipy for interpolation and signal processing
import math import math
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt ## matplotlib for plotting import matplotlib.pyplot as plt ## matplotlib for plotting
import os ## os for system/file operations import os ## os for system/file operations

View File

@@ -8,16 +8,23 @@
## ##
################################################# #################################################
## Restrict OpenMP to one thread per process so that running
## many workers in parallel does not create an O(workers * BLAS_threads)
## thread explosion. The variable MUST be set before numpy/scipy
## are imported, because the BLAS library reads them only at load time.
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")
import numpy import numpy
import scipy import scipy
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm from matplotlib.colors import LogNorm
from mpl_toolkits.mplot3d import Axes3D from mpl_toolkits.mplot3d import Axes3D
## import torch ## import torch
import AMSS_NCKU_Input as input_data import AMSS_NCKU_Input as input_data
import os
######################################################################################### #########################################################################################
@@ -192,3 +199,19 @@ def get_data_xy( Rmin, Rmax, n, data0, time, figure_title, figure_outdir ):
#################################################################################### ####################################################################################
####################################################################################
## Allow this module to be run as a standalone script so that each
## binary-data plot can be executed in a fresh subprocess whose BLAS
## environment variables (set above) take effect before numpy loads.
##
## Usage: python3 plot_binary_data.py <filename> <binary_outdir> <figure_outdir>
####################################################################################
if __name__ == '__main__':
import sys
if len(sys.argv) != 4:
print(f"Usage: {sys.argv[0]} <filename> <binary_outdir> <figure_outdir>")
sys.exit(1)
plot_binary_data(sys.argv[1], sys.argv[2], sys.argv[3])

View File

@@ -8,6 +8,8 @@
################################################# #################################################
import numpy ## numpy for array operations import numpy ## numpy for array operations
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt ## matplotlib for plotting import matplotlib.pyplot as plt ## matplotlib for plotting
from mpl_toolkits.mplot3d import Axes3D ## needed for 3D plots from mpl_toolkits.mplot3d import Axes3D ## needed for 3D plots
import glob import glob
@@ -15,6 +17,9 @@ import os ## operating system utilities
import plot_binary_data import plot_binary_data
import AMSS_NCKU_Input as input_data import AMSS_NCKU_Input as input_data
import subprocess
import sys
import multiprocessing
# plt.rcParams['text.usetex'] = True ## enable LaTeX fonts in plots # plt.rcParams['text.usetex'] = True ## enable LaTeX fonts in plots
@@ -50,10 +55,40 @@ def generate_binary_data_plot( binary_outdir, figure_outdir ):
file_list.append(x) file_list.append(x)
print(x) print(x)
## Plot each file in the list ## Plot each file in parallel using subprocesses.
## Each subprocess is a fresh Python process where the BLAS thread-count
## environment variables (set at the top of plot_binary_data.py) take
## effect before numpy is imported. This avoids the thread explosion
## that occurs when multiprocessing.Pool with 'fork' context inherits
## already-initialized multi-threaded BLAS from the parent.
script = os.path.join( os.path.dirname(__file__), "plot_binary_data.py" )
max_workers = min( multiprocessing.cpu_count(), len(file_list) ) if file_list else 0
running = []
failed = []
for filename in file_list: for filename in file_list:
print(filename) print(filename)
plot_binary_data.plot_binary_data(filename, binary_outdir, figure_outdir) proc = subprocess.Popen(
[sys.executable, script, filename, binary_outdir, figure_outdir],
)
running.append( (proc, filename) )
## Keep at most max_workers subprocesses active at a time
if len(running) >= max_workers:
p, fn = running.pop(0)
p.wait()
if p.returncode != 0:
failed.append(fn)
## Wait for all remaining subprocesses to finish
for p, fn in running:
p.wait()
if p.returncode != 0:
failed.append(fn)
if failed:
print( " WARNING: the following binary data plots failed:" )
for fn in failed:
print( " ", fn )
print( ) print( )
print( " Binary Data Plot Has been Finished " ) print( " Binary Data Plot Has been Finished " )