Compare commits

..

9 Commits

Author SHA1 Message Date
jaunatisblue
f147f79ffa 修改block划分,对负载高的rank所在block进行划分,添加到空rank,空rank是平移得到的 2026-02-26 09:40:46 +08:00
jaunatisblue
8abac8dd88 对rank运行时间统计,两个函数分别在不同的计算中被调用,因此我对两个重载的函数分别进行了mpi实际计算时间的统计,对于第一个PatList_Interp_Points 调用 Interp_points,我取排名前三的rank时间,发现每次只有一个rank时间较长,Rank [ 52]: Calc 0.000012 s
Rank [  20]: Calc 0.000003 s

Rank [  35]: Calc 0.000003 s

Rank [  10]: Calc 0.000010 s

Rank [  17]: Calc 0.000005 s

Rank [  32]: Calc 0.000003 s,而且rank不固定,一般就是rank 10 和 rank 52;
但尽管有很多,比前者时间还是少很多
对于第二个Surf_Wave 调用 Interp_points,我发现前四个rank时间最长,比较固定,就是下面四个rank

Rank [  27]: Calc 0.331978 s

Rank [  35]: Calc 0.242219 s

Rank [  28]: Calc 0.242132 s

Rank [  36]: Calc 0.197024 s
因此下面surf_wave是核心
2026-02-24 14:34:24 +08:00
82339f5282 Merge lopsided advection + kodis dissipation to share symmetry_bd buffer
Cherry-picked from 38c2c30.
2026-02-20 13:36:27 +08:00
94f38c57f9 Don't hardcode pgo profile path 2026-02-20 13:36:27 +08:00
85d1e8de87 Add Intel SIMD vectorization directives to hot-spot functions
Apply Intel Advisor optimization recommendations:
- Add FORCEINLINE to polint for better inlining
- Add SIMD VECTORLENGTHFOR and UNROLL directives to fderivs,
  fdderivs, symmetry_bd, and kodis functions

This improves vectorization efficiency of finite difference
computations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 00:43:39 +08:00
5b7e05cd32 PGO updated 2026-02-11 18:26:30 +08:00
85afe00fc5 Merge plotting optimizations from chb-copilot-test
- Implement multiprocessing-based parallel plotting
- Add parallel_plot_helper.py for concurrent plot task execution
- Use matplotlib 'Agg' backend for multiprocessing safety
- Set OMP_NUM_THREADS=1 to prevent BLAS thread explosion
- Use subprocess for binary data plots to avoid thread conflicts
- Add fork bomb protection in main program

This merge only includes plotting improvements and excludes
MPI communication changes to preserve existing optimizations.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 16:19:17 +08:00
5c1790277b Replace nested OutBdLow2Hi loops with batch calls in RestrictProlong
The 8 nested while(Ppc){while(Pp){OutBdLow2Hi(single,single,...)}}
loops across RestrictProlong (3 overloads) and ProlongRestrict each
produced N_c × N_f separate transfer() → MPI_Waitall barriers.
Replace with the existing batch OutBdLow2Hi(MyList<Patch>*,...) which
merges all patch pairs into a single transfer() call with 1 MPI_Waitall.

Also add Restrict_cached, OutBdLow2Hi_cached, OutBdLow2Himix_cached
to Parallel (unused for now — kept as infrastructure for future use).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-11 16:09:08 +08:00
e09ae438a2 Cache data_packer lengths in Sync_start to skip redundant buffer-size traversals
The data_packer(NULL, ...) calls that compute send/recv buffer lengths
traverse all grid segments × variables × nprocs on every Sync_start
invocation, even though lengths never change once the cache is built.
Add a lengths_valid flag to SyncCache so these length computations are
done once and reused on subsequent calls (4× per RK4 step).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 21:39:22 +08:00
57 changed files with 15218 additions and 14347 deletions

View File

@@ -16,7 +16,7 @@ import numpy
File_directory = "GW150914" ## output file directory
Output_directory = "binary_output" ## binary data file directory
## The file directory name should not be too long
MPI_processes = 1 ## number of processes (MPI removed, single-process mode)
MPI_processes = 64 ## number of mpi processes used in the simulation
GPU_Calculation = "no" ## Use GPU or not
## (prefer "no" in the current version, because the GPU part may have bugs when integrated in this Python interface)

View File

@@ -8,6 +8,14 @@
##
##################################################################
## Guard against re-execution by multiprocessing child processes.
## Without this, using 'spawn' or 'forkserver' context would cause every
## worker to re-run the entire script, spawning exponentially more
## workers (fork bomb).
if __name__ != '__main__':
import sys as _sys
_sys.exit(0)
##################################################################
@@ -58,7 +66,8 @@ if os.path.exists(File_directory):
## Prompt whether to overwrite the existing directory
while True:
try:
inputvalue = input()
## inputvalue = input()
inputvalue = "continue"
## If the user agrees to overwrite, proceed and remove the existing directory
if ( inputvalue == "continue" ):
print( " Continue the calculation !!! " )
@@ -424,26 +433,31 @@ print(
import plot_xiaoqu
import plot_GW_strain_amplitude_xiaoqu
from parallel_plot_helper import run_plot_tasks_parallel
plot_tasks = []
## Plot black hole trajectory
plot_xiaoqu.generate_puncture_orbit_plot( binary_results_directory, figure_directory )
plot_xiaoqu.generate_puncture_orbit_plot3D( binary_results_directory, figure_directory )
plot_tasks.append( ( plot_xiaoqu.generate_puncture_orbit_plot, (binary_results_directory, figure_directory) ) )
plot_tasks.append( ( plot_xiaoqu.generate_puncture_orbit_plot3D, (binary_results_directory, figure_directory) ) )
## Plot black hole separation vs. time
plot_xiaoqu.generate_puncture_distence_plot( binary_results_directory, figure_directory )
plot_tasks.append( ( plot_xiaoqu.generate_puncture_distence_plot, (binary_results_directory, figure_directory) ) )
## Plot gravitational waveforms (psi4 and strain amplitude)
for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_gravitational_wave_psi4_plot( binary_results_directory, figure_directory, i )
plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot( binary_results_directory, figure_directory, i )
plot_tasks.append( ( plot_xiaoqu.generate_gravitational_wave_psi4_plot, (binary_results_directory, figure_directory, i) ) )
plot_tasks.append( ( plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot, (binary_results_directory, figure_directory, i) ) )
## Plot ADM mass evolution
for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_ADMmass_plot( binary_results_directory, figure_directory, i )
plot_tasks.append( ( plot_xiaoqu.generate_ADMmass_plot, (binary_results_directory, figure_directory, i) ) )
## Plot Hamiltonian constraint violation over time
for i in range(input_data.grid_level):
plot_xiaoqu.generate_constraint_check_plot( binary_results_directory, figure_directory, i )
plot_tasks.append( ( plot_xiaoqu.generate_constraint_check_plot, (binary_results_directory, figure_directory, i) ) )
run_plot_tasks_parallel(plot_tasks)
## Plot stored binary data
plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory )

View File

@@ -20,11 +20,7 @@ using namespace std;
#include <map.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "misc.h"
#include "macrodef.h"

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#define PI M_PI

View File

@@ -2,11 +2,7 @@
#ifndef BLOCK_H
#define BLOCK_H
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "macrodef.h" //need dim here; Vertex or Cell
#include "var.h"
#include "MyList.h"

View File

@@ -4,11 +4,7 @@
#include <stdarg.h>
#include <string.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "myglobal.h"

File diff suppressed because it is too large Load Diff

View File

@@ -2,11 +2,7 @@
#ifndef PATCH_H
#define PATCH_H
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "MyList.h"
#include "Block.h"
#include "var.h"

View File

@@ -8,11 +8,7 @@
#include <limits.h>
#include <float.h>
#include <math.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "util_Table.h"
#include "cctk.h"

View File

@@ -23,11 +23,8 @@ using namespace std;
#include <complex.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include <memory.h>
#include "MyList.h"
#include "Block.h"
#include "Parallel.h"

View File

@@ -23,11 +23,7 @@ using namespace std;
#include <complex.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "MyList.h"
#include "Block.h"
#include "Parallel.h"

File diff suppressed because it is too large Load Diff

View File

@@ -1,203 +1,235 @@
#ifndef PARALLEL_H
#define PARALLEL_H
#include <iostream>
#include <iomanip>
#include <fstream>
#include <cstdlib>
#include <cstdio>
#include <string>
#include <cmath>
#include <new>
using namespace std;
#include "Parallel_bam.h"
#include "var.h"
#include "MPatch.h"
#include "Block.h"
#include "MyList.h"
#include "macrodef.h" //need dim; ghost_width; CONTRACT
namespace Parallel
{
struct gridseg
{
double llb[dim];
double uub[dim];
int shape[dim];
double illb[dim], iuub[dim]; // only use for OutBdLow2Hi
Block *Bg;
};
int partition1(int &nx, int split_size, int min_width, int cpusize, int shape); // special for 1 diemnsion
int partition2(int *nxy, int split_size, int *min_width, int cpusize, int *shape); // special for 2 diemnsions
int partition3(int *nxyz, int split_size, int *min_width, int cpusize, int *shape);
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfs, bool periodic, int nodes = 0); // produce corresponding Blocks
void KillBlocks(MyList<Patch> *PatchLIST);
void setfunction(MyList<Block> *BlL, var *vn, double func(double x, double y, double z));
void setfunction(int rank, MyList<Block> *BlL, var *vn, double func(double x, double y, double z));
void writefile(double time, int nx, int ny, int nz, double xmin, double xmax, double ymin, double ymax,
double zmin, double zmax, char *filename, double *data_out);
void writefile(double time, int nx, int ny, double xmin, double xmax, double ymin, double ymax,
char *filename, double *datain);
void getarrayindex(int DIM, int *shape, int *index, int n);
int getarraylocation(int DIM, int *shape, int *index);
void copy(int DIM, double *llbout, double *uubout, int *Dshape, double *DD, double *llbin, double *uubin,
int *shape, double *datain, double *llb, double *uub);
void Dump_CPU_Data(MyList<Block> *BlL, MyList<var> *DumpList, char *tag, double time, double dT);
void Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT);
void Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd);
double *Collect_Data(Patch *PP, var *VP);
void d2Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT);
void d2Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd);
void Dump_Data0(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT);
double global_interp(int DIM, int *ext, double **CoX, double *datain,
double *poX, int ordn, double *SoA, int Symmetry);
double global_interp(int DIM, int *ext, double **CoX, double *datain,
double *poX, int ordn);
double Lagrangian_Int(double x, int npts, double *xpts, double *funcvals);
double LagrangePoly(double x, int pt, int npts, double *xpts);
MyList<gridseg> *build_complete_gsl(Patch *Pat);
MyList<gridseg> *build_complete_gsl(MyList<Patch> *PatL);
MyList<gridseg> *build_complete_gsl_virtual(MyList<Patch> *PatL);
MyList<gridseg> *build_complete_gsl_virtual2(MyList<Patch> *PatL); // - buffer
MyList<gridseg> *build_owned_gsl0(Patch *Pat, int rank_in); // - ghost without extension, special for Sync usage
MyList<gridseg> *build_owned_gsl1(Patch *Pat, int rank_in); // - ghost, similar to build_owned_gsl0 but extend one point on left side for vertex grid
MyList<gridseg> *build_owned_gsl2(Patch *Pat, int rank_in); // - buffer - ghost
MyList<gridseg> *build_owned_gsl3(Patch *Pat, int rank_in, int Symmetry); // - ghost - BD ghost
MyList<gridseg> *build_owned_gsl4(Patch *Pat, int rank_in, int Symmetry); // - buffer - ghost - BD ghost
MyList<gridseg> *build_owned_gsl5(Patch *Pat, int rank_in); // similar to build_owned_gsl2 but no extension
MyList<gridseg> *build_owned_gsl(MyList<Patch> *PatL, int rank_in, int type, int Symmetry);
void build_gstl(MyList<gridseg> *srci, MyList<gridseg> *dsti, MyList<gridseg> **out_src, MyList<gridseg> **out_dst);
int data_packer(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir,
MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry);
void transfer(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */,
int Symmetry);
int data_packermix(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir,
MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry);
void transfermix(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */,
int Symmetry);
void Sync(Patch *Pat, MyList<var> *VarList, int Symmetry);
void Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
void Sync_merged(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
struct SyncCache {
bool valid;
int cpusize;
MyList<gridseg> **combined_src;
MyList<gridseg> **combined_dst;
int *send_lengths;
int *recv_lengths;
double **send_bufs;
double **recv_bufs;
int *send_buf_caps;
int *recv_buf_caps;
MPI_Request *reqs;
MPI_Status *stats;
int max_reqs;
SyncCache();
void invalidate();
void destroy();
};
void Sync_cached(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry, SyncCache &cache);
void transfer_cached(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1, MyList<var> *VarList2,
int Symmetry, SyncCache &cache);
struct AsyncSyncState {
int req_no;
bool active;
AsyncSyncState() : req_no(0), active(false) {}
};
void Sync_start(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry,
SyncCache &cache, AsyncSyncState &state);
void Sync_finish(SyncCache &cache, AsyncSyncState &state,
MyList<var> *VarList, int Symmetry);
void OutBdLow2Hi(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void OutBdLow2Hi(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void OutBdLow2Himix(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void OutBdLow2Himix(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Prolong(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Prolongint(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Restrict(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Restrict_after(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry); // for -ghost - BDghost
MyList<Parallel::gridseg> *build_PhysBD_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_ghost_gsl(MyList<Patch> *PatL);
MyList<Parallel::gridseg> *build_ghost_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_buffer_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_buffer_gsl(MyList<Patch> *PatL);
MyList<Parallel::gridseg> *gsl_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gsl_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *clone_gsl(MyList<Parallel::gridseg> *p, bool first_only);
MyList<Parallel::gridseg> *build_bulk_gsl(Patch *Pat); // similar to build_owned_gsl0 but does not care rank issue
MyList<Parallel::gridseg> *build_bulk_gsl(Block *bp, Patch *Pat);
void build_PhysBD_gstl(Patch *Pat, MyList<Parallel::gridseg> *srci, MyList<Parallel::gridseg> *dsti,
MyList<Parallel::gridseg> **out_src, MyList<Parallel::gridseg> **out_dst);
void PeriodicBD(Patch *Pat, MyList<var> *VarList, int Symmetry);
double L2Norm(Patch *Pat, var *vf);
void checkgsl(MyList<Parallel::gridseg> *pp, bool first_only);
void checkvarl(MyList<var> *pp, bool first_only);
MyList<Parallel::gridseg> *divide_gsl(MyList<Parallel::gridseg> *p, Patch *Pat);
MyList<Parallel::gridseg> *divide_gs(MyList<Parallel::gridseg> *p, Patch *Pat);
void prepare_inter_time_level(Patch *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(Patch *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(MyList<Patch> *PatL,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(MyList<Patch> *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex);
void merge_gsl(MyList<gridseg> *&A, const double ratio);
bool merge_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C, const double ratio);
// Add ghost region to tangent plane
// we assume the grids have the same resolution
void add_ghost_touch(MyList<gridseg> *&A);
void cut_gsl(MyList<gridseg> *&A);
bool cut_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C);
MyList<Parallel::gridseg> *gs_subtract_virtual(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
void fill_level_data(MyList<Patch> *PatLd, MyList<Patch> *PatLs, MyList<Patch> *PatcL,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *FutureList,
MyList<var> *tmList, int Symmetry, bool BB, bool CC);
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry);
void aligncheck(double *bbox0, double *bboxl, int lev, double *DH0, int *shape);
bool point_locat_gsl(double *pox, MyList<Parallel::gridseg> *gsl);
void checkpatchlist(MyList<Patch> *PatL, bool buflog);
double L2Norm(Patch *Pat, var *vf, MPI_Comm Comm_here);
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry, MPI_Comm Comm_here);
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfsi,
bool periodic, int start_rank, int end_rank, int nodes = 0);
#endif
}
#endif /*PARALLEL_H */
#ifndef PARALLEL_H
#define PARALLEL_H
#include <iostream>
#include <iomanip>
#include <fstream>
#include <cstdlib>
#include <cstdio>
#include <string>
#include <cmath>
#include <new>
using namespace std;
#include <memory.h>
#include "Parallel_bam.h"
#include "var.h"
#include "MPatch.h"
#include "Block.h"
#include "MyList.h"
#include "macrodef.h" //need dim; ghost_width; CONTRACT
namespace Parallel
{
struct gridseg
{
double llb[dim];
double uub[dim];
int shape[dim];
double illb[dim], iuub[dim]; // only use for OutBdLow2Hi
Block *Bg;
};
int partition1(int &nx, int split_size, int min_width, int cpusize, int shape); // special for 1 diemnsion
int partition2(int *nxy, int split_size, int *min_width, int cpusize, int *shape); // special for 2 diemnsions
int partition3(int *nxyz, int split_size, int *min_width, int cpusize, int *shape);
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfs, bool periodic, int nodes = 0); // produce corresponding Blocks
MyList<Block> *distribute_hard(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfs, bool periodic, int nodes = 0); // produce corresponding Blocks
Block* splitHotspotBlock(MyList<Block>* &BlL, int _dim,
int ib0_orig, int ib3_orig,
int jb1_orig, int jb4_orig,
int kb2_orig, int kb5_orig,
Patch* PP, int r_left, int r_right,
int ingfsi, int fngfsi, bool periodic,
Block* &split_first_block, Block* &split_last_block);
Block* createMappedBlock(MyList<Block>* &BlL, int _dim, int* shape, double* bbox,
int block_id, int ingfsi, int fngfsi, int lev);
void KillBlocks(MyList<Patch> *PatchLIST);
void setfunction(MyList<Block> *BlL, var *vn, double func(double x, double y, double z));
void setfunction(int rank, MyList<Block> *BlL, var *vn, double func(double x, double y, double z));
void writefile(double time, int nx, int ny, int nz, double xmin, double xmax, double ymin, double ymax,
double zmin, double zmax, char *filename, double *data_out);
void writefile(double time, int nx, int ny, double xmin, double xmax, double ymin, double ymax,
char *filename, double *datain);
void getarrayindex(int DIM, int *shape, int *index, int n);
int getarraylocation(int DIM, int *shape, int *index);
void copy(int DIM, double *llbout, double *uubout, int *Dshape, double *DD, double *llbin, double *uubin,
int *shape, double *datain, double *llb, double *uub);
void Dump_CPU_Data(MyList<Block> *BlL, MyList<var> *DumpList, char *tag, double time, double dT);
void Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT);
void Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd);
double *Collect_Data(Patch *PP, var *VP);
void d2Dump_Data(MyList<Patch> *PL, MyList<var> *DumpList, char *tag, double time, double dT);
void d2Dump_Data(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT, int grd);
void Dump_Data0(Patch *PP, MyList<var> *DumpList, char *tag, double time, double dT);
double global_interp(int DIM, int *ext, double **CoX, double *datain,
double *poX, int ordn, double *SoA, int Symmetry);
double global_interp(int DIM, int *ext, double **CoX, double *datain,
double *poX, int ordn);
double Lagrangian_Int(double x, int npts, double *xpts, double *funcvals);
double LagrangePoly(double x, int pt, int npts, double *xpts);
MyList<gridseg> *build_complete_gsl(Patch *Pat);
MyList<gridseg> *build_complete_gsl(MyList<Patch> *PatL);
MyList<gridseg> *build_complete_gsl_virtual(MyList<Patch> *PatL);
MyList<gridseg> *build_complete_gsl_virtual2(MyList<Patch> *PatL); // - buffer
MyList<gridseg> *build_owned_gsl0(Patch *Pat, int rank_in); // - ghost without extension, special for Sync usage
MyList<gridseg> *build_owned_gsl1(Patch *Pat, int rank_in); // - ghost, similar to build_owned_gsl0 but extend one point on left side for vertex grid
MyList<gridseg> *build_owned_gsl2(Patch *Pat, int rank_in); // - buffer - ghost
MyList<gridseg> *build_owned_gsl3(Patch *Pat, int rank_in, int Symmetry); // - ghost - BD ghost
MyList<gridseg> *build_owned_gsl4(Patch *Pat, int rank_in, int Symmetry); // - buffer - ghost - BD ghost
MyList<gridseg> *build_owned_gsl5(Patch *Pat, int rank_in); // similar to build_owned_gsl2 but no extension
MyList<gridseg> *build_owned_gsl(MyList<Patch> *PatL, int rank_in, int type, int Symmetry);
void build_gstl(MyList<gridseg> *srci, MyList<gridseg> *dsti, MyList<gridseg> **out_src, MyList<gridseg> **out_dst);
int data_packer(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir,
MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry);
void transfer(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */,
int Symmetry);
int data_packermix(double *data, MyList<gridseg> *src, MyList<gridseg> *dst, int rank_in, int dir,
MyList<var> *VarLists, MyList<var> *VarListd, int Symmetry);
void transfermix(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /*target */,
int Symmetry);
void Sync(Patch *Pat, MyList<var> *VarList, int Symmetry);
void Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
void Sync_merged(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
struct SyncCache {
bool valid;
int cpusize;
MyList<gridseg> **combined_src;
MyList<gridseg> **combined_dst;
int *send_lengths;
int *recv_lengths;
double **send_bufs;
double **recv_bufs;
int *send_buf_caps;
int *recv_buf_caps;
MPI_Request *reqs;
MPI_Status *stats;
int max_reqs;
bool lengths_valid;
SyncCache();
void invalidate();
void destroy();
};
void Sync_cached(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry, SyncCache &cache);
void transfer_cached(MyList<gridseg> **src, MyList<gridseg> **dst,
MyList<var> *VarList1, MyList<var> *VarList2,
int Symmetry, SyncCache &cache);
struct AsyncSyncState {
int req_no;
bool active;
AsyncSyncState() : req_no(0), active(false) {}
};
void Sync_start(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry,
SyncCache &cache, AsyncSyncState &state);
void Sync_finish(SyncCache &cache, AsyncSyncState &state,
MyList<var> *VarList, int Symmetry);
void OutBdLow2Hi(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void OutBdLow2Hi(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void OutBdLow2Himix(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void OutBdLow2Himix(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Restrict_cached(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1, MyList<var> *VarList2,
int Symmetry, SyncCache &cache);
void OutBdLow2Hi_cached(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1, MyList<var> *VarList2,
int Symmetry, SyncCache &cache);
void OutBdLow2Himix_cached(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1, MyList<var> *VarList2,
int Symmetry, SyncCache &cache);
void Prolong(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Prolongint(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Restrict(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry);
void Restrict_after(MyList<Patch> *PatcL, MyList<Patch> *PatfL,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry); // for -ghost - BDghost
MyList<Parallel::gridseg> *build_PhysBD_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_ghost_gsl(MyList<Patch> *PatL);
MyList<Parallel::gridseg> *build_ghost_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_buffer_gsl(Patch *Pat);
MyList<Parallel::gridseg> *build_buffer_gsl(MyList<Patch> *PatL);
MyList<Parallel::gridseg> *gsl_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_subtract(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gsl_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *gs_and(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
MyList<Parallel::gridseg> *clone_gsl(MyList<Parallel::gridseg> *p, bool first_only);
MyList<Parallel::gridseg> *build_bulk_gsl(Patch *Pat); // similar to build_owned_gsl0 but does not care rank issue
MyList<Parallel::gridseg> *build_bulk_gsl(Block *bp, Patch *Pat);
void build_PhysBD_gstl(Patch *Pat, MyList<Parallel::gridseg> *srci, MyList<Parallel::gridseg> *dsti,
MyList<Parallel::gridseg> **out_src, MyList<Parallel::gridseg> **out_dst);
void PeriodicBD(Patch *Pat, MyList<var> *VarList, int Symmetry);
double L2Norm(Patch *Pat, var *vf);
void checkgsl(MyList<Parallel::gridseg> *pp, bool first_only);
void checkvarl(MyList<var> *pp, bool first_only);
MyList<Parallel::gridseg> *divide_gsl(MyList<Parallel::gridseg> *p, Patch *Pat);
MyList<Parallel::gridseg> *divide_gs(MyList<Parallel::gridseg> *p, Patch *Pat);
void prepare_inter_time_level(Patch *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(Patch *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(MyList<Patch> *PatL,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* target (t+a*dt) */, int tindex);
void prepare_inter_time_level(MyList<Patch> *Pat,
MyList<var> *VarList1 /* source (t+dt) */, MyList<var> *VarList2 /* source (t) */,
MyList<var> *VarList3 /* source (t-dt) */, MyList<var> *VarList4 /* target (t+a*dt) */, int tindex);
void merge_gsl(MyList<gridseg> *&A, const double ratio);
bool merge_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C, const double ratio);
// Add ghost region to tangent plane
// we assume the grids have the same resolution
void add_ghost_touch(MyList<gridseg> *&A);
void cut_gsl(MyList<gridseg> *&A);
bool cut_gs(MyList<gridseg> *D, MyList<gridseg> *B, MyList<gridseg> *&C);
MyList<Parallel::gridseg> *gs_subtract_virtual(MyList<Parallel::gridseg> *A, MyList<Parallel::gridseg> *B);
void fill_level_data(MyList<Patch> *PatLd, MyList<Patch> *PatLs, MyList<Patch> *PatcL,
MyList<var> *OldList, MyList<var> *StateList, MyList<var> *FutureList,
MyList<var> *tmList, int Symmetry, bool BB, bool CC);
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry);
void aligncheck(double *bbox0, double *bboxl, int lev, double *DH0, int *shape);
bool point_locat_gsl(double *pox, MyList<Parallel::gridseg> *gsl);
void checkpatchlist(MyList<Patch> *PatL, bool buflog);
double L2Norm(Patch *Pat, var *vf, MPI_Comm Comm_here);
bool PatList_Interp_Points(MyList<Patch> *PatL, MyList<var> *VarList,
int NN, double **XX,
double *Shellf, int Symmetry, MPI_Comm Comm_here);
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
MyList<Block> *distribute(MyList<Patch> *PatchLIST, int cpusize, int ingfsi, int fngfsi,
bool periodic, int start_rank, int end_rank, int nodes = 0);
// Redistribute blocks with time statistics for load balancing
MyList<Block> *distribute(MyList<Patch> *PatchLIST, MyList<Block> *OldBlockL,
int cpusize, int ingfsi, int fngfsi,
bool periodic, int start_rank, int end_rank, int nodes = 0);
#endif
// Dynamic load balancing: split blocks for heavy ranks
void split_heavy_blocks(MyList<Patch> *PatL, int *heavy_ranks, int num_heavy,
int split_factor, int cpusize, int ingfsi, int fngfsi);
// Check if load balancing is needed based on interpolation times
bool check_load_balance_need(double *rank_times, int nprocs, int &num_heavy, int *heavy_ranks);
}
#endif /*PARALLEL_H */

View File

@@ -2,11 +2,7 @@
#ifndef SHELLPATCH_H
#define SHELLPATCH_H
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "MyList.h"
#include "Block.h"
#include "Parallel.h"

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "cgh.h"
#include "ShellPatch.h"

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "cgh.h"
#include "ShellPatch.h"

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "cgh.h"
#include "ShellPatch.h"

View File

@@ -5819,21 +5819,11 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, GH->bdsul[lev], Symmetry);
@@ -5880,21 +5870,11 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SL, SL, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SL, SL, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->bdsul[lev], Symmetry);
@@ -5969,21 +5949,11 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
Parallel::Sync_cached(GH->PatL[lev - 1], SynchList_pre, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SL, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, GH->bdsul[lev], Symmetry);
@@ -6001,21 +5971,11 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
Parallel::Sync_cached(GH->PatL[lev - 1], SL, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SL, SL, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SL, SL, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->bdsul[lev], Symmetry);
@@ -6076,21 +6036,11 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
Parallel::Sync_cached(GH->PatL[lev - 1], SynchList_pre, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, GH->bdsul[lev], Symmetry);
@@ -6110,21 +6060,11 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
Parallel::Sync_cached(GH->PatL[lev - 1], StateList, Symmetry, sync_cache_rp_coarse[lev]);
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],StateList,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, GH->bdsul[lev], Symmetry);
@@ -6161,21 +6101,11 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
}
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, SynchList_pre, SynchList_cor, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_pre,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, GH->bdsul[lev], Symmetry);
@@ -6184,21 +6114,11 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
else // no time refinement levels and for all same time levels
{
#if (RPB == 0)
Ppc = GH->PatL[lev - 1];
while (Ppc)
{
Pp = GH->PatL[lev];
while (Pp)
{
#if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry);
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(Ppc->data, Pp->data, StateList, SynchList_cor, Symmetry);
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif
Pp = Pp->next;
}
Ppc = Ppc->next;
}
#elif (RPB == 1)
// Parallel::OutBdLow2Hi_bam(GH->PatL[lev-1],GH->PatL[lev],StateList,SynchList_cor,Symmetry);
Parallel::OutBdLow2Hi_bam(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, GH->bdsul[lev], Symmetry);

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "macrodef.h"
#include "cgh.h"

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "macrodef.h"
#include "cgh.h"

View File

@@ -945,103 +945,60 @@
SSA(2)=SYM
SSA(3)=ANTI
!!!!!!!!!advection term part
!!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency)
! lopsided_kodis shares the symmetry_bd buffer between advection and
! dissipation, eliminating redundant full-grid copies. For metric variables
! gxx/gyy/gzz (=dxx/dyy/dzz+1): kodis stencil coefficients sum to zero,
! so the constant offset has no effect on dissipation.
call lopsided(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS)
call lopsided(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA)
call lopsided(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA)
call lopsided(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided_kodis(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided_kodis(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided_kodis(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS)
call lopsided(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA)
call lopsided(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA)
call lopsided(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided_kodis(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided_kodis(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided_kodis(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS)
call lopsided_kodis(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA)
!!
call lopsided_kodis(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided_kodis(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided_kodis(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
#if 1
!! bam does not apply dissipation on gauge variables
call lopsided_kodis(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS,eps)
#if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided_kodis(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided_kodis(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided_kodis(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
#endif
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided_kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided_kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided_kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
#endif
#else
! No dissipation on gauge variables (advection only)
call lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS)
#if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA)
#endif
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA)
#endif
if(eps>0)then
! usual Kreiss-Oliger dissipation
call kodis(ex,X,Y,Z,chi,chi_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,trK,trK_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,dxx,gxx_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,gxy,gxy_rhs,AAS,Symmetry,eps)
call kodis(ex,X,Y,Z,gxz,gxz_rhs,ASA,Symmetry,eps)
call kodis(ex,X,Y,Z,dyy,gyy_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,gyz,gyz_rhs,SAA,Symmetry,eps)
call kodis(ex,X,Y,Z,dzz,gzz_rhs,SSS,Symmetry,eps)
#if 0
#define i 42
#define j 40
#define k 40
if(Lev == 1)then
write(*,*) X(i),Y(j),Z(k)
write(*,*) "before",Axx_rhs(i,j,k)
endif
#undef i
#undef j
#undef k
!!stop
#endif
call kodis(ex,X,Y,Z,Axx,Axx_rhs,SSS,Symmetry,eps)
#if 0
#define i 42
#define j 40
#define k 40
if(Lev == 1)then
write(*,*) X(i),Y(j),Z(k)
write(*,*) "after",Axx_rhs(i,j,k)
endif
#undef i
#undef j
#undef k
!!stop
#endif
call kodis(ex,X,Y,Z,Axy,Axy_rhs,AAS,Symmetry,eps)
call kodis(ex,X,Y,Z,Axz,Axz_rhs,ASA,Symmetry,eps)
call kodis(ex,X,Y,Z,Ayy,Ayy_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,Ayz,Ayz_rhs,SAA,Symmetry,eps)
call kodis(ex,X,Y,Z,Azz,Azz_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,Gamx,Gamx_rhs,ASS,Symmetry,eps)
call kodis(ex,X,Y,Z,Gamy,Gamy_rhs,SAS,Symmetry,eps)
call kodis(ex,X,Y,Z,Gamz,Gamz_rhs,SSA,Symmetry,eps)
#if 1
!! bam does not apply dissipation on gauge variables
call kodis(ex,X,Y,Z,Lap,Lap_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,betax,betax_rhs,ASS,Symmetry,eps)
call kodis(ex,X,Y,Z,betay,betay_rhs,SAS,Symmetry,eps)
call kodis(ex,X,Y,Z,betaz,betaz_rhs,SSA,Symmetry,eps)
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,ASS,Symmetry,eps)
call kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,SAS,Symmetry,eps)
call kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,SSA,Symmetry,eps)
#endif
#endif
endif
if(co == 0)then
! ham_Res = trR + 2/3 * K^2 - A_ij * A^ij - 16 * PI * rho

File diff suppressed because it is too large Load Diff

View File

@@ -1,96 +1,107 @@
#ifndef CGH_H
#define CGH_H
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "MyList.h"
#include "MPatch.h"
#include "macrodef.h"
#include "monitor.h"
#include "Parallel.h"
class cgh
{
public:
int levels, movls, BH_num_in;
// information of boxes
int *grids;
double ***bbox;
int ***shape;
double ***handle;
double ***Porgls;
double *Lt;
// information of Patch list
MyList<Patch> **PatL;
// information of OutBdLow2Hi point list and Restrict point list
#if (RPB == 1)
MyList<Parallel::pointstru_bam> **bdsul, **rsul;
#endif
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
int mylev;
int *start_rank, *end_rank;
MPI_Comm *Commlev;
#endif
protected:
int ingfs, fngfs;
static constexpr double ratio = 0.75;
int trfls;
public:
cgh(int ingfsi, int fngfsi, int Symmetry, char *filename, int checkrun, monitor *ErrorMonitor);
~cgh();
void compose_cgh(int nprocs);
void sethandle(monitor *ErrorMonitor);
void checkPatchList(MyList<Patch> *PatL, bool buflog);
void Regrid(int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void Regrid_fake(int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void recompose_cgh(int nprocs, bool *lev_flag,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
void recompose_cgh_fake(int nprocs, bool *lev_flag,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
void read_bbox(int Symmetry, char *filename);
MyList<Patch> *construct_patchlist(int lev, int Symmetry);
bool Interp_One_Point(MyList<var> *VarList,
double *XX, /*input global Cartesian coordinate*/
double *Shellf, int Symmetry);
void recompose_cgh_Onelevel(int nprocs, int lev,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
void Regrid_Onelevel(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void Regrid_Onelevel_aux(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void settrfls(const int lev);
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
void construct_mylev(int nprocs);
#endif
};
#endif /* CGH_H */
#ifndef CGH_H
#define CGH_H
#include <mpi.h>
#include "MyList.h"
#include "MPatch.h"
#include "macrodef.h"
#include "monitor.h"
#include "Parallel.h"
class cgh
{
public:
int levels, movls, BH_num_in;
// information of boxes
int *grids;
double ***bbox;
int ***shape;
double ***handle;
double ***Porgls;
double *Lt;
// information of Patch list
MyList<Patch> **PatL;
// information of OutBdLow2Hi point list and Restrict point list
#if (RPB == 1)
MyList<Parallel::pointstru_bam> **bdsul, **rsul;
#endif
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
int mylev;
int *start_rank, *end_rank;
MPI_Comm *Commlev;
#endif
protected:
int ingfs, fngfs;
static constexpr double ratio = 0.75;
int trfls;
public:
cgh(int ingfsi, int fngfsi, int Symmetry, char *filename, int checkrun, monitor *ErrorMonitor);
~cgh();
void compose_cgh(int nprocs);
void sethandle(monitor *ErrorMonitor);
void checkPatchList(MyList<Patch> *PatL, bool buflog);
void Regrid(int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void Regrid_fake(int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void recompose_cgh(int nprocs, bool *lev_flag,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
void recompose_cgh_fake(int nprocs, bool *lev_flag,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
void read_bbox(int Symmetry, char *filename);
MyList<Patch> *construct_patchlist(int lev, int Symmetry);
bool Interp_One_Point(MyList<var> *VarList,
double *XX, /*input global Cartesian coordinate*/
double *Shellf, int Symmetry);
void recompose_cgh_Onelevel(int nprocs, int lev,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
void Regrid_Onelevel(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void Regrid_Onelevel_aux(int lev, int Symmetry, int BH_num, double **Porgbr, double **Porg0,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList, bool BB,
monitor *ErrorMonitor);
void settrfls(const int lev);
#if (PSTR == 1 || PSTR == 2 || PSTR == 3)
void construct_mylev(int nprocs);
#endif
// Load balancing support
bool enable_load_balance; // Enable load balancing
int load_balance_check_interval; // Check interval (in time steps)
int current_time_step; // Current time step counter
double *rank_interp_times; // Store interpolation times for each rank
int *heavy_ranks; // Store heavy rank numbers
int num_heavy_ranks; // Number of heavy ranks
void init_load_balance(int nprocs);
void update_interp_time(int rank, double time);
bool check_and_rebalance(int nprocs, int lev,
MyList<var> *OldList, MyList<var> *StateList,
MyList<var> *FutureList, MyList<var> *tmList,
int Symmetry, bool BB);
};
#endif /* CGH_H */

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <time.h>
#include <stdlib.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "var.h"
#include "MyList.h"

View File

@@ -69,11 +69,12 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
!DIR$ UNROLL PARTIAL(4)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
! x direction
if(i+1 <= imax .and. i-1 >= imin)then
!
! - f(i-1) + f(i+1)
@@ -152,7 +153,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -220,7 +220,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -285,7 +284,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -375,7 +373,8 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
!DIR$ UNROLL PARTIAL(4)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -474,7 +473,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -537,7 +535,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -601,7 +598,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -665,7 +661,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -728,7 +723,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -790,7 +784,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -877,7 +870,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1009,7 +1001,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1164,7 +1155,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1241,7 +1231,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1312,7 +1301,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1417,7 +1405,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1593,7 +1580,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1661,7 +1647,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1731,7 +1716,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1801,7 +1785,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1872,7 +1855,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -1941,7 +1923,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2034,7 +2015,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2151,7 +2131,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2237,7 +2216,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2314,7 +2292,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2433,7 +2410,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2621,7 +2597,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2694,7 +2669,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2770,7 +2744,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2846,7 +2819,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -2927,7 +2899,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3006,7 +2977,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3114,7 +3084,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3251,7 +3220,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3347,7 +3315,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3432,7 +3399,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3568,7 +3534,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3841,7 +3806,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -3923,7 +3887,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -4008,7 +3971,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -4093,7 +4055,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -4196,7 +4157,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
@@ -4297,7 +4257,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1

View File

@@ -81,7 +81,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -180,7 +179,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -264,7 +262,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -345,7 +342,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -447,7 +443,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -558,7 +553,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -633,7 +627,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -709,7 +702,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -785,7 +777,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -860,7 +851,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -934,7 +924,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1030,7 +1019,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1146,7 +1134,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1240,7 +1227,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1328,7 +1314,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1445,7 +1430,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1596,7 +1580,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1676,7 +1659,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1758,7 +1740,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1840,7 +1821,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1923,7 +1903,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2004,7 +1983,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2109,7 +2087,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2242,7 +2219,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2345,7 +2321,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2439,7 +2414,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2570,7 +2544,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2770,7 +2743,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2855,7 +2827,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -2943,7 +2914,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3031,7 +3001,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3124,7 +3093,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3215,7 +3183,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3335,7 +3302,6 @@
fy = ZEO
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3488,7 +3454,6 @@
fx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3601,7 +3566,6 @@
fy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3703,7 +3667,6 @@
fz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -3851,7 +3814,6 @@
fxz = ZEO
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4136,7 +4098,6 @@
fxx = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4230,7 +4191,6 @@
fyy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4327,7 +4287,6 @@
fzz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4424,7 +4383,6 @@
fxy = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4539,7 +4497,6 @@
fxz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4652,7 +4609,6 @@
fyz = ZEO
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4723,7 +4679,6 @@ subroutine fderivs_shc(ex,f,fx,fy,fz,crho,sigma,R,SYM1,SYM2,SYM3,Symmetry,Lev,ss
#if 0
integer :: i,j,k
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -4774,7 +4729,6 @@ subroutine fdderivs_shc(ex,f,fxx,fxy,fxz,fyy,fyz,fzz,crho,sigma,R,SYM1,SYM2,SYM3
#if 0
integer :: i,j,k
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)

View File

@@ -27,7 +27,6 @@
!~~~~~~>
!$omp parallel do collapse(2) private(i,j,k,lgxx,lgyy,lgzz,ldetg,lgupxx,lgupxy,lgupxz,lgupyy,lgupyz,lgupzz,ltrA,lscale)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -105,7 +104,6 @@
!~~~~~~>
!$omp parallel do collapse(2) private(i,j,k,lgxx,lgyy,lgzz,lscale,lgxy,lgxz,lgyz,lgupxx,lgupxy,lgupxz,lgupyy,lgupyz,lgupzz,ltrA)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)

View File

@@ -6,11 +6,7 @@
#include <stdio.h>
#include <assert.h>
#include <math.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "util_Table.h"
#include "cctk.h"

View File

@@ -6,11 +6,7 @@
#include <stdio.h>
#include <assert.h>
#include <math.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "cctk.h"

View File

@@ -883,13 +883,17 @@ subroutine symmetry_bd(ord,extc,func,funcc,SoA)
integer::i
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
funcc(1:extc(1),1:extc(2),1:extc(3)) = func
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1)
enddo
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
do i=0,ord-1
funcc(:,-i,1:extc(3)) = funcc(:,i+1,1:extc(3))*SoA(2)
enddo
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
do i=0,ord-1
funcc(:,:,-i) = funcc(:,:,i+1)*SoA(3)
enddo
@@ -1112,6 +1116,7 @@ end subroutine d2dump
! Lagrangian polynomial interpolation
!------------------------------------------------------------------------------
!DIR$ ATTRIBUTES FORCEINLINE :: polint
subroutine polint(xa, ya, x, y, dy, ordn)
implicit none

View File

@@ -65,8 +65,9 @@ real*8,intent(in) :: eps
! dx^4
! note the sign (-1)^r-1, now r=2
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
!DIR$ SIMD VECTORLENGTHFOR(KNOWN_INTEGER=8)
!DIR$ UNROLL PARTIAL(4)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -160,8 +161,7 @@ integer, parameter :: NO_SYMM=0, OCTANT=2
call symmetry_bd(3,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -275,8 +275,7 @@ real*8,intent(in) :: eps
! dx^8
! note the sign (-1)^r-1, now r=4
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -388,8 +387,7 @@ real*8,intent(in) :: eps
! dx^10
! note the sign (-1)^r-1, now r=5
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)

View File

@@ -80,8 +80,7 @@ real*8,intent(in) :: eps
! dx^4
! note the sign (-1)^r-1, now r=2
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -179,8 +178,7 @@ real*8,intent(in) :: eps
! dx^4
! note the sign (-1)^r-1, now r=2
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -275,8 +273,7 @@ integer, parameter :: NO_SYMM=0, EQ_SYMM=1, OCTANT=2
call symmetry_stbd(2,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -372,8 +369,7 @@ integer, parameter :: NO_SYMM=0, EQ_SYMM=1, OCTANT=2
call symmetry_stbd(3,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -514,8 +510,7 @@ integer, parameter :: NO_SYMM=0, EQ_SYMM=1, OCTANT=2
call symmetry_stbd(3,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -603,8 +598,7 @@ integer, parameter :: NO_SYMM=0, EQ_SYMM=1, OCTANT=2
call symmetry_stbd(3,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -700,8 +694,7 @@ real*8,intent(in) :: eps
! dx^8
! note the sign (-1)^r-1, now r=4
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -801,8 +794,7 @@ integer, parameter :: NO_SYMM=0, EQ_SYMM=1, OCTANT=2
call symmetry_stbd(4,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -911,8 +903,7 @@ real*8,intent(in) :: eps
! dx^10
! note the sign (-1)^r-1, now r=5
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
@@ -1015,8 +1006,7 @@ integer, parameter :: NO_SYMM=0, EQ_SYMM=1, OCTANT=2
call symmetry_stbd(5,ex,f,fh,SoA)
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)

View File

@@ -68,8 +68,7 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
@@ -234,8 +233,7 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
#if 0
@@ -489,6 +487,201 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
end subroutine lopsided
!-----------------------------------------------------------------------------
! Combined advection (lopsided) + Kreiss-Oliger dissipation (kodis)
! Shares the symmetry_bd buffer fh, eliminating one full-grid copy per call.
! Mathematically identical to calling lopsided then kodis separately.
!-----------------------------------------------------------------------------
subroutine lopsided_kodis(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA,eps)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
real*8,intent(in) :: eps
!~~~~~~> local variables:
! note index -2,-1,0, so we have 3 extra points
real*8,dimension(-2:ex(1),-2:ex(2),-2:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F3=3.d0
real*8, parameter :: TWO=2.d0,F6=6.0d0,F18=1.8d1
real*8, parameter :: F12=1.2d1, F10=1.d1,EIT=8.d0
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
! kodis parameters
real*8, parameter :: SIX=6.d0,FIT=1.5d1,TWT=2.d1
real*8, parameter :: cof=6.4d1 ! 2^6
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -2
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -2
! Single symmetry_bd call shared by both advection and dissipation
call symmetry_bd(3,ex,f,fh,SoA)
! ---- Advection (lopsided) loop ----
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
if(Sfx(i,j,k) > ZEO)then
if(i+3 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
elseif(i+2 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
endif
elseif(Sfx(i,j,k) < ZEO)then
if(i-3 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
elseif(i-2 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i-1 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
endif
endif
! y direction
if(Sfy(i,j,k) > ZEO)then
if(j+3 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
elseif(j+2 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
endif
elseif(Sfy(i,j,k) < ZEO)then
if(j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
elseif(j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
endif
endif
! z direction
if(Sfz(i,j,k) > ZEO)then
if(k+3 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
elseif(k+2 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
endif
elseif(Sfz(i,j,k) < ZEO)then
if(k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
elseif(k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
endif
endif
enddo
enddo
enddo
! ---- Dissipation (kodis) loop ----
if(eps > ZEO) then
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i-3 >= imin .and. i+3 <= imax .and. &
j-3 >= jmin .and. j+3 <= jmax .and. &
k-3 >= kmin .and. k+3 <= kmax) then
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )/dX + &
( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )/dY + &
( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )/dZ )
endif
enddo
enddo
enddo
endif
return
end subroutine lopsided_kodis
#elif (ghost_width == 4)
! sixth order code
! Compute advection terms in right hand sides of field equations
@@ -560,8 +753,7 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
@@ -777,8 +969,7 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
!$omp parallel do collapse(2) private(i,j,k)
do k=1,ex(3)-1
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction

View File

@@ -95,8 +95,8 @@ $(CUDAFILES): bssn_gpu.h gpu_mem.h gpu_rhsSS_mem.h
misc.o : zbesh.o
# projects
ABE: $(C++FILES) $(F90FILES) $(F77FILES) $(AHFDOBJS)
$(CLINKER) $(CXXAPPFLAGS) -qopenmp -o $@ $(C++FILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(LDLIBS)
ABE: $(C++FILES) $(F90FILES) $(F77FILES) $(AHFDOBJS)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(LDLIBS)
ABEGPU: $(C++FILES_GPU) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES_GPU) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES) $(LDLIBS)

View File

@@ -6,23 +6,25 @@
## Intel oneAPI version with oneMKL (Optimized for performance)
filein = -I/usr/include/ -I${MKLROOT}/include
## Using OpenMP-threaded MKL for parallel performance
## Using sequential MKL (OpenMP disabled for better single-threaded performance)
## Added -lifcore for Intel Fortran runtime and -limf for Intel math library
LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lifcore -limf -lpthread -lm -ldl
LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore -limf -lpthread -lm -ldl
## Aggressive optimization flags + PGO Phase 2 (profile-guided optimization)
## -fprofile-instr-use: use collected profile data to guide optimization decisions
## (branch prediction, basic block layout, inlining, loop unrolling)
PROFDATA = /home/amss/AMSS-NCKU/pgo_profile/default.profdata
CXXAPPFLAGS = -O3 -march=native -fp-model fast=2 -fma -ipo -qopenmp \
-DMPI_STUB -Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -march=native -fp-model fast=2 -fma -ipo -qopenmp \
PROFDATA = ../../pgo_profile/default.profdata
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=$(PROFDATA) \
-Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=$(PROFDATA) \
-align array64byte -fpp -I${MKLROOT}/include
f90 = ifx
f77 = ifx
CXX = icpx
CC = icx
CLINKER = icpx
CLINKER = mpiicpx
Cu = nvcc
CUDA_LIB_PATH = -L/usr/lib/cuda/lib64 -I/usr/include -I/usr/lib/cuda/include

View File

@@ -14,11 +14,7 @@ using namespace std;
#include <string.h>
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "misc.h"
#include "macrodef.h"

View File

@@ -24,11 +24,7 @@ using namespace std;
#include <complex.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
namespace misc
{

View File

@@ -20,11 +20,7 @@ using namespace std;
#endif
#include <time.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
class monitor
{

View File

@@ -1,153 +0,0 @@
#ifndef MPI_STUB_H
#define MPI_STUB_H
/*
* MPI Stub Header — single-process shim for AMSS-NCKU ABE solver.
* Provides all MPI types, constants, and functions used in the codebase
* as no-ops or trivial implementations for nprocs=1, myrank=0.
*/
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include <time.h>
/* ── Types ─────────────────────────────────────────────────────────── */
typedef int MPI_Comm;
typedef int MPI_Datatype;
typedef int MPI_Op;
typedef int MPI_Request;
typedef int MPI_Group;
typedef struct MPI_Status {
int MPI_SOURCE;
int MPI_TAG;
int MPI_ERROR;
} MPI_Status;
/* ── Constants ─────────────────────────────────────────────────────── */
#define MPI_COMM_WORLD 0
#define MPI_INT 1
#define MPI_DOUBLE 2
#define MPI_DOUBLE_PRECISION 2
#define MPI_DOUBLE_INT 3
#define MPI_SUM 1
#define MPI_MAX 2
#define MPI_MAXLOC 3
#define MPI_STATUS_IGNORE ((MPI_Status *)0)
#define MPI_STATUSES_IGNORE ((MPI_Status *)0)
#define MPI_MAX_PROCESSOR_NAME 256
/* ── Helper: sizeof for MPI_Datatype ──────────────────────────────── */
static inline size_t mpi_stub_sizeof(MPI_Datatype type) {
switch (type) {
case MPI_INT: return sizeof(int);
case MPI_DOUBLE: return sizeof(double);
case MPI_DOUBLE_INT: return sizeof(double) + sizeof(int);
default: return 0;
}
}
/* ── Init / Finalize ──────────────────────────────────────────────── */
static inline int MPI_Init(int *, char ***) { return 0; }
static inline int MPI_Finalize() { return 0; }
/* ── Communicator queries ─────────────────────────────────────────── */
static inline int MPI_Comm_rank(MPI_Comm, int *rank) { *rank = 0; return 0; }
static inline int MPI_Comm_size(MPI_Comm, int *size) { *size = 1; return 0; }
static inline int MPI_Comm_split(MPI_Comm comm, int, int, MPI_Comm *newcomm) {
*newcomm = comm;
return 0;
}
static inline int MPI_Comm_free(MPI_Comm *) { return 0; }
/* ── Group operations ─────────────────────────────────────────────── */
static inline int MPI_Comm_group(MPI_Comm, MPI_Group *group) {
*group = 0;
return 0;
}
static inline int MPI_Group_translate_ranks(MPI_Group, int n,
const int *ranks1, MPI_Group, int *ranks2) {
for (int i = 0; i < n; ++i) ranks2[i] = ranks1[i];
return 0;
}
static inline int MPI_Group_free(MPI_Group *) { return 0; }
/* ── Collective operations ────────────────────────────────────────── */
static inline int MPI_Allreduce(const void *sendbuf, void *recvbuf,
int count, MPI_Datatype datatype, MPI_Op, MPI_Comm) {
std::memcpy(recvbuf, sendbuf, count * mpi_stub_sizeof(datatype));
return 0;
}
static inline int MPI_Iallreduce(const void *sendbuf, void *recvbuf,
int count, MPI_Datatype datatype, MPI_Op, MPI_Comm,
MPI_Request *request) {
std::memcpy(recvbuf, sendbuf, count * mpi_stub_sizeof(datatype));
*request = 0;
return 0;
}
static inline int MPI_Bcast(void *, int, MPI_Datatype, int, MPI_Comm) {
return 0;
}
static inline int MPI_Barrier(MPI_Comm) { return 0; }
/* ── Point-to-point (never reached with nprocs=1) ─────────────────── */
static inline int MPI_Send(const void *, int, MPI_Datatype, int, int, MPI_Comm) {
return 0;
}
static inline int MPI_Recv(void *, int, MPI_Datatype, int, int, MPI_Comm, MPI_Status *) {
return 0;
}
static inline int MPI_Isend(const void *, int, MPI_Datatype, int, int, MPI_Comm,
MPI_Request *req) {
*req = 0;
return 0;
}
static inline int MPI_Irecv(void *, int, MPI_Datatype, int, int, MPI_Comm,
MPI_Request *req) {
*req = 0;
return 0;
}
/* ── Completion ───────────────────────────────────────────────────── */
static inline int MPI_Wait(MPI_Request *, MPI_Status *) { return 0; }
static inline int MPI_Waitall(int, MPI_Request *, MPI_Status *) { return 0; }
/* ── Utility ──────────────────────────────────────────────────────── */
static inline int MPI_Abort(MPI_Comm, int error_code) {
std::fprintf(stderr, "MPI_Abort called with error code %d\n", error_code);
std::exit(error_code);
return 0;
}
static inline double MPI_Wtime() {
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
static inline int MPI_Get_processor_name(char *name, int *resultlen) {
const char *stub_name = "localhost";
std::strcpy(name, stub_name);
*resultlen = (int)std::strlen(stub_name);
return 0;
}
#endif /* MPI_STUB_H */

View File

@@ -24,11 +24,7 @@ using namespace std;
#include <map.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
namespace parameters
{

View File

@@ -30,11 +30,7 @@ using namespace std;
#include <sys/time.h>
#include <sys/resource.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
/* Real time */
#define TimerSignal SIGALRM

View File

@@ -109,33 +109,23 @@
if( RK4 == 0 ) then
!$omp parallel workshare
f1 = f0 + HLF * dT * f_rhs
!$omp end parallel workshare
elseif(RK4 == 1 ) then
!$omp parallel workshare
f_rhs = f_rhs + TWO * f1
!$omp end parallel workshare
!$omp parallel workshare
f1 = f0 + HLF * dT * f1
!$omp end parallel workshare
elseif(RK4 == 2 ) then
!$omp parallel workshare
f_rhs = f_rhs + TWO * f1
!$omp end parallel workshare
!$omp parallel workshare
f1 = f0 + dT * f1
!$omp end parallel workshare
elseif( RK4 == 3 ) then
!$omp parallel workshare
f1 = f0 +F1o6 * dT *(f1 + f_rhs)
!$omp end parallel workshare
else
@@ -144,7 +134,7 @@
endif
return
return
end subroutine rungekutta4_rout
!-----------------------------------------------------------------------------
@@ -225,19 +215,15 @@
if( RK4 == 0 ) then
!$omp parallel workshare
f1 = f0 + dT * f_rhs
!$omp end parallel workshare
else
!$omp parallel workshare
f1 = f0 + HLF * dT * (f1+f_rhs)
!$omp end parallel workshare
endif
return
return
end subroutine icn_rout
!~~~~~~~~~~~~~~~~~~
@@ -253,10 +239,8 @@
real*8, dimension(ex(1),ex(2),ex(3)),intent(in) ::f_rhs
real*8, dimension(ex(1),ex(2),ex(3)),intent(out) ::f1
!$omp parallel workshare
f1 = f0 + dT * f_rhs
!$omp end parallel workshare
return
return
end subroutine euler_rout

View File

@@ -19,11 +19,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "cgh.h"
#include "ShellPatch.h"

View File

@@ -18,11 +18,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "misc.h"
#include "microdef.h"

View File

@@ -3,11 +3,7 @@
#include <math.h>
#include <string.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "util_Table.h"
#include "cctk.h"

File diff suppressed because it is too large Load Diff

View File

@@ -18,11 +18,7 @@ using namespace std;
#include <math.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "misc.h"
#include "macrodef.h"

View File

@@ -20,11 +20,7 @@ using namespace std;
#include <map.h>
#endif
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "misc.h"
#include "macrodef.h"

View File

@@ -9,11 +9,7 @@
using namespace std;
#include <time.h>
#ifdef MPI_STUB
#include "mpi_stub.h"
#else
#include <mpi.h>
#endif
#include "var.h"

View File

@@ -11,30 +11,17 @@
import AMSS_NCKU_Input as input_data
import subprocess
import time
import os
## OpenMP configuration for threaded Fortran kernels
## OMP_NUM_THREADS: set to number of physical cores (not hyperthreads)
## OMP_PROC_BIND: bind threads to cores to avoid migration overhead
## OMP_STACKSIZE: each thread needs stack space for fh arrays (~3.6MB)
if "OMP_NUM_THREADS" not in os.environ:
os.environ["OMP_NUM_THREADS"] = "96"
os.environ["OMP_STACKSIZE"] = "16M"
os.environ["OMP_PROC_BIND"] = "close"
os.environ["OMP_PLACES"] = "cores"
## CPU core binding configuration using taskset
## taskset ensures all child processes inherit the CPU affinity mask
## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111)
## Format: taskset -c 4-55,60-111 ensures processes only run on these cores
#NUMACTL_CPU_BIND = "taskset -c 0-111"
#NUMACTL_CPU_BIND = "taskset -c 16-47,64-95"
#NUMACTL_CPU_BIND = "taskset -c 8-15"
NUMACTL_CPU_BIND = ""
NUMACTL_CPU_BIND = "taskset -c 16-47,64-95"
## Build parallelism configuration
## Use nohz_full cores (4-55, 60-111) for compilation: 52 + 52 = 104 cores
## Set make -j to utilize available cores for faster builds
BUILD_JOBS = 16
BUILD_JOBS = 96
##################################################################
@@ -128,28 +115,29 @@ def run_ABE():
print( )
## Define the command to run; cast other values to strings as needed
if (input_data.GPU_Calculation == "no"):
run_command = NUMACTL_CPU_BIND + " ./ABE"
run_command_outfile = "ABE_out.log"
mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
#mpi_command = " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
mpi_command_outfile = "ABE_out.log"
elif (input_data.GPU_Calculation == "yes"):
run_command = NUMACTL_CPU_BIND + " ./ABEGPU"
run_command_outfile = "ABEGPU_out.log"
## Execute the command and stream output
run_process = subprocess.Popen(run_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABEGPU"
mpi_command_outfile = "ABEGPU_out.log"
## Execute the MPI command and stream output
mpi_process = subprocess.Popen(mpi_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
## Write ABE run output to file while printing to stdout
with open(run_command_outfile, 'w') as file0:
with open(mpi_command_outfile, 'w') as file0:
## Read and print output lines; also write each line to file
for line in run_process.stdout:
for line in mpi_process.stdout:
print(line, end='') # stream output in real time
file0.write(line) # write the line to file
file0.flush() # flush to ensure each line is written immediately (optional)
file0.close()
## Wait for the process to finish
run_return_code = run_process.wait()
mpi_return_code = mpi_process.wait()
print( )
print( " The ABE/ABEGPU simulation is finished " )

29
parallel_plot_helper.py Normal file
View File

@@ -0,0 +1,29 @@
import multiprocessing
def run_plot_task(task):
"""Execute a single plotting task.
Parameters
----------
task : tuple
A tuple of (function, args_tuple) where function is a callable
plotting function and args_tuple contains its arguments.
"""
func, args = task
return func(*args)
def run_plot_tasks_parallel(plot_tasks):
"""Execute a list of independent plotting tasks in parallel.
Uses the 'fork' context to create worker processes so that the main
script is NOT re-imported/re-executed in child processes.
Parameters
----------
plot_tasks : list of tuples
Each element is (function, args_tuple).
"""
ctx = multiprocessing.get_context('fork')
with ctx.Pool() as pool:
pool.map(run_plot_task, plot_tasks)

Binary file not shown.

Binary file not shown.

View File

@@ -11,6 +11,8 @@
import numpy ## numpy for array operations
import scipy ## scipy for interpolation and signal processing
import math
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt ## matplotlib for plotting
import os ## os for system/file operations

View File

@@ -8,16 +8,23 @@
##
#################################################
## Restrict OpenMP to one thread per process so that running
## many workers in parallel does not create an O(workers * BLAS_threads)
## thread explosion. The variable MUST be set before numpy/scipy
## are imported, because the BLAS library reads them only at load time.
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")
import numpy
import scipy
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from mpl_toolkits.mplot3d import Axes3D
## import torch
import AMSS_NCKU_Input as input_data
import os
#########################################################################################
@@ -192,3 +199,19 @@ def get_data_xy( Rmin, Rmax, n, data0, time, figure_title, figure_outdir ):
####################################################################################
####################################################################################
## Allow this module to be run as a standalone script so that each
## binary-data plot can be executed in a fresh subprocess whose BLAS
## environment variables (set above) take effect before numpy loads.
##
## Usage: python3 plot_binary_data.py <filename> <binary_outdir> <figure_outdir>
####################################################################################
if __name__ == '__main__':
import sys
if len(sys.argv) != 4:
print(f"Usage: {sys.argv[0]} <filename> <binary_outdir> <figure_outdir>")
sys.exit(1)
plot_binary_data(sys.argv[1], sys.argv[2], sys.argv[3])

View File

@@ -8,6 +8,8 @@
#################################################
import numpy ## numpy for array operations
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt ## matplotlib for plotting
from mpl_toolkits.mplot3d import Axes3D ## needed for 3D plots
import glob
@@ -15,6 +17,9 @@ import os ## operating system utilities
import plot_binary_data
import AMSS_NCKU_Input as input_data
import subprocess
import sys
import multiprocessing
# plt.rcParams['text.usetex'] = True ## enable LaTeX fonts in plots
@@ -50,10 +55,40 @@ def generate_binary_data_plot( binary_outdir, figure_outdir ):
file_list.append(x)
print(x)
## Plot each file in the list
## Plot each file in parallel using subprocesses.
## Each subprocess is a fresh Python process where the BLAS thread-count
## environment variables (set at the top of plot_binary_data.py) take
## effect before numpy is imported. This avoids the thread explosion
## that occurs when multiprocessing.Pool with 'fork' context inherits
## already-initialized multi-threaded BLAS from the parent.
script = os.path.join( os.path.dirname(__file__), "plot_binary_data.py" )
max_workers = min( multiprocessing.cpu_count(), len(file_list) ) if file_list else 0
running = []
failed = []
for filename in file_list:
print(filename)
plot_binary_data.plot_binary_data(filename, binary_outdir, figure_outdir)
proc = subprocess.Popen(
[sys.executable, script, filename, binary_outdir, figure_outdir],
)
running.append( (proc, filename) )
## Keep at most max_workers subprocesses active at a time
if len(running) >= max_workers:
p, fn = running.pop(0)
p.wait()
if p.returncode != 0:
failed.append(fn)
## Wait for all remaining subprocesses to finish
for p, fn in running:
p.wait()
if p.returncode != 0:
failed.append(fn)
if failed:
print( " WARNING: the following binary data plots failed:" )
for fn in failed:
print( " ", fn )
print( )
print( " Binary Data Plot Has been Finished " )