Compare commits

...

13 Commits

Author SHA1 Message Date
8b68b5d782 fixup! Fix load explosion: use subprocess for binary data plots to avoid thread conflict
* Seems we don't have to set so many variables, `OMP_NUM_THREADS` is enough.

Test: Annotate the code for setting other environment variables. It runs normally.
2026-02-09 23:00:17 +08:00
dd2443c926 Fix load explosion: use subprocess for binary data plots to avoid thread conflict
Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
2026-02-09 21:40:27 +08:00
2d7ba5c60c [2/2] Implement multiprocessing-based parallel plotting 2026-02-09 21:36:45 +08:00
4777cad4ed [1/2] Implement multiprocessing-based parallel plotting 2026-02-09 15:13:18 +08:00
afd4006da2 Cache GSL in SyncPlan and apply async Sync to Z4c_class
Major optimization: Pre-build grid segment lists (GSLs) once per Step() call
via SyncPreparePlan(), then reuse them across all 4 RK4 substep SyncBegin calls
via SyncBeginWithPlan(). This eliminates the O(cpusize * blocks^2) GSL rebuild
cost that was incurred on every ghost zone exchange.

Applied async SyncBegin/SyncEnd overlap pattern to Z4c_class.C (ABEtype==2,
the default configuration), which was still using blocking Parallel::Sync.
Both the regular and CPBC variants of Z4c Step() are now optimized.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
2026-02-08 16:46:44 +08:00
copilot-swe-agent[bot]
a918dc103e Add SyncBegin/SyncEnd to Parallel for MPI communication-computation overlap
Split the blocking Parallel::Sync into async SyncBegin (initiates local copy +
MPI_Isend/Irecv) and SyncEnd (MPI_Waitall + unpack). This allows overlapping MPI
ghost zone exchange with error checking and Shell patch computation.

Modified Step() in bssn_class.C for both PSTR==0 and PSTR==1/2/3 versions to
start Sync before error checks, overlapping the MPI_Allreduce with the ongoing
ghost zone transfers.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
2026-02-08 16:19:13 +08:00
copilot-swe-agent[bot]
38c2c30186 Merge lopsided advection + kodis dissipation to share symmetry_bd buffer
Add lopsided_kodis subroutine in lopsidediff.f90 that combines upwind
advection (lopsided) and Kreiss-Oliger dissipation (kodis) into one
function sharing a single fh buffer from symmetry_bd. This eliminates
27 redundant full-grid copies per RHS evaluation (108 per timestep).

For gxx/gyy/gzz variables: kodis stencil coefficients sum to zero
(1-6+15-20+15-6+1=0), so using gxx(=dxx+1) instead of dxx for the
dissipation buffer is mathematically exact.

Update bssn_rhs.f90 to use the merged lopsided_kodis calls.

Co-authored-by: ianchb <45872450+ianchb@users.noreply.github.com>
2026-02-08 15:42:44 +08:00
b8e41b2b39 Only enable OpenMP for TwoPunctures 2026-02-08 13:00:37 +08:00
133e4f13a2 Use OpenMP's parallel for with schedule(dynamic,1) 2026-02-07 19:48:24 +08:00
914c4f4791 Optimize memory allocation in JFD_times_dv
This should reduce the pressure on the memory allocator, indirectly improving caching behavior.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
2026-02-07 15:55:45 +08:00
f345b0e520 Performance optimization for the TwoPunctures module
* Re-enabled OpenMP.

1. Batch spectral derivatives (Chebyshev & Fourier) via precomputed matrices:
Chebyshev/Fourier transforms and derivatives are precomputed as explicit physical-space operator matrices.
Batch DGEMM now applies to entire tensor fields, mathematically identical to original per-line transforms but vastly faster.

2. Gauss-Seidel relaxation & tridiagonal solver workspace reuse:
Per-thread reusable workspaces replace per-call heap new/delete in all tridiagonal and relaxation routines.

3. Efficient OpenMP multithreading throughout relaxation/deriv:
relax_omp and friends parallelize over grouped lines/planes, maximizing threading efficiency and memory independence.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>
2026-02-07 14:48:47 +08:00
f5ed23d687 Revert "Eliminate hot-path heap allocations in TwoPunctures spectral solver"
This reverts commit 09ffdb553d.
2026-02-07 14:45:25 +08:00
03d501db04 Display the runtime of TwoPunctures 2026-02-07 14:45:16 +08:00
16 changed files with 1850 additions and 436 deletions

View File

@@ -8,6 +8,14 @@
## ##
################################################################## ##################################################################
## Guard against re-execution by multiprocessing child processes.
## Without this, using 'spawn' or 'forkserver' context would cause every
## worker to re-run the entire script, spawning exponentially more
## workers (fork bomb).
if __name__ != '__main__':
import sys as _sys
_sys.exit(0)
################################################################## ##################################################################
@@ -424,26 +432,31 @@ print(
import plot_xiaoqu import plot_xiaoqu
import plot_GW_strain_amplitude_xiaoqu import plot_GW_strain_amplitude_xiaoqu
from parallel_plot_helper import run_plot_tasks_parallel
plot_tasks = []
## Plot black hole trajectory ## Plot black hole trajectory
plot_xiaoqu.generate_puncture_orbit_plot( binary_results_directory, figure_directory ) plot_tasks.append( ( plot_xiaoqu.generate_puncture_orbit_plot, (binary_results_directory, figure_directory) ) )
plot_xiaoqu.generate_puncture_orbit_plot3D( binary_results_directory, figure_directory ) plot_tasks.append( ( plot_xiaoqu.generate_puncture_orbit_plot3D, (binary_results_directory, figure_directory) ) )
## Plot black hole separation vs. time ## Plot black hole separation vs. time
plot_xiaoqu.generate_puncture_distence_plot( binary_results_directory, figure_directory ) plot_tasks.append( ( plot_xiaoqu.generate_puncture_distence_plot, (binary_results_directory, figure_directory) ) )
## Plot gravitational waveforms (psi4 and strain amplitude) ## Plot gravitational waveforms (psi4 and strain amplitude)
for i in range(input_data.Detector_Number): for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_gravitational_wave_psi4_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_xiaoqu.generate_gravitational_wave_psi4_plot, (binary_results_directory, figure_directory, i) ) )
plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_GW_strain_amplitude_xiaoqu.generate_gravitational_wave_amplitude_plot, (binary_results_directory, figure_directory, i) ) )
## Plot ADM mass evolution ## Plot ADM mass evolution
for i in range(input_data.Detector_Number): for i in range(input_data.Detector_Number):
plot_xiaoqu.generate_ADMmass_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_xiaoqu.generate_ADMmass_plot, (binary_results_directory, figure_directory, i) ) )
## Plot Hamiltonian constraint violation over time ## Plot Hamiltonian constraint violation over time
for i in range(input_data.grid_level): for i in range(input_data.grid_level):
plot_xiaoqu.generate_constraint_check_plot( binary_results_directory, figure_directory, i ) plot_tasks.append( ( plot_xiaoqu.generate_constraint_check_plot, (binary_results_directory, figure_directory, i) ) )
run_plot_tasks_parallel(plot_tasks)
## Plot stored binary data ## Plot stored binary data
plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory ) plot_xiaoqu.generate_binary_data_plot( binary_results_directory, figure_directory )

View File

@@ -3756,6 +3756,358 @@ void Parallel::Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry)
delete[] transfer_src; delete[] transfer_src;
delete[] transfer_dst; delete[] transfer_dst;
} }
//
// Async Sync: split into SyncBegin (initiate MPI) and SyncEnd (wait + unpack)
// This allows overlapping MPI communication with computation.
//
static void transfer_begin(Parallel::TransferState *ts)
{
int myrank;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
int cpusize = ts->cpusize;
ts->reqs = new MPI_Request[2 * cpusize];
ts->stats = new MPI_Status[2 * cpusize];
ts->req_no = 0;
ts->send_data = new double *[cpusize];
ts->rec_data = new double *[cpusize];
int length;
for (int node = 0; node < cpusize; node++)
{
ts->send_data[node] = ts->rec_data[node] = 0;
if (node == myrank)
{
// Local copy: pack then immediately unpack (no MPI needed)
if ((length = Parallel::data_packer(0, ts->transfer_src[myrank], ts->transfer_dst[myrank],
node, PACK, ts->VarList1, ts->VarList2, ts->Symmetry)))
{
double *local_data = new double[length];
if (!local_data)
{
cout << "out of memory in transfer_begin, local copy" << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
Parallel::data_packer(local_data, ts->transfer_src[myrank], ts->transfer_dst[myrank],
node, PACK, ts->VarList1, ts->VarList2, ts->Symmetry);
Parallel::data_packer(local_data, ts->transfer_src[node], ts->transfer_dst[node],
node, UNPACK, ts->VarList1, ts->VarList2, ts->Symmetry);
delete[] local_data;
}
}
else
{
// send from this cpu to cpu#node
if ((length = Parallel::data_packer(0, ts->transfer_src[myrank], ts->transfer_dst[myrank],
node, PACK, ts->VarList1, ts->VarList2, ts->Symmetry)))
{
ts->send_data[node] = new double[length];
if (!ts->send_data[node])
{
cout << "out of memory in transfer_begin, send" << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
Parallel::data_packer(ts->send_data[node], ts->transfer_src[myrank], ts->transfer_dst[myrank],
node, PACK, ts->VarList1, ts->VarList2, ts->Symmetry);
MPI_Isend((void *)ts->send_data[node], length, MPI_DOUBLE, node, 1, MPI_COMM_WORLD,
ts->reqs + ts->req_no++);
}
// receive from cpu#node to this cpu
if ((length = Parallel::data_packer(0, ts->transfer_src[node], ts->transfer_dst[node],
node, UNPACK, ts->VarList1, ts->VarList2, ts->Symmetry)))
{
ts->rec_data[node] = new double[length];
if (!ts->rec_data[node])
{
cout << "out of memory in transfer_begin, recv" << endl;
MPI_Abort(MPI_COMM_WORLD, 1);
}
MPI_Irecv((void *)ts->rec_data[node], length, MPI_DOUBLE, node, 1, MPI_COMM_WORLD,
ts->reqs + ts->req_no++);
}
}
}
// NOTE: MPI_Waitall is NOT called here - that happens in transfer_end
}
//
static void transfer_end(Parallel::TransferState *ts)
{
// Wait for all pending MPI operations
MPI_Waitall(ts->req_no, ts->reqs, ts->stats);
// Unpack received data from remote ranks
for (int node = 0; node < ts->cpusize; node++)
if (ts->rec_data[node])
Parallel::data_packer(ts->rec_data[node], ts->transfer_src[node], ts->transfer_dst[node],
node, UNPACK, ts->VarList1, ts->VarList2, ts->Symmetry);
// Cleanup MPI buffers
for (int node = 0; node < ts->cpusize; node++)
{
if (ts->send_data[node])
delete[] ts->send_data[node];
if (ts->rec_data[node])
delete[] ts->rec_data[node];
}
delete[] ts->reqs;
delete[] ts->stats;
delete[] ts->send_data;
delete[] ts->rec_data;
}
//
Parallel::SyncHandle *Parallel::SyncBegin(Patch *Pat, MyList<var> *VarList, int Symmetry)
{
int cpusize;
MPI_Comm_size(MPI_COMM_WORLD, &cpusize);
SyncHandle *handle = new SyncHandle;
handle->num_states = 1;
handle->states = new TransferState[1];
TransferState *ts = &handle->states[0];
ts->cpusize = cpusize;
ts->VarList1 = VarList;
ts->VarList2 = VarList;
ts->Symmetry = Symmetry;
ts->owns_gsl = true;
ts->dst = build_ghost_gsl(Pat);
ts->src = new MyList<Parallel::gridseg> *[cpusize];
ts->transfer_src = new MyList<Parallel::gridseg> *[cpusize];
ts->transfer_dst = new MyList<Parallel::gridseg> *[cpusize];
for (int node = 0; node < cpusize; node++)
{
ts->src[node] = build_owned_gsl0(Pat, node);
build_gstl(ts->src[node], ts->dst, &ts->transfer_src[node], &ts->transfer_dst[node]);
}
transfer_begin(ts);
return handle;
}
//
Parallel::SyncHandle *Parallel::SyncBegin(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry)
{
int cpusize;
MPI_Comm_size(MPI_COMM_WORLD, &cpusize);
// Count patches
int num_patches = 0;
MyList<Patch> *Pp = PatL;
while (Pp) { num_patches++; Pp = Pp->next; }
SyncHandle *handle = new SyncHandle;
handle->num_states = num_patches + 1; // intra-patch transfers + 1 inter-patch transfer
handle->states = new TransferState[handle->num_states];
// Intra-patch sync: for each patch, build ghost lists and initiate transfer
int idx = 0;
Pp = PatL;
while (Pp)
{
TransferState *ts = &handle->states[idx];
ts->cpusize = cpusize;
ts->VarList1 = VarList;
ts->VarList2 = VarList;
ts->Symmetry = Symmetry;
ts->owns_gsl = true;
ts->dst = build_ghost_gsl(Pp->data);
ts->src = new MyList<Parallel::gridseg> *[cpusize];
ts->transfer_src = new MyList<Parallel::gridseg> *[cpusize];
ts->transfer_dst = new MyList<Parallel::gridseg> *[cpusize];
for (int node = 0; node < cpusize; node++)
{
ts->src[node] = build_owned_gsl0(Pp->data, node);
build_gstl(ts->src[node], ts->dst, &ts->transfer_src[node], &ts->transfer_dst[node]);
}
transfer_begin(ts);
idx++;
Pp = Pp->next;
}
// Inter-patch sync: buffer zone exchange between patches
{
TransferState *ts = &handle->states[idx];
ts->cpusize = cpusize;
ts->VarList1 = VarList;
ts->VarList2 = VarList;
ts->Symmetry = Symmetry;
ts->owns_gsl = true;
ts->dst = build_buffer_gsl(PatL);
ts->src = new MyList<Parallel::gridseg> *[cpusize];
ts->transfer_src = new MyList<Parallel::gridseg> *[cpusize];
ts->transfer_dst = new MyList<Parallel::gridseg> *[cpusize];
for (int node = 0; node < cpusize; node++)
{
ts->src[node] = build_owned_gsl(PatL, node, 5, Symmetry);
build_gstl(ts->src[node], ts->dst, &ts->transfer_src[node], &ts->transfer_dst[node]);
}
transfer_begin(ts);
}
return handle;
}
//
void Parallel::SyncEnd(SyncHandle *handle)
{
if (!handle)
return;
// Wait for all pending transfers and unpack
for (int i = 0; i < handle->num_states; i++)
{
TransferState *ts = &handle->states[i];
transfer_end(ts);
// Cleanup grid segment lists only if this state owns them
if (ts->owns_gsl)
{
if (ts->dst)
ts->dst->destroyList();
for (int node = 0; node < ts->cpusize; node++)
{
if (ts->src[node])
ts->src[node]->destroyList();
if (ts->transfer_src[node])
ts->transfer_src[node]->destroyList();
if (ts->transfer_dst[node])
ts->transfer_dst[node]->destroyList();
}
delete[] ts->src;
delete[] ts->transfer_src;
delete[] ts->transfer_dst;
}
}
delete[] handle->states;
delete handle;
}
//
// SyncPreparePlan: Pre-build grid segment lists for a patch list.
// The plan can be reused across multiple SyncBeginWithPlan calls
// as long as the mesh topology does not change (no regridding).
//
Parallel::SyncPlan *Parallel::SyncPreparePlan(MyList<Patch> *PatL, int Symmetry)
{
int cpusize;
MPI_Comm_size(MPI_COMM_WORLD, &cpusize);
// Count patches
int num_patches = 0;
MyList<Patch> *Pp = PatL;
while (Pp) { num_patches++; Pp = Pp->next; }
SyncPlan *plan = new SyncPlan;
plan->num_entries = num_patches + 1; // intra-patch + 1 inter-patch
plan->Symmetry = Symmetry;
plan->entries = new SyncPlanEntry[plan->num_entries];
// Intra-patch entries: ghost zone exchange within each patch
int idx = 0;
Pp = PatL;
while (Pp)
{
SyncPlanEntry *pe = &plan->entries[idx];
pe->cpusize = cpusize;
pe->dst = build_ghost_gsl(Pp->data);
pe->src = new MyList<Parallel::gridseg> *[cpusize];
pe->transfer_src = new MyList<Parallel::gridseg> *[cpusize];
pe->transfer_dst = new MyList<Parallel::gridseg> *[cpusize];
for (int node = 0; node < cpusize; node++)
{
pe->src[node] = build_owned_gsl0(Pp->data, node);
build_gstl(pe->src[node], pe->dst, &pe->transfer_src[node], &pe->transfer_dst[node]);
}
idx++;
Pp = Pp->next;
}
// Inter-patch entry: buffer zone exchange between patches
{
SyncPlanEntry *pe = &plan->entries[idx];
pe->cpusize = cpusize;
pe->dst = build_buffer_gsl(PatL);
pe->src = new MyList<Parallel::gridseg> *[cpusize];
pe->transfer_src = new MyList<Parallel::gridseg> *[cpusize];
pe->transfer_dst = new MyList<Parallel::gridseg> *[cpusize];
for (int node = 0; node < cpusize; node++)
{
pe->src[node] = build_owned_gsl(PatL, node, 5, Symmetry);
build_gstl(pe->src[node], pe->dst, &pe->transfer_src[node], &pe->transfer_dst[node]);
}
}
return plan;
}
//
void Parallel::SyncFreePlan(SyncPlan *plan)
{
if (!plan)
return;
for (int i = 0; i < plan->num_entries; i++)
{
SyncPlanEntry *pe = &plan->entries[i];
if (pe->dst)
pe->dst->destroyList();
for (int node = 0; node < pe->cpusize; node++)
{
if (pe->src[node])
pe->src[node]->destroyList();
if (pe->transfer_src[node])
pe->transfer_src[node]->destroyList();
if (pe->transfer_dst[node])
pe->transfer_dst[node]->destroyList();
}
delete[] pe->src;
delete[] pe->transfer_src;
delete[] pe->transfer_dst;
}
delete[] plan->entries;
delete plan;
}
//
// SyncBeginWithPlan: Use pre-built GSLs from a SyncPlan to initiate async transfer.
// This avoids the O(cpusize * blocks^2) cost of rebuilding GSLs on every call.
//
Parallel::SyncHandle *Parallel::SyncBeginWithPlan(SyncPlan *plan, MyList<var> *VarList)
{
return SyncBeginWithPlan(plan, VarList, VarList);
}
//
Parallel::SyncHandle *Parallel::SyncBeginWithPlan(SyncPlan *plan, MyList<var> *VarList1, MyList<var> *VarList2)
{
SyncHandle *handle = new SyncHandle;
handle->num_states = plan->num_entries;
handle->states = new TransferState[handle->num_states];
for (int i = 0; i < plan->num_entries; i++)
{
SyncPlanEntry *pe = &plan->entries[i];
TransferState *ts = &handle->states[i];
ts->cpusize = pe->cpusize;
ts->VarList1 = VarList1;
ts->VarList2 = VarList2;
ts->Symmetry = plan->Symmetry;
ts->owns_gsl = false; // GSLs are owned by the plan, not this handle
// Borrow GSL pointers from the plan (do NOT free them in SyncEnd)
ts->transfer_src = pe->transfer_src;
ts->transfer_dst = pe->transfer_dst;
ts->src = pe->src;
ts->dst = pe->dst;
transfer_begin(ts);
}
return handle;
}
// collect buffer grid segments or blocks for the periodic boundary condition of given patch // collect buffer grid segments or blocks for the periodic boundary condition of given patch
// --------------------------------------------------- // ---------------------------------------------------
// |con | |con | // |con | |con |

View File

@@ -81,6 +81,53 @@ namespace Parallel
int Symmetry); int Symmetry);
void Sync(Patch *Pat, MyList<var> *VarList, int Symmetry); void Sync(Patch *Pat, MyList<var> *VarList, int Symmetry);
void Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry); void Sync(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
// Async Sync: overlap MPI communication with computation
struct TransferState
{
MPI_Request *reqs;
MPI_Status *stats;
int req_no;
double **send_data;
double **rec_data;
int cpusize;
MyList<gridseg> **transfer_src;
MyList<gridseg> **transfer_dst;
MyList<gridseg> **src;
MyList<gridseg> *dst;
MyList<var> *VarList1;
MyList<var> *VarList2;
int Symmetry;
bool owns_gsl; // true if this state owns and should free the GSLs
};
struct SyncHandle
{
TransferState *states;
int num_states;
};
SyncHandle *SyncBegin(Patch *Pat, MyList<var> *VarList, int Symmetry);
SyncHandle *SyncBegin(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry);
void SyncEnd(SyncHandle *handle);
// Cached GSL plan: pre-build grid segment lists once, reuse across multiple Sync calls
struct SyncPlanEntry
{
int cpusize;
MyList<gridseg> **transfer_src;
MyList<gridseg> **transfer_dst;
MyList<gridseg> **src;
MyList<gridseg> *dst;
};
struct SyncPlan
{
SyncPlanEntry *entries;
int num_entries;
int Symmetry;
};
SyncPlan *SyncPreparePlan(MyList<Patch> *PatL, int Symmetry);
void SyncFreePlan(SyncPlan *plan);
SyncHandle *SyncBeginWithPlan(SyncPlan *plan, MyList<var> *VarList);
SyncHandle *SyncBeginWithPlan(SyncPlan *plan, MyList<var> *VarList1, MyList<var> *VarList2);
void OutBdLow2Hi(Patch *Patc, Patch *Patf, void OutBdLow2Hi(Patch *Patc, Patch *Patf,
MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */, MyList<var> *VarList1 /* source */, MyList<var> *VarList2 /* target */,
int Symmetry); int Symmetry);

File diff suppressed because it is too large Load Diff

View File

@@ -1,7 +1,8 @@
#ifndef TWO_PUNCTURES_H #ifndef TWO_PUNCTURES_H
#define TWO_PUNCTURES_H #define TWO_PUNCTURES_H
#include <omp.h>
#define StencilSize 19 #define StencilSize 19
#define N_PlaneRelax 1 #define N_PlaneRelax 1
#define NRELAX 200 #define NRELAX 200
@@ -32,7 +33,7 @@ private:
int npoints_A, npoints_B, npoints_phi; int npoints_A, npoints_B, npoints_phi;
double target_M_plus, target_M_minus; double target_M_plus, target_M_minus;
double admMass; double admMass;
double adm_tol; double adm_tol;
@@ -42,32 +43,17 @@ private:
int ntotal; int ntotal;
// Pre-allocated workspace buffers for hot-path allocation elimination // ===== Precomputed spectral derivative matrices =====
// LineRelax_be workspace (sized for n2) double *D1_A, *D2_A;
double *ws_diag_be, *ws_e_be, *ws_f_be, *ws_b_be, *ws_x_be; double *D1_B, *D2_B;
// LineRelax_al workspace (sized for n1) double *DF1_phi, *DF2_phi;
double *ws_diag_al, *ws_e_al, *ws_f_al, *ws_b_al, *ws_x_al;
// ThomasAlgorithm workspace (sized for max(n1,n2)) // ===== Pre-allocated workspace for LineRelax (per-thread) =====
double *ws_thomas_y; int max_threads;
// JFD_times_dv workspace (sized for nvar) double **ws_diag_be, **ws_e_be, **ws_f_be, **ws_b_be, **ws_x_be;
double *ws_jfd_values; double **ws_l_be, **ws_u_be, **ws_d_be, **ws_y_be;
derivs ws_jfd_dU, ws_jfd_U; double **ws_diag_al, **ws_e_al, **ws_f_al, **ws_b_al, **ws_x_al;
// chebft_Zeros workspace (sized for max(n1,n2,n3)+1) double **ws_l_al, **ws_u_al, **ws_d_al, **ws_y_al;
double *ws_cheb_c;
// fourft workspace (sized for max(n1,n2,n3)/2+1 each)
double *ws_four_a, *ws_four_b;
// Derivatives_AB3 workspace
double *ws_deriv_p, *ws_deriv_dp, *ws_deriv_d2p;
double *ws_deriv_q, *ws_deriv_dq;
double *ws_deriv_r, *ws_deriv_dr;
int *ws_deriv_indx;
// F_of_v workspace
double *ws_fov_sources;
double *ws_fov_values;
derivs ws_fov_U;
// J_times_dv workspace
double *ws_jtdv_values;
derivs ws_jtdv_dU, ws_jtdv_U;
struct parameters struct parameters
{ {
@@ -85,6 +71,28 @@ public:
int Newtonmaxit); int Newtonmaxit);
~TwoPunctures(); ~TwoPunctures();
// 02/07: New/modified methods
void allocate_workspace();
void free_workspace();
void precompute_derivative_matrices();
void build_cheb_deriv_matrices(int n, double *D1, double *D2);
void build_fourier_deriv_matrices(int N, double *DF1, double *DF2);
void Derivatives_AB3_MatMul(int nvar, int n1, int n2, int n3, derivs v);
void ThomasAlgorithm_ws(int N, double *b, double *a, double *c, double *x, double *q,
double *l, double *u_ws, double *d, double *y);
void LineRelax_be_omp(double *dv,
int const i, int const k, int const nvar,
int const n1, int const n2, int const n3,
double const *rhs, int const *ncols, int **cols,
double **JFD, int tid);
void LineRelax_al_omp(double *dv,
int const j, int const k, int const nvar,
int const n1, int const n2, int const n3,
double const *rhs, int const *ncols,
int **cols, double **JFD, int tid);
void relax_omp(double *dv, int const nvar, int const n1, int const n2, int const n3,
double const *rhs, int const *ncols, int **cols, double **JFD);
void Solve(); void Solve();
void set_initial_guess(derivs v); void set_initial_guess(derivs v);
int index(int i, int j, int k, int l, int a, int b, int c, int d); int index(int i, int j, int k, int l, int a, int b, int c, int d);
@@ -143,23 +151,11 @@ public:
double BY_KKofxyz(double x, double y, double z); double BY_KKofxyz(double x, double y, double z);
void SetMatrix_JFD(int nvar, int n1, int n2, int n3, derivs u, int *ncols, int **cols, double **Matrix); void SetMatrix_JFD(int nvar, int n1, int n2, int n3, derivs u, int *ncols, int **cols, double **Matrix);
void J_times_dv(int nvar, int n1, int n2, int n3, derivs dv, double *Jdv, derivs u); void J_times_dv(int nvar, int n1, int n2, int n3, derivs dv, double *Jdv, derivs u);
void relax(double *dv, int const nvar, int const n1, int const n2, int const n3,
double const *rhs, int const *ncols, int **cols, double **JFD);
void LineRelax_be(double *dv,
int const i, int const k, int const nvar,
int const n1, int const n2, int const n3,
double const *rhs, int const *ncols, int **cols,
double **JFD);
void JFD_times_dv(int i, int j, int k, int nvar, int n1, int n2, void JFD_times_dv(int i, int j, int k, int nvar, int n1, int n2,
int n3, derivs dv, derivs u, double *values); int n3, derivs dv, derivs u, double *values);
void LinEquations(double A, double B, double X, double R, void LinEquations(double A, double B, double X, double R,
double x, double r, double phi, double x, double r, double phi,
double y, double z, derivs dU, derivs U, double *values); double y, double z, derivs dU, derivs U, double *values);
void LineRelax_al(double *dv,
int const j, int const k, int const nvar,
int const n1, int const n2, int const n3,
double const *rhs, int const *ncols,
int **cols, double **JFD);
void ThomasAlgorithm(int N, double *b, double *a, double *c, double *x, double *q); void ThomasAlgorithm(int N, double *b, double *a, double *c, double *x, double *q);
void Save(char *fname); void Save(char *fname);
// provided by Vasileios Paschalidis (vpaschal@illinois.edu) // provided by Vasileios Paschalidis (vpaschal@illinois.edu)
@@ -168,4 +164,4 @@ public:
void SpecCoef(parameters par, int ivar, double *v, double *cf); void SpecCoef(parameters par, int ivar, double *v, double *cf);
}; };
#endif /* TWO_PUNCTURES_H */ #endif /* TWO_PUNCTURES_H */

View File

@@ -186,6 +186,12 @@ void Z4c_class::Step(int lev, int YN)
int ERROR = 0; int ERROR = 0;
MyList<ss_patch> *sPp; MyList<ss_patch> *sPp;
// Pre-build grid segment lists once for this level's patches.
// These are reused across predictor + 3 corrector SyncBegin calls,
// avoiding O(cpusize * blocks^2) rebuild each time.
Parallel::SyncPlan *sync_plan = Parallel::SyncPreparePlan(GH->PatL[lev], Symmetry);
// Predictor // Predictor
MyList<Patch> *Pp = GH->PatL[lev]; MyList<Patch> *Pp = GH->PatL[lev];
while (Pp) while (Pp)
@@ -321,13 +327,17 @@ void Z4c_class::Step(int lev, int YN)
} }
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // Start async ghost zone exchange - overlaps with error check and Shell computation
Parallel::SyncHandle *sync_pre = Parallel::SyncBeginWithPlan(sync_plan, SynchList_pre);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_pre); sync_pre = 0;
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -475,6 +485,7 @@ void Z4c_class::Step(int lev, int YN)
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_pre); sync_pre = 0;
SH->Dump_Data(StateList, 0, PhysTime, dT_lev); SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -485,7 +496,8 @@ void Z4c_class::Step(int lev, int YN)
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry); // Complete async ghost zone exchange
if (sync_pre) Parallel::SyncEnd(sync_pre);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -693,13 +705,17 @@ void Z4c_class::Step(int lev, int YN)
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // Start async ghost zone exchange - overlaps with error check and Shell computation
Parallel::SyncHandle *sync_cor = Parallel::SyncBeginWithPlan(sync_plan, SynchList_cor);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_cor); sync_cor = 0;
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -857,6 +873,7 @@ void Z4c_class::Step(int lev, int YN)
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_cor); sync_cor = 0;
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev); SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -868,7 +885,8 @@ void Z4c_class::Step(int lev, int YN)
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); // Complete async ghost zone exchange
if (sync_cor) Parallel::SyncEnd(sync_cor);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -1042,6 +1060,8 @@ void Z4c_class::Step(int lev, int YN)
Porg0[ithBH][2] = Porg1[ithBH][2]; Porg0[ithBH][2] = Porg1[ithBH][2];
} }
} }
Parallel::SyncFreePlan(sync_plan);
} }
#else #else
// for constraint preserving boundary (CPBC) // for constraint preserving boundary (CPBC)
@@ -1075,6 +1095,10 @@ void Z4c_class::Step(int lev, int YN)
int ERROR = 0; int ERROR = 0;
MyList<ss_patch> *sPp; MyList<ss_patch> *sPp;
// Pre-build grid segment lists once for this level's patches.
Parallel::SyncPlan *sync_plan = Parallel::SyncPreparePlan(GH->PatL[lev], Symmetry);
// Predictor // Predictor
MyList<Patch> *Pp = GH->PatL[lev]; MyList<Patch> *Pp = GH->PatL[lev];
while (Pp) while (Pp)
@@ -1542,13 +1566,17 @@ void Z4c_class::Step(int lev, int YN)
} }
#endif #endif
} }
// check error information // Start async ghost zone exchange - overlaps with error check
Parallel::SyncHandle *sync_pre = Parallel::SyncBeginWithPlan(sync_plan, SynchList_pre);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_pre); sync_pre = 0;
SH->Dump_Data(StateList, 0, PhysTime, dT_lev); SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -1558,7 +1586,8 @@ void Z4c_class::Step(int lev, int YN)
} }
} }
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry); // Complete async ghost zone exchange
if (sync_pre) Parallel::SyncEnd(sync_pre);
if (lev == 0) if (lev == 0)
{ {
@@ -2103,13 +2132,17 @@ void Z4c_class::Step(int lev, int YN)
sPp = sPp->next; sPp = sPp->next;
} }
} }
// check error information // Start async ghost zone exchange - overlaps with error check
Parallel::SyncHandle *sync_cor = Parallel::SyncBeginWithPlan(sync_plan, SynchList_cor);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_cor); sync_cor = 0;
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev); SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -2120,7 +2153,8 @@ void Z4c_class::Step(int lev, int YN)
} }
} }
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); // Complete async ghost zone exchange
if (sync_cor) Parallel::SyncEnd(sync_cor);
if (lev == 0) if (lev == 0)
{ {
@@ -2346,6 +2380,8 @@ void Z4c_class::Step(int lev, int YN)
DG_List->clearList(); DG_List->clearList();
} }
#endif #endif
Parallel::SyncFreePlan(sync_plan);
} }
#endif #endif
#undef MRBD #undef MRBD

View File

@@ -3035,6 +3035,12 @@ void bssn_class::Step(int lev, int YN)
int ERROR = 0; int ERROR = 0;
MyList<ss_patch> *sPp; MyList<ss_patch> *sPp;
// Pre-build grid segment lists once for this level's patches.
// These are reused across predictor + 3 corrector SyncBegin calls,
// avoiding O(cpusize * blocks^2) rebuild each time.
Parallel::SyncPlan *sync_plan = Parallel::SyncPreparePlan(GH->PatL[lev], Symmetry);
// Predictor // Predictor
MyList<Patch> *Pp = GH->PatL[lev]; MyList<Patch> *Pp = GH->PatL[lev];
while (Pp) while (Pp)
@@ -3158,13 +3164,18 @@ void bssn_class::Step(int lev, int YN)
} }
Pp = Pp->next; Pp = Pp->next;
} }
// check error information
// Start async ghost zone exchange - overlaps with error check and Shell computation
Parallel::SyncHandle *sync_pre = Parallel::SyncBeginWithPlan(sync_plan, SynchList_pre);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_pre); sync_pre = 0;
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -3324,6 +3335,7 @@ void bssn_class::Step(int lev, int YN)
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_pre); sync_pre = 0;
SH->Dump_Data(StateList, 0, PhysTime, dT_lev); SH->Dump_Data(StateList, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -3334,7 +3346,8 @@ void bssn_class::Step(int lev, int YN)
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry); // Complete async ghost zone exchange
if (sync_pre) Parallel::SyncEnd(sync_pre);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -3528,7 +3541,10 @@ void bssn_class::Step(int lev, int YN)
Pp = Pp->next; Pp = Pp->next;
} }
// check error information // Start async ghost zone exchange - overlaps with error check and Shell computation
Parallel::SyncHandle *sync_cor = Parallel::SyncBeginWithPlan(sync_plan, SynchList_cor);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
@@ -3536,6 +3552,7 @@ void bssn_class::Step(int lev, int YN)
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_cor); sync_cor = 0;
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -3692,6 +3709,7 @@ void bssn_class::Step(int lev, int YN)
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_cor); sync_cor = 0;
SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev); SH->Dump_Data(SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -3704,7 +3722,8 @@ void bssn_class::Step(int lev, int YN)
} }
#endif #endif
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); // Complete async ghost zone exchange
if (sync_cor) Parallel::SyncEnd(sync_cor);
#ifdef WithShell #ifdef WithShell
if (lev == 0) if (lev == 0)
@@ -3895,6 +3914,8 @@ void bssn_class::Step(int lev, int YN)
Porg0[ithBH][2] = Porg1[ithBH][2]; Porg0[ithBH][2] = Porg1[ithBH][2];
} }
} }
Parallel::SyncFreePlan(sync_plan);
} }
//================================================================================================ //================================================================================================
@@ -4817,6 +4838,12 @@ void bssn_class::Step(int lev, int YN)
int ERROR = 0; int ERROR = 0;
MyList<ss_patch> *sPp; MyList<ss_patch> *sPp;
// Pre-build grid segment lists once for this level's patches.
// These are reused across predictor + 3 corrector SyncBegin calls,
// avoiding O(cpusize * blocks^2) rebuild each time.
Parallel::SyncPlan *sync_plan = Parallel::SyncPreparePlan(GH->PatL[lev], Symmetry);
// Predictor // Predictor
MyList<Patch> *Pp = GH->PatL[lev]; MyList<Patch> *Pp = GH->PatL[lev];
while (Pp) while (Pp)
@@ -4943,13 +4970,17 @@ void bssn_class::Step(int lev, int YN)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Predictor rhs calculation"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Predictor rhs calculation");
// check error information // Start async ghost zone exchange - overlaps with error check and BH position
Parallel::SyncHandle *sync_pre = Parallel::SyncBeginWithPlan(sync_plan, SynchList_pre);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev]); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev]);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_pre); sync_pre = 0;
Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], StateList, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -4961,7 +4992,8 @@ void bssn_class::Step(int lev, int YN)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Predictor sync"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Predictor sync");
Parallel::Sync(GH->PatL[lev], SynchList_pre, Symmetry); // Complete async ghost zone exchange
if (sync_pre) Parallel::SyncEnd(sync_pre);
#if (MAPBH == 0) #if (MAPBH == 0)
// for black hole position // for black hole position
@@ -5140,13 +5172,17 @@ void bssn_class::Step(int lev, int YN)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector error check"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector error check");
// check error information // Start async ghost zone exchange - overlaps with error check and BH position
Parallel::SyncHandle *sync_cor = Parallel::SyncBeginWithPlan(sync_plan, SynchList_cor);
// check error information (overlaps with MPI transfer)
{ {
int erh = ERROR; int erh = ERROR;
MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev]); MPI_Allreduce(&erh, &ERROR, 1, MPI_INT, MPI_SUM, GH->Commlev[lev]);
} }
if (ERROR) if (ERROR)
{ {
Parallel::SyncEnd(sync_cor); sync_cor = 0;
Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev); Parallel::Dump_Data(GH->PatL[lev], SynchList_pre, 0, PhysTime, dT_lev);
if (myrank == 0) if (myrank == 0)
{ {
@@ -5160,7 +5196,8 @@ void bssn_class::Step(int lev, int YN)
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector sync"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before Corrector sync");
Parallel::Sync(GH->PatL[lev], SynchList_cor, Symmetry); // Complete async ghost zone exchange
if (sync_cor) Parallel::SyncEnd(sync_cor);
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Corrector sync"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after Corrector sync");
@@ -5276,6 +5313,8 @@ void bssn_class::Step(int lev, int YN)
// if(myrank==GH->start_rank[lev]) cout<<GH->mylev<<endl; // if(myrank==GH->start_rank[lev]) cout<<GH->mylev<<endl;
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"complet GH Step"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"complet GH Step");
Parallel::SyncFreePlan(sync_plan);
} }
//================================================================================================ //================================================================================================

View File

@@ -945,103 +945,60 @@
SSA(2)=SYM SSA(2)=SYM
SSA(3)=ANTI SSA(3)=ANTI
!!!!!!!!!advection term part !!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency)
! lopsided_kodis shares the symmetry_bd buffer between advection and
! dissipation, eliminating redundant full-grid copies. For metric variables
! gxx/gyy/gzz (=dxx/dyy/dzz+1): kodis stencil coefficients sum to zero,
! so the constant offset has no effect on dissipation.
call lopsided(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS) call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA) call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA) call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS) call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA) call lopsided_kodis(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA) call lopsided_kodis(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_kodis(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided_kodis(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided_kodis(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided_kodis(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
!!
#if 1
!! bam does not apply dissipation on gauge variables
call lopsided_kodis(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS,eps)
#if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided_kodis(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided_kodis(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided_kodis(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
#endif
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided_kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS,eps)
call lopsided_kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS,eps)
call lopsided_kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA,eps)
#endif
#else
! No dissipation on gauge variables (advection only)
call lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS)
#if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7) #if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA)
#endif #endif
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7) #if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS)
call lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS)
call lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA)
#endif #endif
if(eps>0)then
! usual Kreiss-Oliger dissipation
call kodis(ex,X,Y,Z,chi,chi_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,trK,trK_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,dxx,gxx_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,gxy,gxy_rhs,AAS,Symmetry,eps)
call kodis(ex,X,Y,Z,gxz,gxz_rhs,ASA,Symmetry,eps)
call kodis(ex,X,Y,Z,dyy,gyy_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,gyz,gyz_rhs,SAA,Symmetry,eps)
call kodis(ex,X,Y,Z,dzz,gzz_rhs,SSS,Symmetry,eps)
#if 0
#define i 42
#define j 40
#define k 40
if(Lev == 1)then
write(*,*) X(i),Y(j),Z(k)
write(*,*) "before",Axx_rhs(i,j,k)
endif
#undef i
#undef j
#undef k
!!stop
#endif #endif
call kodis(ex,X,Y,Z,Axx,Axx_rhs,SSS,Symmetry,eps)
#if 0
#define i 42
#define j 40
#define k 40
if(Lev == 1)then
write(*,*) X(i),Y(j),Z(k)
write(*,*) "after",Axx_rhs(i,j,k)
endif
#undef i
#undef j
#undef k
!!stop
#endif
call kodis(ex,X,Y,Z,Axy,Axy_rhs,AAS,Symmetry,eps)
call kodis(ex,X,Y,Z,Axz,Axz_rhs,ASA,Symmetry,eps)
call kodis(ex,X,Y,Z,Ayy,Ayy_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,Ayz,Ayz_rhs,SAA,Symmetry,eps)
call kodis(ex,X,Y,Z,Azz,Azz_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,Gamx,Gamx_rhs,ASS,Symmetry,eps)
call kodis(ex,X,Y,Z,Gamy,Gamy_rhs,SAS,Symmetry,eps)
call kodis(ex,X,Y,Z,Gamz,Gamz_rhs,SSA,Symmetry,eps)
#if 1
!! bam does not apply dissipation on gauge variables
call kodis(ex,X,Y,Z,Lap,Lap_rhs,SSS,Symmetry,eps)
call kodis(ex,X,Y,Z,betax,betax_rhs,ASS,Symmetry,eps)
call kodis(ex,X,Y,Z,betay,betay_rhs,SAS,Symmetry,eps)
call kodis(ex,X,Y,Z,betaz,betaz_rhs,SSA,Symmetry,eps)
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,ASS,Symmetry,eps)
call kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,SAS,Symmetry,eps)
call kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,SSA,Symmetry,eps)
#endif
#endif
endif
if(co == 0)then if(co == 0)then
! ham_Res = trR + 2/3 * K^2 - A_ij * A^ij - 16 * PI * rho ! ham_Res = trR + 2/3 * K^2 - A_ij * A^ij - 16 * PI * rho

View File

@@ -487,6 +487,201 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
end subroutine lopsided end subroutine lopsided
!-----------------------------------------------------------------------------
! Combined advection (lopsided) + Kreiss-Oliger dissipation (kodis)
! Shares the symmetry_bd buffer fh, eliminating one full-grid copy per call.
! Mathematically identical to calling lopsided then kodis separately.
!-----------------------------------------------------------------------------
subroutine lopsided_kodis(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA,eps)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
real*8,intent(in) :: eps
!~~~~~~> local variables:
! note index -2,-1,0, so we have 3 extra points
real*8,dimension(-2:ex(1),-2:ex(2),-2:ex(3)) :: fh
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F3=3.d0
real*8, parameter :: TWO=2.d0,F6=6.0d0,F18=1.8d1
real*8, parameter :: F12=1.2d1, F10=1.d1,EIT=8.d0
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
! kodis parameters
real*8, parameter :: SIX=6.d0,FIT=1.5d1,TWT=2.d1
real*8, parameter :: cof=6.4d1 ! 2^6
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -2
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -2
! Single symmetry_bd call shared by both advection and dissipation
call symmetry_bd(3,ex,f,fh,SoA)
! ---- Advection (lopsided) loop ----
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
! x direction
if(Sfx(i,j,k) > ZEO)then
if(i+3 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
elseif(i+2 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
endif
elseif(Sfx(i,j,k) < ZEO)then
if(i-3 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
elseif(i-2 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i-1 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
endif
endif
! y direction
if(Sfy(i,j,k) > ZEO)then
if(j+3 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
elseif(j+2 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
endif
elseif(Sfy(i,j,k) < ZEO)then
if(j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
elseif(j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
endif
endif
! z direction
if(Sfz(i,j,k) > ZEO)then
if(k+3 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
elseif(k+2 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
endif
elseif(Sfz(i,j,k) < ZEO)then
if(k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
elseif(k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
endif
endif
enddo
enddo
enddo
! ---- Dissipation (kodis) loop ----
if(eps > ZEO) then
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i-3 >= imin .and. i+3 <= imax .and. &
j-3 >= jmin .and. j+3 <= jmax .and. &
k-3 >= kmin .and. k+3 <= kmax) then
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )/dX + &
( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )/dY + &
( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )/dZ )
endif
enddo
enddo
enddo
endif
return
end subroutine lopsided_kodis
#elif (ghost_width == 4) #elif (ghost_width == 4)
! sixth order code ! sixth order code
! Compute advection terms in right hand sides of field equations ! Compute advection terms in right hand sides of field equations

View File

@@ -16,6 +16,12 @@ include makefile.inc
.cu.o: .cu.o:
$(Cu) $(CUDA_APP_FLAGS) -c $< -o $@ $(CUDA_LIB_PATH) $(Cu) $(CUDA_APP_FLAGS) -c $< -o $@ $(CUDA_LIB_PATH)
TwoPunctures.o: TwoPunctures.C
${CXX} $(CXXAPPFLAGS) -qopenmp -c $< -o $@
TwoPunctureABE.o: TwoPunctureABE.C
${CXX} $(CXXAPPFLAGS) -qopenmp -c $< -o $@
# Input files # Input files
C++FILES = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o\ C++FILES = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o\
cgh.o bssn_class.o surface_integral.o ShellPatch.o\ cgh.o bssn_class.o surface_integral.o ShellPatch.o\
@@ -96,7 +102,7 @@ ABEGPU: $(C++FILES_GPU) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES_GPU) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES) $(LDLIBS) $(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES_GPU) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES) $(LDLIBS)
TwoPunctureABE: $(TwoPunctureFILES) TwoPunctureABE: $(TwoPunctureFILES)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(TwoPunctureFILES) $(LDLIBS) $(CLINKER) $(CXXAPPFLAGS) -qopenmp -o $@ $(TwoPunctureFILES) $(LDLIBS)
clean: clean:
rm *.o ABE ABEGPU TwoPunctureABE make.log -f rm *.o ABE ABEGPU TwoPunctureABE make.log -f

View File

@@ -15,7 +15,6 @@ LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore
## -xHost: Optimize for the host CPU architecture (Intel/AMD compatible) ## -xHost: Optimize for the host CPU architecture (Intel/AMD compatible)
## -fp-model fast=2: Aggressive floating-point optimizations ## -fp-model fast=2: Aggressive floating-point optimizations
## -fma: Enable fused multiply-add instructions ## -fma: Enable fused multiply-add instructions
## Note: OpenMP has been disabled (-qopenmp removed) due to performance issues
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \ CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
-Dfortran3 -Dnewc -I${MKLROOT}/include -Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \ f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \

View File

@@ -10,7 +10,7 @@
import AMSS_NCKU_Input as input_data import AMSS_NCKU_Input as input_data
import subprocess import subprocess
import time
## CPU core binding configuration using taskset ## CPU core binding configuration using taskset
## taskset ensures all child processes inherit the CPU affinity mask ## taskset ensures all child processes inherit the CPU affinity mask
## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111) ## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111)
@@ -152,7 +152,7 @@ def run_ABE():
## Run the AMSS-NCKU TwoPuncture program TwoPunctureABE ## Run the AMSS-NCKU TwoPuncture program TwoPunctureABE
def run_TwoPunctureABE(): def run_TwoPunctureABE():
tp_time1=time.time()
print( ) print( )
print( " Running the AMSS-NCKU executable file TwoPunctureABE " ) print( " Running the AMSS-NCKU executable file TwoPunctureABE " )
print( ) print( )
@@ -179,7 +179,9 @@ def run_TwoPunctureABE():
print( ) print( )
print( " The TwoPunctureABE simulation is finished " ) print( " The TwoPunctureABE simulation is finished " )
print( ) print( )
tp_time2=time.time()
et=tp_time2-tp_time1
print(f"Used time: {et}")
return return
################################################################## ##################################################################

29
parallel_plot_helper.py Normal file
View File

@@ -0,0 +1,29 @@
import multiprocessing
def run_plot_task(task):
"""Execute a single plotting task.
Parameters
----------
task : tuple
A tuple of (function, args_tuple) where function is a callable
plotting function and args_tuple contains its arguments.
"""
func, args = task
return func(*args)
def run_plot_tasks_parallel(plot_tasks):
"""Execute a list of independent plotting tasks in parallel.
Uses the 'fork' context to create worker processes so that the main
script is NOT re-imported/re-executed in child processes.
Parameters
----------
plot_tasks : list of tuples
Each element is (function, args_tuple).
"""
ctx = multiprocessing.get_context('fork')
with ctx.Pool() as pool:
pool.map(run_plot_task, plot_tasks)

View File

@@ -11,6 +11,8 @@
import numpy ## numpy for array operations import numpy ## numpy for array operations
import scipy ## scipy for interpolation and signal processing import scipy ## scipy for interpolation and signal processing
import math import math
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt ## matplotlib for plotting import matplotlib.pyplot as plt ## matplotlib for plotting
import os ## os for system/file operations import os ## os for system/file operations

View File

@@ -8,16 +8,23 @@
## ##
################################################# #################################################
## Restrict OpenMP to one thread per process so that running
## many workers in parallel does not create an O(workers * BLAS_threads)
## thread explosion. The variable MUST be set before numpy/scipy
## are imported, because the BLAS library reads them only at load time.
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")
import numpy import numpy
import scipy import scipy
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm from matplotlib.colors import LogNorm
from mpl_toolkits.mplot3d import Axes3D from mpl_toolkits.mplot3d import Axes3D
## import torch ## import torch
import AMSS_NCKU_Input as input_data import AMSS_NCKU_Input as input_data
import os
######################################################################################### #########################################################################################
@@ -192,3 +199,19 @@ def get_data_xy( Rmin, Rmax, n, data0, time, figure_title, figure_outdir ):
#################################################################################### ####################################################################################
####################################################################################
## Allow this module to be run as a standalone script so that each
## binary-data plot can be executed in a fresh subprocess whose BLAS
## environment variables (set above) take effect before numpy loads.
##
## Usage: python3 plot_binary_data.py <filename> <binary_outdir> <figure_outdir>
####################################################################################
if __name__ == '__main__':
import sys
if len(sys.argv) != 4:
print(f"Usage: {sys.argv[0]} <filename> <binary_outdir> <figure_outdir>")
sys.exit(1)
plot_binary_data(sys.argv[1], sys.argv[2], sys.argv[3])

View File

@@ -8,6 +8,8 @@
################################################# #################################################
import numpy ## numpy for array operations import numpy ## numpy for array operations
import matplotlib
matplotlib.use('Agg') ## use non-interactive backend for multiprocessing safety
import matplotlib.pyplot as plt ## matplotlib for plotting import matplotlib.pyplot as plt ## matplotlib for plotting
from mpl_toolkits.mplot3d import Axes3D ## needed for 3D plots from mpl_toolkits.mplot3d import Axes3D ## needed for 3D plots
import glob import glob
@@ -15,6 +17,9 @@ import os ## operating system utilities
import plot_binary_data import plot_binary_data
import AMSS_NCKU_Input as input_data import AMSS_NCKU_Input as input_data
import subprocess
import sys
import multiprocessing
# plt.rcParams['text.usetex'] = True ## enable LaTeX fonts in plots # plt.rcParams['text.usetex'] = True ## enable LaTeX fonts in plots
@@ -50,10 +55,40 @@ def generate_binary_data_plot( binary_outdir, figure_outdir ):
file_list.append(x) file_list.append(x)
print(x) print(x)
## Plot each file in the list ## Plot each file in parallel using subprocesses.
## Each subprocess is a fresh Python process where the BLAS thread-count
## environment variables (set at the top of plot_binary_data.py) take
## effect before numpy is imported. This avoids the thread explosion
## that occurs when multiprocessing.Pool with 'fork' context inherits
## already-initialized multi-threaded BLAS from the parent.
script = os.path.join( os.path.dirname(__file__), "plot_binary_data.py" )
max_workers = min( multiprocessing.cpu_count(), len(file_list) ) if file_list else 0
running = []
failed = []
for filename in file_list: for filename in file_list:
print(filename) print(filename)
plot_binary_data.plot_binary_data(filename, binary_outdir, figure_outdir) proc = subprocess.Popen(
[sys.executable, script, filename, binary_outdir, figure_outdir],
)
running.append( (proc, filename) )
## Keep at most max_workers subprocesses active at a time
if len(running) >= max_workers:
p, fn = running.pop(0)
p.wait()
if p.returncode != 0:
failed.append(fn)
## Wait for all remaining subprocesses to finish
for p, fn in running:
p.wait()
if p.returncode != 0:
failed.append(fn)
if failed:
print( " WARNING: the following binary data plots failed:" )
for fn in failed:
print( " ", fn )
print( ) print( )
print( " Binary Data Plot Has been Finished " ) print( " Binary Data Plot Has been Finished " )