AMSS-NCKU

64-BitBrainstorm_2026/AMSS-NCKU

Commit Graph

Author	SHA1	Message	Date

ianchb

afd4006da2

Cache GSL in SyncPlan and apply async Sync to Z4c_class

Major optimization: Pre-build grid segment lists (GSLs) once per Step() call
via SyncPreparePlan(), then reuse them across all 4 RK4 substep SyncBegin calls
via SyncBeginWithPlan(). This eliminates the O(cpusize * blocks^2) GSL rebuild
cost that was incurred on every ghost zone exchange.

Applied async SyncBegin/SyncEnd overlap pattern to Z4c_class.C (ABEtype==2,
the default configuration), which was still using blocking Parallel::Sync.
Both the regular and CPBC variants of Z4c Step() are now optimized.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>

2026-02-08 16:46:44 +08:00

copilot-swe-agent[bot]

a918dc103e

Add SyncBegin/SyncEnd to Parallel for MPI communication-computation overlap

Split the blocking Parallel::Sync into async SyncBegin (initiates local copy +
MPI_Isend/Irecv) and SyncEnd (MPI_Waitall + unpack). This allows overlapping MPI
ghost zone exchange with error checking and Shell patch computation.

Modified Step() in bssn_class.C for both PSTR==0 and PSTR==1/2/3 versions to
start Sync before error checks, overlapping the MPI_Allreduce with the ongoing
ghost zone transfers.

Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>

2026-02-08 16:19:13 +08:00

CGH0S7

f2fc9af70e

asc26 amss-ncku initialized

2026-01-13 15:01:15 +08:00

3 Commits