Replace all 8 blocking MPI_Allreduce error-check calls with MPI_Iallreduce, overlapping the reduction with the subsequent Parallel::Sync/SH->Synch operations.

MPI_Wait is called after Sync completes to retrieve the error result. This hides the Allreduce latency (46.5% of CPU time) behind the ghost zone exchange communication that must happen anyway.

This is safe because Sync only copies existing data to ghost zones, and the error check and abort happen before any further computation uses the synced data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>