Split the blocking Parallel::Sync into async SyncBegin (initiates local copy +
MPI_Isend/Irecv) and SyncEnd (MPI_Waitall + unpack). This allows overlapping MPI
ghost zone exchange with error checking and Shell patch computation.
Modified Step() in bssn_class.C for both PSTR==0 and PSTR==1/2/3 versions to
start Sync before error checks, overlapping the MPI_Allreduce with the ongoing
ghost zone transfers.
Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>