Major optimization: Pre-build grid segment lists (GSLs) once per Step() call
via SyncPreparePlan(), then reuse them across all 4 RK4 substep SyncBegin calls
via SyncBeginWithPlan(). This eliminates the O(cpusize * blocks^2) GSL rebuild
cost that was incurred on every ghost zone exchange.
Applied async SyncBegin/SyncEnd overlap pattern to Z4c_class.C (ABEtype==2,
the default configuration), which was still using blocking Parallel::Sync.
Both the regular and CPBC variants of Z4c Step() are now optimized.
Co-authored-by: copilot-swe-agent[bot] <198982749+copilot@users.noreply.github.com>