Commit Graph

  • 9c33e16571 Add PGO files for the C operators CGH0S7 2026-02-27 11:30:36 +08:00
  • f7ada421cf skip redundant MPI ghost cell syncs for stages 0, 1 & 2 chb-replace ianchb 2026-02-26 16:16:33 +08:00
  • 45b7a43576 Fill in the mathematical differences between the C operators and the Fortran operators CGH0S7 2026-02-26 15:48:11 +08:00
  • dfb79e3e11 Initialize output arrays to zero in fdderivs_c.C and fderivs_c.C ianchb 2026-02-26 11:48:28 +08:00
  • fb9f153662 Initialize output arrays to zero in fdderivs_c.C and fderivs_c.C ianchb 2026-02-26 11:48:28 +08:00
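  • f147f79ffa Modify block partitioning: split the blocks on heavily loaded ranks and assign the pieces to empty ranks; the empty ranks are obtained by shifting yx-vacation jaunatisblue 2026-02-26 09:40:46 +08:00
  • d2c2214fa1 Add PGO instrumentation files specific to TwoPunctureABE CGH0S7 2026-02-25 23:06:17 +08:00
  • e157ea3a23 Merge chb-replace: replace Fortran bssn_rhs with C++ operators, add a fallback switch and a separate PGO profdata CGH0S7 2026-02-25 22:50:46 +08:00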
  • f5a63f1e42 Revert "Fix timing: replace clock() with MPI_Wtime() for wall-clock measurement" ianchb 2026-02-25 22:21:43 +08:00
  • 284ab80baf Remove OpenMP from C rewrite kernel ianchb 2026-02-25 13:15:24 +00:00
  • 09b937c022 Fix timing: replace clock() with MPI_Wtime() for wall-clock measurement copilot-swe-agent[bot] 2026-02-25 12:42:47 +00:00
  • 8a9c775705 Replace Fortran bssn_rhs with C implementation and add C helper kernels wingrew 2026-02-25 18:59:33 +08:00
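  • d942122043 Update PGO files CGH0S7 2026-02-25 18:25:20 +08:00
  • a5c713a7e0 Improve the PGO mechanism CGH0S7 2026-02-25 17:22:56 +08:00
  • 9e6b25163a Update PGO profdata and add a PGO_MODE switch for instrumented ABE builds CGH0S7 2026-02-25 17:00:55 +08:00
  • efc8bf29ea Invalidate the sync cache on demand: Regrid_Onelevel now returns bool CGH0S7 2026-02-25 16:00:26 +08:00
  • ccf6adaf75 Provide the correct macrodef.h so LLMs are not misled CGH0S7 2026-02-25 11:47:14 +08:00
  • e2bc472845 Optimize core-binding logic: replace hardcoded values with automatic detection CGH0S7 2026-02-25 10:59:32 +08:00
  • 8abac8dd88 Collect per-rank runtime statistics. The two functions are called in different computations, so I measured the actual MPI compute time of each of the two overloads separately. For the first one, where PatList_Interp_Points calls Interp_points, I took the three longest rank times and found that only one rank is slow each time, Rank [ 52]: Calc 0.000012 s jaunatisblue 2026-02-24 14:33:04 +08:00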
  • e6329b013d Merge branch 'cjy-oneapi-opus-hotfix' CGH0S7 2026-02-20 14:18:33 +08:00
  • 82339f5282 Merge lopsided advection + kodis dissipation to share symmetry_bd buffer ianchb 2026-02-20 09:45:37 +08:00
  • 94f38c57f9 Don't hardcode pgo profile path ianchb 2026-02-20 08:48:25 +08:00
  • cc06e30404 Apply async Sync optimization to Z4c_class using Sync_start/finish pattern chb-new ianchb 2026-02-20 09:50:40 +08:00
  • 25c79dc7cd Merge lopsided advection + kodis dissipation to share symmetry_bd buffer ianchb 2026-02-20 09:45:37 +08:00
  • a725d34dd3 Don't hardcode pgo profile path ianchb 2026-02-20 08:48:25 +08:00
  • 85d1e8de87 Add Intel SIMD vectorization directives to hot-spot functions CGH0S7 2026-02-14 00:43:39 +08:00
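  • b32675ba99 1. Pass 1 (lines 357-395): iterate over all Patches, compute each block's actual volume including ghost zones, and store it in block_volumes 2. Greedy LPT (lines 397-414): sort by volume in descending order and assign each block in turn to the rank with the smallest current load 3. Pass 2 (lines 416-555): the original block-creation loop, but using assigned_ranks[block_idx++] instead of n_rank++, so each Block gets the correct rank at construction and its memory is allocated on the right process yx-mpi jaunatisblue 2026-02-12 03:22:46 +08:00
  • 93362baee5 Modify transfer jaunatisblue 2026-02-12 00:58:18 +08:00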
  • 2791d2e225 Merge pull request 'PGO updated' (#1) from cjy-oneapi-opus-hotfix into main gh0s7 2026-02-11 19:17:35 +08:00
  • 72ce153e48 Merge cjy-oneapi-opus-hotfix into main CGH0S7 2026-02-11 19:15:12 +08:00
  • 5b7e05cd32 PGO updated CGH0S7 2026-02-11 18:26:30 +08:00
  • 85afe00fc5 Merge plotting optimizations from chb-copilot-test CGH0S7 2026-02-11 16:19:17 +08:00
  • 5c1790277b Replace nested OutBdLow2Hi loops with batch calls in RestrictProlong CGH0S7 2026-02-11 16:09:08 +08:00
  • 714c6e90c6 Add OpenMP parallelization to Fortran compute kernels cjy-oneapi-opus-windfall CGH0S7 2026-02-10 23:40:17 +08:00
  • caf192b2e4 Remove MPI dependency, replace with single-process stub for non-MPI builds CGH0S7 2026-02-10 22:51:11 +08:00
  • e09ae438a2 Cache data_packer lengths in Sync_start to skip redundant buffer-size traversals CGH0S7 2026-02-10 21:39:22 +08:00
  • d06d5b4db8 Add targeted point-to-point Interp_Points overload for surface_integral CGH0S7 2026-02-10 19:18:56 +08:00
  • 8b68b5d782 fixup! Fix load explosion: use subprocess for binary data plots to avoid thread conflict chb-copilot-test ianchb 2026-02-09 22:57:17 +08:00
  • 50e2a845f8 Replace MPI_Allreduce with owner-rank MPI_Bcast in Patch::Interp_Points CGH0S7 2026-02-09 22:39:18 +08:00
  • 738498cb28 Optimize MPI communication in RestrictProlong and surface_integral CGH0S7 2026-02-09 22:07:12 +08:00
  • dd2443c926 Fix load explosion: use subprocess for binary data plots to avoid thread conflict ianchb 2026-02-09 21:40:27 +08:00
  • 2d7ba5c60c [2/2] Implement multiprocessing-based parallel plotting ianchb 2026-02-09 09:31:55 +00:00
  • 42b9cf1ad9 Optimize MPI Sync with merged transfers, caching, and async overlap CGH0S7 2026-02-09 21:03:37 +08:00
  • 4777cad4ed [1/2] Implement multiprocessing-based parallel plotting ianchb 2026-02-09 15:13:18 +08:00
  • e9d321fd00 Convert MPI_Allreduce error checks to non-blocking MPI_Iallreduce overlapped with Sync CGH0S7 2026-02-09 12:39:29 +08:00
  • ed1d86ade9 Merge paired MPI_Allreduce error checks to reduce global sync barriers CGH0S7 2026-02-09 12:12:16 +08:00
  • 471baa5065 PGO supported CGH0S7 2026-02-09 10:59:26 +08:00
  • 86704100ec Only enable OpenMP for TwoPunctures chb-twopunctures ianchb 2026-02-08 13:00:37 +08:00
  • 291d40c04b Use OpenMP's parallel for with schedule(dynamic,1) ianchb 2026-02-07 19:04:51 +08:00
  • 32ed7ec5bd Optimize memory allocation in JFD_times_dv ianchb 2026-02-07 15:55:45 +08:00
  • c5f8a18ba4 Merge lopsided and kodis to reduce symmetry_bd overhead; saves about 0.01~0.02 s per step jaunatisblue 2026-02-08 23:21:54 +08:00
  • afd4006da2 Cache GSL in SyncPlan and apply async Sync to Z4c_class ianchb 2026-02-08 08:36:21 +00:00
  • a918dc103e Add SyncBegin/SyncEnd to Parallel for MPI communication-computation overlap copilot-swe-agent[bot] 2026-02-08 08:00:15 +00:00
  • 4bb6c03013 makefile setting updated CGH0S7 2026-02-08 16:14:43 +08:00
  • 38c2c30186 Merge lopsided advection + kodis dissipation to share symmetry_bd buffer copilot-swe-agent[bot] 2026-02-08 06:38:03 +00:00
  • b8e41b2b39 Only enable OpenMP for TwoPunctures ianchb 2026-02-08 13:00:37 +08:00
  • 3f7e20f702 Remove redundant parts of diff_new.f90 to ease follow-up work yx-fmisc jaunatisblue 2026-02-08 00:54:23 +08:00
  • 6796384bf4 taskset setting updated cjy-oneapi-opus-rhs-preview CGH0S7 2026-02-07 22:24:02 +08:00
  • c974a88d6d Pool fh work arrays in compute_rhs_bssn to eliminate allocation churn CGH0S7 2026-02-07 21:49:12 +08:00
  • 133e4f13a2 Use OpenMP's parallel for with schedule(dynamic,1) ianchb 2026-02-07 19:04:51 +08:00
  • 914c4f4791 Optimize memory allocation in JFD_times_dv ianchb 2026-02-07 15:55:45 +08:00
  • f345b0e520 Performance optimization for the TwoPunctures module * Re-enabled OpenMP. ianchb 2026-02-07 14:46:46 +08:00
  • f5ed23d687 Revert "Eliminate hot-path heap allocations in TwoPunctures spectral solver" ianchb 2026-02-07 10:35:05 +08:00
  • 03d501db04 Display the runtime of TwoPunctures ianchb 2026-02-06 21:27:41 +08:00
  • c6e4d4ab71 Add OpenMP parallelization to BSSN RHS hot-path stencil routines cjy-oneapi-opus-preview CGH0S7 2026-02-07 13:58:55 +08:00
  • 673dd20722 Modify polint in fmisc.f90 jaunatisblue 2026-02-07 01:56:44 +08:00
  • 09ffdb553d Eliminate hot-path heap allocations in TwoPunctures spectral solver CGH0S7 2026-02-06 21:20:35 +08:00
  • 699e443c7a Optimize polint/polin2/polin3 interpolation for cache locality CGH0S7 2026-02-06 19:00:35 +08:00
  • 24bfa44911 Disable NaN sanity check in bssn_rhs.f90 for production builds CGH0S7 2026-02-06 18:36:29 +08:00
  • 6738854a9d Compiler-level and hot-path optimizations for GW150914 CGH0S7 2026-02-06 17:13:39 +08:00
  • 4eb698f496 Add MPI+OpenMP hybrid parallelism (48 ranks x 2 threads) for full 96-core utilization cjy-oneapi-opus-openmp CGH0S7 2026-02-06 15:53:15 +08:00
  • 223ec17a54 input updated cjy-oneapi CGH0S7 2026-02-06 13:57:48 +08:00
  • 082f9c3423 feat: Implement hybrid MPI+OpenMP parallelization - Enable -qopenmp in makefile.inc - Add OpenMP directives to 4th order derivatives in diff_new.f90 - Update makefile_and_run.py to dynamic calculate OMP_NUM_THREADS based on 96 cores and remove hardcoded CPU binding cjy-oneapi-openmp CGH0S7 2026-02-06 13:25:07 +08:00
  • 79af79d471 baseline updated baseline CGH0S7 2026-02-05 19:53:55 +08:00
  • 6fffaa13f6 Optimize buffer_width dynamically based on FD order to improve scalability cjy-oneapi-parallel CGH0S7 2026-01-31 19:04:19 +08:00
  • 6684016e8c Optimize MPI domain decomposition min_width calculation to improve scalability CGH0S7 2026-01-31 16:23:16 +08:00
  • 95575d9450 fix: try to fix segfault at 240 steps by adding WithShell guard for writecheck_sh call chb-local ianchb 2026-01-22 14:26:41 +08:00
  • d11eaa2242 Optimize bssn_rhs.f90: Fuse loops for metric inversion and Christoffel symbols to improve cache locality cjy-oneapi-laptop CGH0S7 2026-01-21 11:22:33 +08:00
  • 54600327da fix(build): update makefile.inc for debian 13 ianchb 2026-01-21 09:29:35 +08:00
  • ef96766e22 Optimize the compute_rhs_bssn hot path and add a NaN-check switch CGH0S7 2026-01-20 19:37:26 +08:00
  • ae7b77e44c Setup GW150914-mini test case for laptop development CGH0S7 2026-01-20 00:31:40 +08:00
  • 26c81d8e81 makefile updated CGH0S7 2026-01-19 23:53:16 +08:00
  • ed89bc029b Fix potential division by zero in reta_val calculation and enable NaN checks cjy-oneapi-test CGH0S7 2026-01-19 20:29:48 +08:00
  • 19274e93d1 Fix boundary handling in bssn_rhs_opt.f90 to prevent NaNs CGH0S7 2026-01-19 20:03:22 +08:00
  • ae1a474cca Fix compilation errors and complete logic in BSSN RHS optimization CGH0S7 2026-01-19 19:22:52 +08:00
  • cbb8fb3a87 patched last commit CGH0S7 2026-01-19 17:14:28 +08:00
  • 4472d89a9f Optimize bssn_rhs calculation with cache blocking and vectorization CGH0S7 2026-01-19 16:39:24 +08:00
  • 3914659ebb Optimize BSSN RHS and finite difference calculations - Integrate Intel oneMKL VML for efficient Gauge calculation in bssn_rhs.f90 - Refactor fderivs in diff_new.f90 to separate bulk/boundary loops for better vectorization - Add optimization report in docs/optimization_report.md cjy-oneapi-preview CGH0S7 2026-01-19 10:49:14 +08:00
  • 039dce4d65 Add aggressive compiler optimizations and vectorization directives CGH0S7 2026-01-19 10:17:31 +08:00
  • c524228d23 Enable multi-threaded MKL for better resource utilization CGH0S7 2026-01-19 09:31:29 +08:00
  • 9deeda9831 Refactor verification method and optimize numerical kernels with oneMKL BLAS CGH0S7 2026-01-18 14:25:21 +08:00
  • 3a7bce3af2 Update Intel oneAPI configuration and CPU binding settings CGH0S7 2026-01-17 20:41:02 +08:00
  • c6945bb095 Rename verify_accuracy.py to AMSS_NCKU_Verify_ASC26.py and improve visual output CGH0S7 2026-01-17 14:54:33 +08:00
  • 0d24f1503c Add accuracy verification script for GW150914 simulation CGH0S7 2026-01-17 00:37:30 +08:00
  • cb252f5ea2 Optimize numerical algorithms with Intel oneMKL CGH0S7 2026-01-16 10:58:11 +08:00
  • 7a76cbaafd Add numactl CPU binding to avoid cores 0-3 and 56-59 CGH0S7 2026-01-16 10:24:46 +08:00
  • 57a7376044 Switch compiler toolchain from GCC to Intel oneAPI CGH0S7 2026-01-15 16:32:12 +08:00
  • cd5ceaa15f main branch updated CGH0S7 2026-01-14 08:55:53 +08:00
  • 75be0968fc feat: port GPU code to CUDA 13 and enable GPU computation cjy CGH0S7 2026-01-13 18:15:49 +00:00
  • b27e071cde Makefile updated for rocky10 CGH0S7 2026-01-14 01:41:31 +08:00
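
Note on commit b32675ba99: its message describes a three-step scheme (measure block volumes including ghost zones, greedy LPT assignment, then block creation using the precomputed ranks). The following is only a rough Python sketch of the first two steps, not the repository's C++ code. The function names `padded_volume` and `assign_blocks_lpt` are hypothetical, and adding the ghost width on both sides of every dimension is an assumption; only `block_volumes` and `assigned_ranks` are names taken from the commit message.

```python
import heapq


def padded_volume(shape, ghost_width):
    """Cell count of a block once ghost zones are added.

    Assumption: ghost_width cells are added on both sides of every dimension.
    """
    volume = 1
    for n in shape:
        volume *= n + 2 * ghost_width
    return volume


def assign_blocks_lpt(block_volumes, num_ranks):
    """Greedy LPT: return assigned_ranks, where assigned_ranks[i] is the rank for block i."""
    # Min-heap of (current_load, rank); equal loads resolve to the lowest rank id.
    heap = [(0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)

    assigned_ranks = [None] * len(block_volumes)

    # Visit blocks from largest to smallest volume.
    order = sorted(range(len(block_volumes)),
                   key=lambda i: block_volumes[i],
                   reverse=True)
    for i in order:
        # The heap root is the rank with the smallest load so far.
        load, rank = heapq.heappop(heap)
        assigned_ranks[i] = rank
        heapq.heappush(heap, (load + block_volumes[i], rank))
    return assigned_ranks
```

With volumes `[8, 7, 6, 5, 4]` and two ranks, the sketch assigns ranks `[0, 1, 1, 0, 0]`, giving loads of 17 and 13. A later block-creation loop would then read `assigned_ranks[block_idx]` instead of cycling through ranks, which is the role the commit message gives to `assigned_ranks[block_idx++]`.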