Commit Graph

  • e4c25eb21f Cache wave extraction angular kernels CGH0S7 2026-04-13 12:40:20 +08:00
  • 4b10519876 Reuse mass integrand across detector radii CGH0S7 2026-04-13 11:55:41 +08:00
  • c5d1268dd1 Batch patch-boundary copy and gate CPU BC in GPU substeps ianchb 2026-04-13 11:40:06 +08:00
  • 3a58273501 Batch constraint norm reductions CGH0S7 2026-04-13 11:48:02 +08:00
  • 5c65cea2f0 Optimize constraint refresh after regrid CGH0S7 2026-04-13 11:39:50 +08:00
  • 4bdfc90f22 Pass pointer tables as kernel args and skip redundant symbol uploads ianchb 2026-04-13 11:19:00 +08:00
  • c49a4e00c9 Batch symbd_pack/lopsided/kodiss over all state variables ianchb 2026-04-13 11:02:55 +08:00
  • 1b3c0b80d2 Refactor CUDA step buffers to remove loop-time allocations ianchb 2026-04-13 10:06:40 +08:00
  • 636e35bfd8 Add direct CUDA resident-state sync path and profiling hooks ianchb 2026-04-13 00:57:05 +08:00
  • 6fd7ef2b55 Cache GPU RHS symbols and zero vacuum sources once cjy-leonhardt CGH0S7 2026-04-12 22:42:58 +08:00
  • 7f2a391dd2 Cache matter fields in StepContext across RK4 substeps ianchb 2026-04-12 22:19:45 +08:00
  • 4fa12a2009 Integrate CUDA support into RK4 substep execution ianchb 2026-04-12 22:11:44 +08:00
  • 86a683de26 Replace legacy ABEGPU stack with ABE_CUDA backend ianchb 2026-04-12 21:19:14 +08:00
  • 7064ebd5b4 Batch GPU stage downloads CGH0S7 2026-04-12 21:06:41 +08:00
  • aaf7bf0a26 Merge remote-tracking branch 'origin/main' ianchb 2026-04-12 20:55:42 +08:00
  • 87c581ea7c Checkpoint stable GPU optimization baseline CGH0S7 2026-04-12 20:26:27 +08:00
  • d702aa06b9 Trim GPU restrict sync overhead CGH0S7 2026-04-12 19:45:34 +08:00
  • ce88c18265 Tune GPU RHS launch geometry CGH0S7 2026-04-12 18:59:59 +08:00
  • db2d6978b2 Reduce final GPU host downloads CGH0S7 2026-04-12 18:46:42 +08:00
  • c8977d8356 Optimize GPU RK4 stage sync path CGH0S7 2026-04-12 18:36:05 +08:00
  • d9287ea530 Fix GPU RK4 boundary and sync correctness CGH0S7 2026-04-12 12:13:47 +08:00
  • b78874ef21 Refine stable GPU AMR staging path CGH0S7 2026-04-10 23:37:36 +08:00
  • a089041c3b Stabilize GPU AMR prolong/restrict paths CGH0S7 2026-04-10 21:57:58 +08:00
  • c578a15ecd Fix GPU interpolation cache lifetime leaks CGH0S7 2026-04-10 10:29:04 +08:00
  • e1a0bff43c Reduce redundant GPU host buffer preparation CGH0S7 2026-04-09 21:20:45 +08:00
  • cf3c6d6218 Stabilize GPU buffer lifecycle around regrid CGH0S7 2026-04-09 20:48:06 +08:00
  • 46e94d1248 Trim constraint-only GPU downloads CGH0S7 2026-04-09 19:36:19 +08:00
  • 7cd2414faa Move constraint recomputation onto GPU path CGH0S7 2026-04-09 19:17:39 +08:00
  • 4463f1d23e Unpack intermediate sync stages directly to GPU CGH0S7 2026-04-09 19:01:12 +08:00
  • 4484635f0d Pack sync send buffers directly from GPU state CGH0S7 2026-04-09 18:49:11 +08:00
  • b0dd069a2b Register GPU transfer buffers as pinned host memory CGH0S7 2026-04-09 18:36:10 +08:00
  • 5bc67ded06 Download staged GPU sync regions incrementally CGH0S7 2026-04-09 18:23:05 +08:00
  • 3b16795e78 Refresh synced GPU regions incrementally CGH0S7 2026-04-09 17:07:31 +08:00
  • 5b00d49070 Reduce staged GPU host-device copies CGH0S7 2026-04-09 16:44:08 +08:00
  • 42e851d19a Cache repeated interpolation plans CGH0S7 2026-04-09 15:21:01 +08:00
  • 06fa643365 Refine batched CUDA interpolation kernel CGH0S7 2026-04-09 15:06:11 +08:00
  • c47349b7a9 Add batched CUDA patch interpolation path CGH0S7 2026-04-09 14:56:01 +08:00
  • ad999e4c5a Add guarded GPU prolong3 path scaffold CGH0S7 2026-04-09 14:28:36 +08:00
  • e1e3b4a448 Reduce GPU RK4 transfer overhead CGH0S7 2026-04-09 12:11:40 +08:00
  • 49409645c0 Stabilize GPU output path and MPI sync CGH0S7 2026-04-09 10:57:49 +08:00
  • 4e3946a4f0 Persist GPU RK4 stage caches CGH0S7 2026-04-08 20:59:15 +08:00
  • a0af9b8804 Trim GPU main-path transfer overhead CGH0S7 2026-04-08 20:16:25 +08:00
  • 01ac1f9250 Cache GPU main-path device buffers CGH0S7 2026-04-08 19:43:17 +08:00
  • ea470737db Add runnable GPU main-path prototype CGH0S7 2026-04-08 19:14:37 +08:00
  • d96ca6ed2a Add two-node MPI launch configuration cjy-cassius CGH0S7 2026-03-30 21:13:46 +08:00
  • 60ad63e8cc Isolate TwoPuncture from ABE OMP settings CGH0S7 2026-03-30 21:00:20 +08:00
  • 087d034ee3 Use wall time for timestep logging CGH0S7 2026-03-30 20:38:41 +08:00
  • 5f664716ab Enable OpenMP task parallelism for C kernels CGH0S7 2026-03-30 20:34:34 +08:00
  • 8c1f4d8108 Migrate loop fusion and temporary-variable elimination of the C operators CGH0S7 2026-03-03 15:57:10 +08:00
  • d310ef918b bssn_rhs(fortran): migrate C kernel loop-fusion optimizations CGH0S7 2026-03-03 15:41:26 +08:00
  • b35e1b289f Add a switch to disable memory statistics printing CGH0S7 2026-03-03 15:15:06 +08:00
  • 05851b2c59 Disable static load CGH0S7 2026-03-03 12:36:19 +08:00
  • 3b39583d67 fix(bssn_rhs) ianchb 2026-03-03 16:00:45 +08:00
  • 9c44d1c885 fix(bssn_rhs) chb-parallel ianchb 2026-03-03 16:00:45 +08:00
  • f1fe9fd443 Migrate loop fusion and temporary-variable elimination of the C operators cjy-dystopia CGH0S7 2026-03-03 15:57:10 +08:00
  • 7bb9042b18 bssn_rhs(fortran): migrate C kernel loop-fusion optimizations CGH0S7 2026-03-03 15:41:26 +08:00
  • 9991b7f41e Disable the C-rewritten operators CGH0S7 2026-03-03 15:28:09 +08:00
  • 57abf12bbd Fix C derivative kernels to match Fortran ghost_width=3 stencil gating CGH0S7 2026-03-03 15:22:01 +08:00
  • 51efc47c1b Add a switch to disable memory statistics printing CGH0S7 2026-03-03 15:15:06 +08:00
  • 4b9de28feb Make coarse-level Sync_cached in the Restrict/Prolong path optional (skipped by default) ianchb 2026-03-03 14:25:27 +08:00
  • 4eb5dc4ddb Remove a duplicated first-derivative computation of chi ianchb 2026-03-02 20:16:16 +08:00
  • 234c4f7344 Disable static load CGH0S7 2026-03-03 12:36:19 +08:00
  • 688bdb6708 Merge pull request 'cjy-dystopia' (#3) from cjy-dystopia into main gh0s7 2026-03-02 21:36:26 +08:00
  • 12e1f63d50 prolong3: reduce redundant Z-pass computation yx-prolong jaunatisblue 2026-03-02 21:20:49 +08:00
  • 5070134857 perf(transfer_cached): move the per-call new/delete req_node/req_is_recv/completed arrays into SyncCache for reuse CGH0S7 2026-03-02 21:14:35 +08:00
  • 4012e9d068 perf(RestrictProlong): replace the non-cached versions with Restrict_cached/OutBdLow2Hi_cached; make Sync_finish unpack progressively CGH0S7 2026-03-02 20:48:38 +08:00
  • 43975017eb prolong3: compute the actual stencil window first; run the full-domain symmetry_bd only when the window touches a symmetry boundary, otherwise copy only the required window. restrict3: use the same window test and fill only the required ii/jj/kk window when no boundary is touched. chb-rebase-wip ianchb 2026-03-02 17:17:16 +08:00
  • 485667ef4c perf(restrict3): shrink X-pass ii sweep to required overlap window - compute fi_min/fi_max from output i-range and derive ii_lo/ii_hi - replace full ii sweep (-1:extf(1)) with windowed sweep in Z/Y precompute passes - keep stencil math unchanged; add bounds sanity check for ii window ianchb 2026-03-02 16:08:13 +08:00
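  • 2a977ce82e perf(MPatch): speed up the Interp_Points block-ownership lookup with a spatial bin index ianchb 2026-03-02 15:54:37 +08:00
  • b3c367f15b prolong3: compute the actual stencil window first; run the full-domain symmetry_bd only when the window touches a symmetry boundary, otherwise copy only the required window. restrict3: use the same window test and fill only the required ii/jj/kk window when no boundary is touched. ianchb 2026-03-02 17:17:16 +08:00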
  • e73911f292 perf(restrict3): shrink X-pass ii sweep to required overlap window - compute fi_min/fi_max from output i-range and derive ii_lo/ii_hi - replace full ii sweep (-1:extf(1)) with windowed sweep in Z/Y precompute passes - keep stencil math unchanged; add bounds sanity check for ii window ianchb 2026-03-02 16:08:13 +08:00
  • 7543d3e8c7 perf(MPatch): speed up the Interp_Points block-ownership lookup with a spatial bin index ianchb 2026-03-02 15:54:37 +08:00
  • 42c69fab24 refactor(Parallel): streamline MPI communication by consolidating request handling and memory management ianchb 2026-03-01 16:20:51 +08:00
  • 95220a05c8 optimize fdderivs core-region branch elimination for ghost_width=3 CGH0S7 2026-03-02 17:33:26 +08:00
  • 160e2a0369 fix prolong/restrict index bounds after cherry-pick 12e1f63 CGH0S7 2026-03-02 13:59:47 +08:00
  • 01410de05a refactor(Parallel): streamline MPI communication by consolidating request handling and memory management ianchb 2026-03-01 16:20:51 +08:00
  • 83c826eb49 prolong3: reduce redundant Z-pass computation jaunatisblue 2026-03-02 21:20:49 +08:00
  • 466b084a58 fix prolong/restrict index bounds after cherry-pick 12e1f63 CGH0S7 2026-03-02 13:59:47 +08:00
  • 61ccef9f97 prolong3: reduce redundant Z-pass computation jaunatisblue 2026-03-02 21:20:49 +08:00
  • 43ddaab903 fix: add C RK4 kernel to CFILES_CUDA ianchb 2026-03-02 12:19:52 +08:00
  • 5839755c2f compute div_beta on-the-fly to remove temp array ianchb 2026-03-02 12:12:58 +08:00
  • a893b4007c merge lopsided+kodis ianchb 2026-03-02 12:12:26 +08:00
  • ad5ff03615 build: switch allocator option to oneTBB tbbmalloc CGH0S7 2026-02-28 17:16:00 +08:00
  • b4bc0ef269 Disable core binding for now; speed comparison found: no core binding + SCX > core binding + SCX CGH0S7 2026-02-28 23:27:44 +08:00
  • b185f84cce Add switchable C RK4 kernel and build toggle CGH0S7 2026-02-28 21:12:19 +08:00
  • 71f6eb7b44 Remove profiling code ianchb 2026-03-02 11:29:48 +08:00
  • 90620c2aec Optimize fdderivs: skip redundant 2nd-order work in 4th-order overlap CGH0S7 2026-03-02 03:21:21 +08:00
  • f561522d89 prolong3: improve cache hit rate jaunatisblue 2026-03-02 10:31:46 +08:00
  • 3f4715b8cc Modify prolong jaunatisblue 2026-03-02 02:01:07 +08:00
  • 710ea8f76b Optimize memory access in prolong3 jaunatisblue 2026-03-02 01:16:10 +08:00
  • 5cf891359d Optimize symmetry_bd with stride-based fast paths CGH0S7 2026-03-01 15:50:56 +08:00
  • 222747449a Optimize average2: use DO CONCURRENT loop form CGH0S7 2026-03-01 00:41:32 +08:00
  • 14de4d535e Optimize average2: replace array expression with explicit loops CGH0S7 2026-03-01 00:33:01 +08:00
  • 787295692a Optimize prolong3: hoist bounds check out of inner loop CGH0S7 2026-03-01 00:17:30 +08:00
  • 335f2f23fe Optimize prolong3: replace parity branches with coefficient lookup CGH0S7 2026-02-28 23:59:57 +08:00
  • 7109474a14 Optimize prolong3: precompute coarse index/parity maps CGH0S7 2026-02-28 23:53:30 +08:00
  • 47f91ff46f prolong3: improve cache hit rate jaunatisblue 2026-03-02 10:31:46 +08:00
  • e11363e06e Optimize fdderivs: skip redundant 2nd-order work in 4th-order overlap hxh-omp CGH0S7 2026-03-02 03:21:21 +08:00
  • f70e90f694 prolong3: improve cache hit rate jaunatisblue 2026-03-02 10:31:46 +08:00
  • 75dd5353b0 Modify prolong jaunatisblue 2026-03-02 02:01:07 +08:00