|
|
44efb2e08c
|
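Preliminary-round final version v1.0.0: PGO and the original load-balancing scheme were confirmed to be negative optimizations in the current version, so both have been reverted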
|
2026-03-01 18:04:25 +08:00 |
|
|
|
16013081e0
|
Optimize symmetry_bd with stride-based fast paths
|
2026-03-01 15:50:56 +08:00 |
|
|
|
03416a7b28
|
perf(polint): add uniform-grid fast path for barycentric n=6
|
2026-03-01 13:26:39 +08:00 |
|
|
|
cca3c16c2b
|
perf(polint): add switchable barycentric ordn=6 path
|
2026-03-01 13:20:46 +08:00 |
|
|
|
e5231849ee
|
perf(polin3): switch to lagrange-weight tensor contraction
|
2026-03-01 13:04:33 +08:00 |
|
|
|
a766e49ff0
|
perf(polint): add ordn=6 specialized neville path
|
2026-03-01 12:39:53 +08:00 |
|
|
|
1a518cd3f6
|
Optimize average2: use DO CONCURRENT loop form
|
2026-03-01 00:41:32 +08:00 |
|
|
|
1dc622e516
|
Optimize average2: replace array expression with explicit loops
|
2026-03-01 00:33:01 +08:00 |
|
|
|
3046a0ccde
|
Optimize prolong3: hoist bounds check out of inner loop
|
2026-03-01 00:17:30 +08:00 |
|
|
|
d4ec69c98a
|
Optimize prolong3: replace parity branches with coefficient lookup
|
2026-02-28 23:59:57 +08:00 |
|
|
|
2c0a3055d4
|
Optimize prolong3: precompute coarse index/parity maps
|
2026-02-28 23:53:30 +08:00 |
|
|
|
1eba73acbe
|
Disable CPU core pinning for now; measured speed: no pinning + SCX is faster than pinning + SCX
|
2026-02-28 23:27:44 +08:00 |
|
|
|
b91cfff301
|
Add switchable C RK4 kernel and build toggle
|
2026-02-28 21:12:19 +08:00 |
|
|
|
e29ca2dca9
|
build: switch allocator option to oneTBB tbbmalloc
|
2026-02-28 17:16:00 +08:00 |
|
|
|
6493101ca0
|
bssn_rhs_c: recompute contracted Gamma terms to remove temp arrays
|
2026-02-28 16:34:23 +08:00 |
|
|
|
169986cde1
|
bssn_rhs_c: compute div_beta on-the-fly to remove temp array
|
2026-02-28 16:25:57 +08:00 |
|
|
|
1fbc213888
|
bssn_rhs_c: remove gxx/gyy/gzz temporaries in favor of dxx/dyy/dzz+1
|
2026-02-28 15:50:52 +08:00 |
|
|
|
6024708a48
|
derivs_c: split low/high stencil regions to reduce branch overhead
|
2026-02-28 15:42:31 +08:00 |
|
|
|
bc457d981e
|
bssn_rhs_c: merge lopsided+kodis with shared symmetry buffer
|
2026-02-28 15:23:01 +08:00 |
|
|
|
51dead090e
|
bssn_rhs_c: fuse the two final RHS loops into one, passing the fij intermediate values through local variables (Modify 6)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 13:49:45 +08:00 |
|
|
|
34d6922a66
|
fdderivs_c: zero only the boundary faces instead of the whole array, reducing redundant memory writes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 13:20:06 +08:00 |
|
|
|
8010ad27ed
|
kodiss_c: tighten the loop bounds to eliminate useless boundary iterations and branch checks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 13:04:21 +08:00 |
|
|
|
38e691f013
|
bssn_rhs_c: fuse the Christoffel-correction and trK_rhs loops into one (Modify 5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 12:57:07 +08:00 |
|
|
|
808387aa11
|
bssn_rhs_c: fuse the fxx/Gamxa and Gamma_rhs_part2 loops into one (Modify 4)
fxx/fxy/fxz and Gamxa/ya/za are kept in local scalars and reused directly in Gamma_rhs part2, reducing array reads and writes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 11:14:35 +08:00 |
|
|
|
c2b676abf2
|
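bssn_rhs_c: fuse the A^{ij} index-raising and Gamma_rhs_part1 loops into one (Modify 3)
The six components of A^{ij} are kept in local scalars and reused directly in the Gamma_rhs computation, avoiding extra reads of the Rxx..Ryz arrays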
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 11:02:27 +08:00 |
|
|
|
2c60533501
|
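bssn_rhs_c: fuse the inverse-metric, Gamma-constraint, and Christoffel loops into one (Modify 2)
The inverse-metric results are kept in local scalars and reused directly, avoiding repeated reads of the gupxx..gupzz arrays; this saves 0.01 s per step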
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-28 10:57:40 +08:00 |
|