Compare commits

...

73 Commits

Author SHA1 Message Date
17109fde9b [TEST]UPSTREAM: Pick some source changes from 48080d0a97
* Sync new folder structure
2026-04-23 20:55:40 +08:00
c185f99ee3 UPSTREAM: Pick source changes from a5b2dd9e3c
Original message: fix bug
2026-04-23 20:18:44 +08:00
4a13a9d37a UPSTREAM: Pick some source changes from 57ec145e59 2026-04-23 20:10:12 +08:00
ac82ebd889 更新精度检查脚本加入图像比对检查 2026-04-15 00:49:46 +08:00
9c31384b2f Add optional BSSN kernel profiling switches 2026-04-13 16:51:06 +08:00
e4e741caa1 Remove dead chi derivative setup in BSSN RHS 2026-04-13 15:55:43 +08:00
65e0f95f40 Localize chi Ricci intermediates in RHS 2026-04-13 15:14:31 +08:00
f9fbf97e64 Elide dead stores in BSSN RHS hot path 2026-04-13 15:10:22 +08:00
968522995b Add fine-grained step timing and trim BH RHS overhead 2026-04-13 14:50:55 +08:00
f3988ac8ca Merge wave and mass extraction interpolation 2026-04-13 13:17:36 +08:00
e4c25eb21f Cache wave extraction angular kernels 2026-04-13 12:40:20 +08:00
4b10519876 Reuse mass integrand across detector radii 2026-04-13 11:55:41 +08:00
3a58273501 Batch constraint norm reductions 2026-04-13 11:48:02 +08:00
5c65cea2f0 Optimize constraint refresh after regrid 2026-04-13 11:39:50 +08:00
8c1f4d8108 迁移C算子的循环融合和临时量消除 2026-03-03 16:20:15 +08:00
d310ef918b bssn_rhs(fortran): migrate C kernel loop-fusion optimizations 2026-03-03 16:20:15 +08:00
b35e1b289f 设置开关关闭内存打印统计 2026-03-03 16:17:47 +08:00
05851b2c59 关闭静态负载 2026-03-03 16:17:47 +08:00
3b39583d67 fix(bssn_rhs) 2026-03-03 16:06:33 +08:00
688bdb6708 Merge pull request 'cjy-dystopia' (#3) from cjy-dystopia into main
Reviewed-on: #3
2026-03-02 21:36:26 +08:00
5070134857 perf(transfer_cached): 将 per-call new/delete 的 req_node/req_is_recv/completed 数组移入 SyncCache 复用
避免 transfer_cached 每次调用分配释放 3 个临时数组,减少堆操作开销。

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 21:14:35 +08:00
4012e9d068 perf(RestrictProlong): 用 Restrict_cached/OutBdLow2Hi_cached 替换非缓存版本,Sync_finish 改为渐进式解包
- RestrictProlong/RestrictProlong_aux 中的 Restrict() 和 OutBdLow2Hi() 替换为 _cached 版本,
  复用 gridseg 列表和 MPI 缓冲区,避免每次调用重新分配
- 新增 sync_cache_restrict/sync_cache_outbd 两组 per-level 缓存
- Sync_finish 从 MPI_Waitall 改为 MPI_Waitsome 渐进式解包,降低尾延迟
- AsyncSyncState 扩展 req_node/req_is_recv/pending_recv 字段支持渐进解包

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-02 20:48:38 +08:00
b3c367f15b prolong3 改为先算实际 stencil 窗口;只有窗口触及对称边界时才走全域 symmetry_bd,否则只复制必需窗口。restrict3 同样改成窗口判定,无触边时仅填 ii/jj/kk 必需窗口。 2026-03-02 17:38:56 +08:00
e73911f292 perf(restrict3): shrink X-pass ii sweep to required overlap window
- compute fi_min/fi_max from output i-range and derive ii_lo/ii_hi
 - replace full ii sweep (-1:extf(1)) with windowed sweep in Z/Y precompute passes
 - keep stencil math unchanged; add bounds sanity check for ii window
2026-03-02 17:37:13 +08:00
7543d3e8c7 perf(MPatch): 用空间 bin 索引加速 Interp_Points 的 block 归属查找
- 为 Patch::Interp_Points 三个重载引入 BlockBinIndex(候选筛选 + 全扫回退)
  - 保持原 point-in-block 判定与后续插值/通信流程不变
  - 将逐点线性扫块从 O(N_points*N_blocks) 降为近似 O(N_points*k)
  - 测试:bin 上限如果太大,会引入不必要的索引构建开销。将 bins 上限设为 16。

Co-authored-by: gpt-5.3-codex
2026-03-02 17:37:13 +08:00
42c69fab24 refactor(Parallel): streamline MPI communication by consolidating request handling and memory management 2026-03-02 17:37:13 +08:00
95220a05c8 optimize fdderivs core-region branch elimination for ghost_width=3 2026-03-02 17:33:26 +08:00
466b084a58 fix prolong/restrict index bounds after cherry-pick 12e1f63 2026-03-02 13:59:47 +08:00
61ccef9f97 prolong3: 减少Z-pass 冗余计算 2026-03-02 13:58:52 +08:00
e11363e06e Optimize fdderivs: skip redundant 2nd-order work in 4th-order overlap 2026-03-02 03:21:21 +08:00
f70e90f694 prolong3:提升cache命中率 2026-03-02 03:05:35 +08:00
jaunatisblue
75dd5353b0 修改prolong 2026-03-02 02:25:25 +08:00
jaunatisblue
23a82d063b 对prolong3做访存优化 2026-03-02 02:25:25 +08:00
524d1d1512 Merge pull request 'cjy-dystopia' (#2) from cjy-dystopia into main
Reviewed-on: #2
2026-03-01 19:22:09 +08:00
44efb2e08c 预赛最终版本v1.0.0: 确定PGO和原负载均衡方案在当前版本造成负优化已经回退 2026-03-01 18:04:25 +08:00
16013081e0 Optimize symmetry_bd with stride-based fast paths 2026-03-01 15:50:56 +08:00
03416a7b28 perf(polint): add uniform-grid fast path for barycentric n=6 2026-03-01 13:26:39 +08:00
cca3c16c2b perf(polint): add switchable barycentric ordn=6 path 2026-03-01 13:20:46 +08:00
e5231849ee perf(polin3): switch to lagrange-weight tensor contraction 2026-03-01 13:04:33 +08:00
a766e49ff0 perf(polint): add ordn=6 specialized neville path 2026-03-01 12:39:53 +08:00
1a518cd3f6 Optimize average2: use DO CONCURRENT loop form 2026-03-01 00:41:32 +08:00
1dc622e516 Optimize average2: replace array expression with explicit loops 2026-03-01 00:33:01 +08:00
3046a0ccde Optimize prolong3: hoist bounds check out of inner loop 2026-03-01 00:17:30 +08:00
d4ec69c98a Optimize prolong3: replace parity branches with coefficient lookup 2026-02-28 23:59:57 +08:00
2c0a3055d4 Optimize prolong3: precompute coarse index/parity maps 2026-02-28 23:53:30 +08:00
1eba73acbe 先关闭绑核心,发现速度对比:不绑定核心+SCX>绑核心+SCX 2026-02-28 23:27:44 +08:00
b91cfff301 Add switchable C RK4 kernel and build toggle 2026-02-28 21:12:19 +08:00
e29ca2dca9 build: switch allocator option to oneTBB tbbmalloc 2026-02-28 17:16:00 +08:00
6493101ca0 bssn_rhs_c: recompute contracted Gamma terms to remove temp arrays 2026-02-28 16:34:23 +08:00
169986cde1 bssn_rhs_c: compute div_beta on-the-fly to remove temp array 2026-02-28 16:25:57 +08:00
1fbc213888 bssn_rhs_c: remove gxx/gyy/gzz temporaries in favor of dxx/dyy/dzz+1 2026-02-28 15:50:52 +08:00
6024708a48 derivs_c: split low/high stencil regions to reduce branch overhead 2026-02-28 15:42:31 +08:00
bc457d981e bssn_rhs_c: merge lopsided+kodis with shared symmetry buffer 2026-02-28 15:23:01 +08:00
51dead090e bssn_rhs_c: 融合最终RHS两循环为一循环,用局部变量传递fij中间值 (Modify 6)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 13:49:45 +08:00
34d6922a66 fdderivs_c: 全量清零改为只清零边界面,减少无效内存写入
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 13:20:06 +08:00
8010ad27ed kodiss_c: 收紧循环范围消除边界无用迭代和分支判断
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 13:04:21 +08:00
38e691f013 bssn_rhs_c: 融合Christoffel修正+trK_rhs两循环为一循环 (Modify 5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 12:57:07 +08:00
808387aa11 bssn_rhs_c: 融合fxx/Gamxa+Gamma_rhs_part2两循环为一循环 (Modify 4)
fxx/fxy/fxz和Gamxa/ya/za保留在局部标量中直接复用于Gamma_rhs part2,减少数组读写

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 11:14:35 +08:00
c2b676abf2 bssn_rhs_c: 融合A^{ij}升指标+Gamma_rhs_part1两循环为一循环 (Modify 3)
A^{ij}六分量保留在局部标量中直接复用于Gamma_rhs计算,减少Rxx..Ryz数组的额外读取

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 11:02:27 +08:00
2c60533501 bssn_rhs_c: 融合逆度规+Gamma约束+Christoffel三循环为一循环 (Modify 2)
逆度规计算结果保留在局部标量中直接复用,减少对gupxx..gupzz数组的重复读取,每步加速0.01秒

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-28 10:57:40 +08:00
318b5254cc 根据组委会邮件要求更新检测脚本,增加对3D向量和三个分量分别检测RMS小于1.0% 2026-02-27 17:38:21 +08:00
3cee05f262 Merge branch 'cjy-oneapi-opus-hotfix' 2026-02-27 15:13:40 +08:00
e0b5e012df 引入 PGO 式两遍编译流程,将 Interp_Points 负载均衡优化合法化
背景:
上一个 commit 中同事实现的热点 block 拆分与 rank 重映射取得了显著
加速效果,但其中硬编码了 heavy ranks (27/28/35/36) 和重映射表,
属于针对特定测例的优化,违反竞赛规则第 6 条(不允许针对参数或测例
的专门优化)。

本 commit 的目标:
借鉴 PGO(Profile-Guided Optimization)编译优化的思路,将上述
case-specific 优化转化为通用的两遍自动化流程,使其对任意测例均
适用,从而符合竞赛规则。

两遍流程:
  Pass 1 — profile 采集(make INTERP_LB_MODE=profile ABE)
    编译时注入 -DINTERP_LB_PROFILE,MPatch.C 中 Interp_Points
    在首次调用时用 MPI_Wtime 计时 + MPI_Gather 汇总各 rank 耗时,
    识别超过均值 2.5 倍的热点 rank,写入 interp_lb_profile.bin。

  中间步骤 — 生成编译时头文件
    python3 gen_interp_lb_header.py 读取 profile.bin,自动计算
    拆分策略和重映射表,生成 interp_lb_profile_data.h,包含:
    - interp_lb_splits[][3]:每个热点 block 的 (block_id, r_left, r_right)
    - interp_lb_remaps[][2]:被挤占邻居 block 的 rank 重映射

  Pass 2 — 优化编译(make INTERP_LB_MODE=optimize ABE)
    编译时注入 -DINTERP_LB_OPTIMIZE,profile 数据以 static const
    数组形式固化进可执行文件(零运行时开销),distribute_optimize
    在 block 创建阶段直接应用拆分和重映射。

具体改动:
- makefile.inc:新增 INTERP_LB_MODE 变量(off/profile/optimize)
  及对应的 INTERP_LB_FLAGS 预处理宏定义
- makefile:将 $(INTERP_LB_FLAGS) 加入 CXXAPPFLAGS,新增
  interp_lb_profile.o 编译目标
- gen_interp_lb_header.py:profile.bin → interp_lb_profile_data.h
  的自动转换脚本
- interp_lb_profile_data.h:自动生成的编译时常量头文件
- interp_lb_profile.bin:profile 采集阶段生成的二进制数据
- AMSS_NCKU_Program.py:构建时自动拷贝 profile.bin 到运行目录
- makefile_and_run.py:默认构建命令切换为 INTERP_LB_MODE=optimize

通用性说明:
整个流程不依赖任何硬编码的 rank 编号或测例参数。对于不同的网格
配置、进程数或物理问题,只需重新执行 Pass 1 采集 profile,即可
自动生成对应的优化方案。这与 PGO 编译优化的理念完全一致——先
profile 再优化,是一种通用的性能优化方法论。
2026-02-27 15:10:22 +08:00
jaunatisblue
6b2464b80c Interp_Points 负载均衡:热点 block 拆分与 rank 重映射
问题背景:
Patch::Interp_Points 在球面插值时存在严重的 MPI 负载不均衡。
通过 MPI_Wtime 计时诊断发现,64 进程中 rank 27/28/35/36 四个进程
承担了绝大部分插值计算(耗时为平均值的 2.6~3.3 倍),导致其余 60
个进程在 MPI 集合通信处空等,成为整体性能瓶颈。

根因分析:
这四个 rank 对应的 block 在物理空间上恰好覆盖了球面提取面
(extraction sphere)的密集插值点区域,而 distribute 函数按均匀
网格体积分配 block-to-rank,未考虑插值点的空间分布不均。

优化方案:
1. 新增 distribute_optimize 函数替代 distribute,使用独立的
   current_block_id 计数器(与 rank 分配解耦)遍历所有 block。

2. 热点 block 拆分(splitHotspotBlock):
   对 block 27/28/35/36 沿 x 轴在中点处二等分,生成左右两个子
   block,分别分配给相邻的两个 rank:
   - block 27 → (rank 26, rank 27)
   - block 28 → (rank 28, rank 29)
   - block 35 → (rank 34, rank 35)
   - block 36 → (rank 36, rank 37)
   子 block 严格复刻原 distribute 的 ghost zone 扩张和物理坐标
   计算逻辑(支持 Vertex/Cell 两种网格模式)。

3. 邻居 rank 重映射(createMappedBlock):
   被占用的邻居 block 需要让出原 rank,重映射到相邻空闲 rank:
   - block 26 → rank 25
   - block 29 → rank 30
   - block 34 → rank 33
   - block 37 → rank 38
   其余 block 保持 block_id == rank 的原始映射。

4. cgh.C 中 compose_cgh 通过预处理宏切换调用 distribute_optimize
   或原始 distribute。

5. MPatch.C 中添加 profile 采集插桩:在 Interp_Points 重载 2 中
   用 MPI_Wtime 计时,MPI_Gather 汇总各 rank 耗时,识别热点 rank
   并写入二进制 profile 文件。

6. 新增 interp_lb_profile.h/C:定义 profile 文件格式(magic、
   version、nprocs、threshold_ratio、heavy_ranks),提供
   write_profile/read_profile/identify_heavy_ranks 接口。

数学等价性:拆分和重映射仅改变 block 的几何划分与 rank 归属,
不修改任何物理方程、差分格式或插值算法,计算结果严格一致。
2026-02-27 15:07:40 +08:00
9c33e16571 增加C算子PGO文件 2026-02-27 11:30:36 +08:00
45b7a43576 补全C算子和Fortran算子的数学差异 2026-02-26 15:48:11 +08:00
dfb79e3e11 Initialize output arrays to zero in fdderivs_c.C and fderivs_c.C 2026-02-26 14:18:31 +08:00
d2c2214fa1 补充TwoPunctureABE专用PGO插桩文件 2026-02-25 23:06:17 +08:00
e157ea3a23 合并 chb-replace:C++ 算子替换 Fortran bssn_rhs,添加回退开关与独立 PGO profdata
- 合并 chb-replace 分支,引入 bssn_rhs_c.C / fderivs_c.C / fdderivs_c.C /
  kodiss_c.C / lopsided_c.C 五个 C++ 算子实现
- 添加 USE_CXX_KERNELS 开关(默认 1),设为 0 可回退到原始 Fortran bssn_rhs.o
- TwoPunctureABE 改用独立的 TwoPunctureABE.profdata 而非 default.profdata

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 22:50:46 +08:00
e6329b013d Merge branch 'cjy-oneapi-opus-hotfix' 2026-02-20 14:18:33 +08:00
2791d2e225 Merge pull request 'PGO updated' (#1) from cjy-oneapi-opus-hotfix into main
Reviewed-on: #1
2026-02-11 19:17:35 +08:00
72ce153e48 Merge cjy-oneapi-opus-hotfix into main 2026-02-11 19:15:12 +08:00
CGH0S7
79af79d471 baseline updated 2026-02-05 19:53:55 +08:00
233 changed files with 194801 additions and 190884 deletions

6
.idea/vcs.xml generated Normal file
View File

@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="" vcs="Git" />
</component>
</project>

View File

@@ -126,11 +126,6 @@ setup.generate_AMSSNCKU_input()
#inputvalue = input() ## Wait for user input (press Enter) to proceed #inputvalue = input() ## Wait for user input (press Enter) to proceed
#print() #print()
setup.print_puncture_information()
##################################################################
## Generate AMSS-NCKU program input files based on the configured parameters ## Generate AMSS-NCKU program input files based on the configured parameters
print( ) print( )
@@ -270,6 +265,12 @@ if not os.path.exists( ABE_file ):
## Copy the executable ABE (or ABEGPU) into the run directory ## Copy the executable ABE (or ABEGPU) into the run directory
shutil.copy2(ABE_file, output_directory) shutil.copy2(ABE_file, output_directory)
## Copy interp load balance profile if present (for optimize pass)
interp_lb_profile = os.path.join(AMSS_NCKU_source_copy, "interp_lb_profile.bin")
if os.path.exists(interp_lb_profile):
shutil.copy2(interp_lb_profile, output_directory)
print( " Copied interp_lb_profile.bin to run directory " )
########################### ###########################
## If the initial-data method is TwoPuncture, copy the TwoPunctureABE executable to the run directory ## If the initial-data method is TwoPuncture, copy the TwoPunctureABE executable to the run directory
@@ -306,7 +307,7 @@ if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ):
import generate_TwoPuncture_input import generate_TwoPuncture_input
generate_TwoPuncture_input.generate_AMSSNCKU_TwoPuncture_input() generate_TwoPuncture_input.generate_AMSSNCKU_TwoPuncture_input(numerical_grid.puncture_data)
print( ) print( )
print( " The input parfile for the TwoPunctureABE executable has been generated. " ) print( " The input parfile for the TwoPunctureABE executable has been generated. " )
@@ -348,7 +349,7 @@ if (input_data.Initial_Data_Method == "Ansorg-TwoPuncture" ):
import renew_puncture_parameter import renew_puncture_parameter
renew_puncture_parameter.append_AMSSNCKU_BSSN_input(File_directory, output_directory) renew_puncture_parameter.append_AMSSNCKU_BSSN_input(File_directory, output_directory, numerical_grid.puncture_data)
## Generated AMSS-NCKU input filename ## Generated AMSS-NCKU input filename

View File

@@ -1,10 +1,19 @@
#!/usr/bin/env python3 #!/usr/bin/env python3
""" """
AMSS-NCKU GW150914 Simulation Regression Test Script AMSS-NCKU GW150914 Simulation Regression Test Script (Comprehensive Version)
Verification Requirements: Verification Requirements:
1. XY-plane trajectory RMS error < 1% (Optimized vs. baseline, max of BH1 and BH2) 1. RMS errors < 1% for:
- 3D Vector Total RMS
- X Component RMS
- Y Component RMS
- Z Component RMS
2. ADM constraint violation < 2 (Grid Level 0) 2. ADM constraint violation < 2 (Grid Level 0)
3. The following figure PDFs must match GW150914-origin exactly after rasterization:
- ADM_Constraint_Grid_Level_0.pdf
- BH_Trajectory_21_XY.pdf
- BH_Trajectory_XY.pdf
The script also reports the percentage of differing pixels for each figure.
RMS Calculation Method: RMS Calculation Method:
- Computes trajectory deviation on the XY plane independently for BH1 and BH2 - Computes trajectory deviation on the XY plane independently for BH1 and BH2
@@ -19,6 +28,10 @@ Reference: GW150914-origin (baseline simulation)
import numpy as np import numpy as np
import sys import sys
import os import os
import shutil
import subprocess
import tempfile
from PIL import Image
# ANSI Color Codes # ANSI Color Codes
class Color: class Color:
@@ -58,78 +71,187 @@ def load_constraint_data(filepath):
return np.array(data) return np.array(data)
def calculate_rms_error(bh_data_ref, bh_data_target): def resolve_figure_dir(path):
"""Resolve the sibling figure directory from an output or figure path."""
normalized = os.path.normpath(path)
if os.path.basename(normalized) == "figure":
return normalized
return os.path.join(os.path.dirname(normalized), "figure")
def render_pdf_to_images(pdf_path, dpi=150):
"""Render a PDF to RGB images using Ghostscript."""
gs_path = shutil.which("gs")
if gs_path is None:
raise RuntimeError("Ghostscript executable 'gs' was not found in PATH")
with tempfile.TemporaryDirectory(prefix="amss_verify_pdf_") as temp_dir:
output_pattern = os.path.join(temp_dir, "page-%03d.ppm")
cmd = [
gs_path,
"-q",
"-dSAFER",
"-dBATCH",
"-dNOPAUSE",
"-sDEVICE=ppmraw",
f"-r{dpi}",
f"-o{output_pattern}",
pdf_path
]
try:
subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
except subprocess.CalledProcessError as exc:
message = exc.stderr.strip() or str(exc)
raise RuntimeError(f"Failed to render PDF '{pdf_path}': {message}") from exc
ppm_files = sorted(
os.path.join(temp_dir, filename)
for filename in os.listdir(temp_dir)
if filename.endswith(".ppm")
)
if not ppm_files:
raise RuntimeError(f"No rendered pages were produced for '{pdf_path}'")
images = []
for ppm_file in ppm_files:
with Image.open(ppm_file) as img:
images.append(np.array(img.convert("RGB"), dtype=np.uint8))
return images
def compare_rendered_pages(ref_img, target_img):
"""Return (different_pixels, total_pixels) for two rendered RGB pages."""
ref_h, ref_w = ref_img.shape[:2]
tgt_h, tgt_w = target_img.shape[:2]
total_pixels = max(ref_h, tgt_h) * max(ref_w, tgt_w)
if ref_h == tgt_h and ref_w == tgt_w:
different_pixels = int(np.count_nonzero(np.any(ref_img != target_img, axis=2)))
return different_pixels, total_pixels
diff_mask = np.ones((max(ref_h, tgt_h), max(ref_w, tgt_w)), dtype=bool)
overlap_h = min(ref_h, tgt_h)
overlap_w = min(ref_w, tgt_w)
overlap_diff = np.any(ref_img[:overlap_h, :overlap_w] != target_img[:overlap_h, :overlap_w], axis=2)
diff_mask[:overlap_h, :overlap_w] = overlap_diff
different_pixels = int(np.count_nonzero(diff_mask))
return different_pixels, total_pixels
def compare_pdf_images(ref_pdf, target_pdf, dpi=150, threshold_percent=0.001):
"""Compare two PDFs by rasterizing them and counting differing pixels."""
ref_pages = render_pdf_to_images(ref_pdf, dpi=dpi)
target_pages = render_pdf_to_images(target_pdf, dpi=dpi)
total_pixels = 0
different_pixels = 0
max_pages = max(len(ref_pages), len(target_pages))
for page_idx in range(max_pages):
if page_idx < len(ref_pages) and page_idx < len(target_pages):
page_diff, page_total = compare_rendered_pages(ref_pages[page_idx], target_pages[page_idx])
else:
existing_page = ref_pages[page_idx] if page_idx < len(ref_pages) else target_pages[page_idx]
page_total = existing_page.shape[0] * existing_page.shape[1]
page_diff = page_total
total_pixels += page_total
different_pixels += page_diff
diff_percent = (different_pixels / total_pixels * 100.0) if total_pixels else 0.0
return {
"different_pixels": different_pixels,
"total_pixels": total_pixels,
"diff_percent": diff_percent,
"pages_ref": len(ref_pages),
"pages_target": len(target_pages),
"passed": diff_percent < threshold_percent
}
def compare_required_figures(reference_figure_dir, target_figure_dir):
"""Compare the required GW150914 figure PDFs."""
figure_names = [
"ADM_Constraint_Grid_Level_0.pdf",
"BH_Trajectory_21_XY.pdf",
"BH_Trajectory_XY.pdf"
]
results = []
for figure_name in figure_names:
ref_pdf = os.path.join(reference_figure_dir, figure_name)
target_pdf = os.path.join(target_figure_dir, figure_name)
if not os.path.exists(ref_pdf):
raise FileNotFoundError(f"Reference figure not found: {ref_pdf}")
if not os.path.exists(target_pdf):
raise FileNotFoundError(f"Target figure not found: {target_pdf}")
comparison = compare_pdf_images(ref_pdf, target_pdf)
comparison["name"] = figure_name
results.append(comparison)
return results
def calculate_all_rms_errors(bh_data_ref, bh_data_target):
""" """
Calculate trajectory-based RMS error on the XY plane between baseline and optimized simulations. Calculate 3D Vector RMS and component-wise RMS (X, Y, Z) independently.
Uses r = sqrt(x^2 + y^2) as the denominator for all error normalizations.
This function computes the RMS error independently for BH1 and BH2 trajectories, Returns the maximum error between BH1 and BH2 for each category.
then returns the maximum of the two as the final RMS error metric.
For each black hole, the RMS is calculated as:
RMS = sqrt( (1/M) * sum( (Δr_i / r_i^max)^2 ) ) × 100%
where:
Δr_i = sqrt((x_ref,i - x_new,i)^2 + (y_ref,i - y_new,i)^2)
r_i^max = max(sqrt(x_ref,i^2 + y_ref,i^2), sqrt(x_new,i^2 + y_new,i^2))
Args:
bh_data_ref: Reference (baseline) trajectory data
bh_data_target: Target (optimized) trajectory data
Returns:
rms_value: Final RMS error as a percentage (max of BH1 and BH2)
error: Error message if any
""" """
# Align data: truncate to the length of the shorter dataset
M = min(len(bh_data_ref['time']), len(bh_data_target['time'])) M = min(len(bh_data_ref['time']), len(bh_data_target['time']))
if M < 10: if M < 10:
return None, "Insufficient data points for comparison" return None, "Insufficient data points for comparison"
# Extract XY coordinates for both black holes results = {}
x1_ref = bh_data_ref['x1'][:M]
y1_ref = bh_data_ref['y1'][:M]
x2_ref = bh_data_ref['x2'][:M]
y2_ref = bh_data_ref['y2'][:M]
x1_new = bh_data_target['x1'][:M] for bh in ['1', '2']:
y1_new = bh_data_target['y1'][:M] x_r, y_r, z_r = bh_data_ref[f'x{bh}'][:M], bh_data_ref[f'y{bh}'][:M], bh_data_ref[f'z{bh}'][:M]
x2_new = bh_data_target['x2'][:M] x_n, y_n, z_n = bh_data_target[f'x{bh}'][:M], bh_data_target[f'y{bh}'][:M], bh_data_target[f'z{bh}'][:M]
y2_new = bh_data_target['y2'][:M]
# Calculate RMS for BH1 # 核心修改:根据组委会的邮件指示,分母统一使用 r = sqrt(x^2 + y^2)
delta_r1 = np.sqrt((x1_ref - x1_new)**2 + (y1_ref - y1_new)**2) r_ref = np.sqrt(x_r**2 + y_r**2)
r1_ref = np.sqrt(x1_ref**2 + y1_ref**2) r_new = np.sqrt(x_n**2 + y_n**2)
r1_new = np.sqrt(x1_new**2 + y1_new**2) denom_max = np.maximum(r_ref, r_new)
r1_max = np.maximum(r1_ref, r1_new)
# Calculate RMS for BH2 valid = denom_max > 1e-15
delta_r2 = np.sqrt((x2_ref - x2_new)**2 + (y2_ref - y2_new)**2) if np.sum(valid) < 10:
r2_ref = np.sqrt(x2_ref**2 + y2_ref**2) results[f'BH{bh}'] = { '3D_Vector': 0.0, 'X_Component': 0.0, 'Y_Component': 0.0, 'Z_Component': 0.0 }
r2_new = np.sqrt(x2_new**2 + y2_new**2) continue
r2_max = np.maximum(r2_ref, r2_new)
# Avoid division by zero for BH1 def calc_rms(delta):
valid_mask1 = r1_max > 1e-15 # 将对应分量的偏差除以统一的轨道半径分母 denom_max
if np.sum(valid_mask1) < 10: return np.sqrt(np.mean((delta[valid] / denom_max[valid])**2)) * 100
return None, "Insufficient valid data points for BH1"
terms1 = (delta_r1[valid_mask1] / r1_max[valid_mask1])**2 # 1. Total 3D Vector RMS
rms_bh1 = np.sqrt(np.mean(terms1)) * 100 delta_vec = np.sqrt((x_r - x_n)**2 + (y_r - y_n)**2 + (z_r - z_n)**2)
rms_3d = calc_rms(delta_vec)
# Avoid division by zero for BH2 # 2. Component-wise RMS (分离计算各轴,但共用半径分母)
valid_mask2 = r2_max > 1e-15 rms_x = calc_rms(np.abs(x_r - x_n))
if np.sum(valid_mask2) < 10: rms_y = calc_rms(np.abs(y_r - y_n))
return None, "Insufficient valid data points for BH2" rms_z = calc_rms(np.abs(z_r - z_n))
terms2 = (delta_r2[valid_mask2] / r2_max[valid_mask2])**2 results[f'BH{bh}'] = {
rms_bh2 = np.sqrt(np.mean(terms2)) * 100 '3D_Vector': rms_3d,
'X_Component': rms_x,
'Y_Component': rms_y,
'Z_Component': rms_z
}
# Final RMS is the maximum of BH1 and BH2 # 获取 BH1 BH2 中的最大误差
rms_final = max(rms_bh1, rms_bh2) max_rms = {
'3D_Vector': max(results['BH1']['3D_Vector'], results['BH2']['3D_Vector']),
return rms_final, None 'X_Component': max(results['BH1']['X_Component'], results['BH2']['X_Component']),
'Y_Component': max(results['BH1']['Y_Component'], results['BH2']['Y_Component']),
'Z_Component': max(results['BH1']['Z_Component'], results['BH2']['Z_Component'])
}
return max_rms, None
def analyze_constraint_violation(constraint_data, n_levels=9): def analyze_constraint_violation(constraint_data, n_levels=9):
""" """
@@ -155,34 +277,32 @@ def analyze_constraint_violation(constraint_data, n_levels=9):
def print_header(): def print_header():
"""Print report header"""
print("\n" + Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET) print("\n" + Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
print(Color.BOLD + " AMSS-NCKU GW150914 Simulation Regression Test Report" + Color.RESET) print(Color.BOLD + " AMSS-NCKU GW150914 Comprehensive Regression Test" + Color.RESET)
print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET) print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
def print_rms_results(rms_dict, error, threshold=1.0):
def print_rms_results(rms_rel, error, threshold=1.0): print(f"\n{Color.BOLD}1. RMS Error Analysis (Maximums of BH1 & BH2){Color.RESET}")
"""Print RMS error results""" print("-" * 65)
print(f"\n{Color.BOLD}1. RMS Error Analysis (Baseline vs Optimized){Color.RESET}")
print("-" * 45)
if error: if error:
print(f" {Color.RED}Error: {error}{Color.RESET}") print(f" {Color.RED}Error: {error}{Color.RESET}")
return False return False
passed = rms_rel < threshold all_passed = True
print(f" Requirement: < {threshold}%\n")
print(f" RMS relative error: {rms_rel:.4f}%") for key, val in rms_dict.items():
print(f" Requirement: < {threshold}%") passed = val < threshold
print(f" Status: {get_status_text(passed)}") all_passed = all_passed and passed
status = get_status_text(passed)
return passed print(f" {key:15}: {val:8.4f}% | Status: {status}")
return all_passed
def print_constraint_results(results, threshold=2.0): def print_constraint_results(results, threshold=2.0):
"""Print constraint violation results"""
print(f"\n{Color.BOLD}2. ADM Constraint Violation Analysis (Grid Level 0){Color.RESET}") print(f"\n{Color.BOLD}2. ADM Constraint Violation Analysis (Grid Level 0){Color.RESET}")
print("-" * 45) print("-" * 65)
names = ['Ham', 'Px', 'Py', 'Pz', 'Gx', 'Gy', 'Gz'] names = ['Ham', 'Px', 'Py', 'Pz', 'Gx', 'Gy', 'Gz']
for i, name in enumerate(names): for i, name in enumerate(names):
@@ -199,19 +319,45 @@ def print_constraint_results(results, threshold=2.0):
return passed return passed
def print_summary(rms_passed, constraint_passed): def print_figure_results(results, threshold_percent=0.001):
"""Print summary""" print(f"\n{Color.BOLD}3. Figure Pixel Comparison (PDF Rasterization){Color.RESET}")
print("-" * 65)
print(f" Requirement: < {threshold_percent:.3f}% differing pixels\n")
all_passed = True
for result in results:
passed = result["passed"]
all_passed = all_passed and passed
status = get_status_text(passed)
print(f" {result['name']:32}: {result['diff_percent']:10.6f}% | Status: {status}")
if result["pages_ref"] != result["pages_target"]:
print(f" {'':32} pages(ref/target): {result['pages_ref']}/{result['pages_target']}")
return all_passed
def print_figure_error(error_message):
print(f"\n{Color.BOLD}3. Figure Pixel Comparison (PDF Rasterization){Color.RESET}")
print("-" * 65)
print(f" {Color.RED}Error: {error_message}{Color.RESET}")
return False
def print_summary(rms_passed, constraint_passed, figure_passed):
print("\n" + Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET) print("\n" + Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
print(Color.BOLD + "Verification Summary" + Color.RESET) print(Color.BOLD + "Verification Summary" + Color.RESET)
print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET) print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
all_passed = rms_passed and constraint_passed all_passed = rms_passed and constraint_passed and figure_passed
res_rms = get_status_text(rms_passed) res_rms = get_status_text(rms_passed)
res_con = get_status_text(constraint_passed) res_con = get_status_text(constraint_passed)
res_fig = get_status_text(figure_passed)
print(f" [1] RMS trajectory check: {res_rms}") print(f" [1] Comprehensive RMS check: {res_rms}")
print(f" [2] ADM constraint check: {res_con}") print(f" [2] ADM constraint check: {res_con}")
print(f" [3] Figure pixel comparison: {res_fig}")
final_status = f"{Color.GREEN}{Color.BOLD}ALL CHECKS PASSED{Color.RESET}" if all_passed else f"{Color.RED}{Color.BOLD}SOME CHECKS FAILED{Color.RESET}" final_status = f"{Color.GREEN}{Color.BOLD}ALL CHECKS PASSED{Color.RESET}" if all_passed else f"{Color.RED}{Color.BOLD}SOME CHECKS FAILED{Color.RESET}"
print(f"\n Overall result: {final_status}") print(f"\n Overall result: {final_status}")
@@ -219,61 +365,58 @@ def print_summary(rms_passed, constraint_passed):
return all_passed return all_passed
def main(): def main():
# Determine target (optimized) output directory
if len(sys.argv) > 1: if len(sys.argv) > 1:
target_dir = sys.argv[1] target_dir = sys.argv[1]
else: else:
script_dir = os.path.dirname(os.path.abspath(__file__)) script_dir = os.path.dirname(os.path.abspath(__file__))
target_dir = os.path.join(script_dir, "GW150914/AMSS_NCKU_output") target_dir = os.path.join(script_dir, "GW150914/AMSS_NCKU_output")
# Determine reference (baseline) directory
script_dir = os.path.dirname(os.path.abspath(__file__)) script_dir = os.path.dirname(os.path.abspath(__file__))
reference_dir = os.path.join(script_dir, "GW150914-origin/AMSS_NCKU_output") reference_dir = os.path.join(script_dir, "GW150914-origin/AMSS_NCKU_output")
target_figure_dir = resolve_figure_dir(target_dir)
reference_figure_dir = os.path.join(script_dir, "GW150914-origin/figure")
# Data file paths
bh_file_ref = os.path.join(reference_dir, "bssn_BH.dat") bh_file_ref = os.path.join(reference_dir, "bssn_BH.dat")
bh_file_target = os.path.join(target_dir, "bssn_BH.dat") bh_file_target = os.path.join(target_dir, "bssn_BH.dat")
constraint_file = os.path.join(target_dir, "bssn_constraint.dat") constraint_file = os.path.join(target_dir, "bssn_constraint.dat")
# Check if files exist
if not os.path.exists(bh_file_ref): if not os.path.exists(bh_file_ref):
print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Baseline trajectory file not found: {bh_file_ref}") print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Baseline trajectory file not found: {bh_file_ref}")
sys.exit(1) sys.exit(1)
if not os.path.exists(bh_file_target): if not os.path.exists(bh_file_target):
print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Target trajectory file not found: {bh_file_target}") print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Target trajectory file not found: {bh_file_target}")
sys.exit(1) sys.exit(1)
if not os.path.exists(constraint_file): if not os.path.exists(constraint_file):
print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Constraint data file not found: {constraint_file}") print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Constraint data file not found: {constraint_file}")
sys.exit(1) sys.exit(1)
# Print header
print_header() print_header()
print(f"\n{Color.BOLD}Reference (Baseline):{Color.RESET} {Color.BLUE}{reference_dir}{Color.RESET}") print(f"\n{Color.BOLD}Reference (Baseline):{Color.RESET} {Color.BLUE}{reference_dir}{Color.RESET}")
print(f"{Color.BOLD}Target (Optimized): {Color.RESET} {Color.BLUE}{target_dir}{Color.RESET}") print(f"{Color.BOLD}Target (Optimized): {Color.RESET} {Color.BLUE}{target_dir}{Color.RESET}")
print(f"{Color.BOLD}Reference Figures: {Color.RESET} {Color.BLUE}{reference_figure_dir}{Color.RESET}")
print(f"{Color.BOLD}Target Figures: {Color.RESET} {Color.BLUE}{target_figure_dir}{Color.RESET}")
# Load data
bh_data_ref = load_bh_trajectory(bh_file_ref) bh_data_ref = load_bh_trajectory(bh_file_ref)
bh_data_target = load_bh_trajectory(bh_file_target) bh_data_target = load_bh_trajectory(bh_file_target)
constraint_data = load_constraint_data(constraint_file) constraint_data = load_constraint_data(constraint_file)
# Calculate RMS error # Output modified RMS results
rms_rel, error = calculate_rms_error(bh_data_ref, bh_data_target) rms_dict, error = calculate_all_rms_errors(bh_data_ref, bh_data_target)
rms_passed = print_rms_results(rms_rel, error) rms_passed = print_rms_results(rms_dict, error)
# Analyze constraint violation # Output constraint results
constraint_results = analyze_constraint_violation(constraint_data) constraint_results = analyze_constraint_violation(constraint_data)
constraint_passed = print_constraint_results(constraint_results) constraint_passed = print_constraint_results(constraint_results)
# Print summary try:
all_passed = print_summary(rms_passed, constraint_passed) figure_results = compare_required_figures(reference_figure_dir, target_figure_dir)
figure_passed = print_figure_results(figure_results)
except (FileNotFoundError, RuntimeError) as exc:
figure_passed = print_figure_error(str(exc))
# Return exit code all_passed = print_summary(rms_passed, constraint_passed, figure_passed)
sys.exit(0 if all_passed else 1) sys.exit(0 if all_passed else 1)
if __name__ == "__main__": if __name__ == "__main__":
main() main()

View File

@@ -41,6 +41,239 @@ using namespace std;
#include "derivatives.h" #include "derivatives.h"
#include "ricci_gamma.h" #include "ricci_gamma.h"
// Compile-time switch for per-timestep memory usage collection/printing.
// Default is OFF to reduce overhead in production runs.
#ifndef BSSN_ENABLE_MEM_USAGE_LOG
#define BSSN_ENABLE_MEM_USAGE_LOG 0
#endif
#ifndef BSSN_FINE_TIMING
#define BSSN_FINE_TIMING 0
#endif
#ifndef BSSN_FINE_TIMING_EVERY
#define BSSN_FINE_TIMING_EVERY 1
#endif
#ifndef BSSN_FINE_TIMING_TOPN
#define BSSN_FINE_TIMING_TOPN 8
#endif
#ifndef BSSN_KERNEL_FINE_TIMING
#define BSSN_KERNEL_FINE_TIMING 0
#endif
#ifndef BSSN_ENABLE_STDIN_ABORT_POLL
#define BSSN_ENABLE_STDIN_ABORT_POLL 0
#endif
#if BSSN_FINE_TIMING
namespace step_timing
{
enum Bucket
{
TB_ANALYSIS_PSI4 = 0,
TB_ANALYSIS_SURFACE,
TB_ANALYSIS_IO,
TB_BH_PREDICTOR,
TB_PREDICTOR_RHS,
TB_PREDICTOR_SYNC,
TB_BH_CORRECTOR,
TB_CORRECTOR_RHS,
TB_CORRECTOR_SYNC,
TB_STATE_SWAP,
TB_RESTRICT_PROLONG,
TB_CONSTRAINT_OUT,
TB_DUMP_3D,
TB_DUMP_2D,
TB_CHECKPOINT,
TB_REGRID,
TB_COUNT
};
static double local_bucket_seconds[TB_COUNT];
static const char *bucket_labels[TB_COUNT] =
{
"analysis_psi4",
"analysis_surface",
"analysis_io",
"bh_predictor",
"predictor_rhs",
"predictor_sync",
"bh_corrector",
"corrector_rhs",
"corrector_sync",
"state_swap",
"restrict_prolong",
"constraint_out",
"dump_3d",
"dump_2d",
"checkpoint",
"regrid"
};
void reset()
{
for (int i = 0; i < TB_COUNT; i++)
local_bucket_seconds[i] = 0.0;
}
void add(Bucket bucket, double seconds)
{
local_bucket_seconds[int(bucket)] += seconds;
}
void report(int myrank, int nprocs, monitor *TimingMonitor,
int step_index, double phys_time, double step_wall_seconds)
{
double max_bucket_seconds[TB_COUNT];
double avg_bucket_seconds[TB_COUNT];
MPI_Reduce(local_bucket_seconds, max_bucket_seconds, TB_COUNT, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
MPI_Reduce(local_bucket_seconds, avg_bucket_seconds, TB_COUNT, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (myrank != 0)
return;
for (int i = 0; i < TB_COUNT; i++)
avg_bucket_seconds[i] /= Mymax(1, nprocs);
if (TimingMonitor)
{
double row[2 + 2 * TB_COUNT];
row[0] = double(step_index);
row[1] = step_wall_seconds;
for (int i = 0; i < TB_COUNT; i++)
{
row[2 + i] = max_bucket_seconds[i];
row[2 + TB_COUNT + i] = avg_bucket_seconds[i];
}
TimingMonitor->writefile(phys_time, 2 + 2 * TB_COUNT, row);
}
double residual = step_wall_seconds;
for (int i = 0; i < TB_COUNT; i++)
residual -= max_bucket_seconds[i];
if (residual < 0.0)
residual = 0.0;
int order[TB_COUNT];
for (int i = 0; i < TB_COUNT; i++)
order[i] = i;
for (int i = 0; i < TB_COUNT - 1; i++)
for (int j = i + 1; j < TB_COUNT; j++)
if (max_bucket_seconds[order[j]] > max_bucket_seconds[order[i]])
{
int tmp = order[i];
order[i] = order[j];
order[j] = tmp;
}
ios::fmtflags old_flags = cout.flags();
streamsize old_precision = cout.precision();
cout << " Fine timing hot spots (max rank wall estimate):" << endl;
const int topn = Mymin(BSSN_FINE_TIMING_TOPN, TB_COUNT);
for (int i = 0; i < topn; i++)
{
const int ib = order[i];
const double frac = (step_wall_seconds > 0.0) ? (100.0 * max_bucket_seconds[ib] / step_wall_seconds) : 0.0;
cout << " "
<< setw(20) << left << bucket_labels[ib]
<< " = " << setw(10) << right << setprecision(6) << max_bucket_seconds[ib]
<< " s (" << setw(6) << setprecision(4) << frac << "%)" << endl;
}
if (residual > 1.0e-6)
{
const double frac = (step_wall_seconds > 0.0) ? (100.0 * residual / step_wall_seconds) : 0.0;
cout << " "
<< setw(20) << left << "unprofiled_residual"
<< " = " << setw(10) << right << setprecision(6) << residual
<< " s (" << setw(6) << setprecision(4) << frac << "%)" << endl;
}
cout << endl;
cout.flags(old_flags);
cout.precision(old_precision);
}
}
#define STEP_TIMER_DECL(var_name) const double var_name = MPI_Wtime()
#define STEP_TIMER_ADD(bucket_name, var_name) step_timing::add(step_timing::bucket_name, MPI_Wtime() - (var_name))
#else
#define STEP_TIMER_DECL(var_name)
#define STEP_TIMER_ADD(bucket_name, var_name)
#endif
#if BSSN_KERNEL_FINE_TIMING
namespace rhs_kernel_timing_report
{
void report(int myrank, int nprocs, int step_index, double step_wall_seconds)
{
const int bucket_count = f_bssn_rhs_kernel_timing_bucket_count();
const double *local_bucket_seconds = f_bssn_rhs_kernel_timing_local_seconds();
if (bucket_count <= 0 || !local_bucket_seconds)
return;
double *max_bucket_seconds = new double[bucket_count];
double *avg_bucket_seconds = new double[bucket_count];
int *order = new int[bucket_count];
MPI_Reduce((void *)local_bucket_seconds, max_bucket_seconds, bucket_count, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
MPI_Reduce((void *)local_bucket_seconds, avg_bucket_seconds, bucket_count, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if (myrank == 0)
{
double kernel_total = 0.0;
for (int i = 0; i < bucket_count; ++i)
{
avg_bucket_seconds[i] /= Mymax(1, nprocs);
order[i] = i;
kernel_total += max_bucket_seconds[i];
}
for (int i = 0; i < bucket_count - 1; ++i)
for (int j = i + 1; j < bucket_count; ++j)
if (max_bucket_seconds[order[j]] > max_bucket_seconds[order[i]])
{
int tmp = order[i];
order[i] = order[j];
order[j] = tmp;
}
ios::fmtflags old_flags = cout.flags();
streamsize old_precision = cout.precision();
const double kernel_frac = (step_wall_seconds > 0.0) ? (100.0 * kernel_total / step_wall_seconds) : 0.0;
cout << " RHS kernel split (max-rank accumulated over step " << step_index << "): total "
<< setprecision(6) << kernel_total << " s (" << setprecision(4)
<< kernel_frac << "% of coarse step)" << endl;
const int topn = Mymin(BSSN_FINE_TIMING_TOPN, bucket_count);
for (int i = 0; i < topn; ++i)
{
const int ib = order[i];
const double frac = (kernel_total > 0.0) ? (100.0 * max_bucket_seconds[ib] / kernel_total) : 0.0;
cout << " "
<< setw(20) << left << f_bssn_rhs_kernel_timing_label(ib)
<< " = " << setw(10) << right << setprecision(6) << max_bucket_seconds[ib]
<< " s (" << setw(6) << setprecision(4) << frac << "% of kernel)" << endl;
}
cout << endl;
cout.flags(old_flags);
cout.precision(old_precision);
}
delete[] max_bucket_seconds;
delete[] avg_bucket_seconds;
delete[] order;
}
}
#endif
//================================================================================================ //================================================================================================
// define bssn_class // define bssn_class
@@ -59,6 +292,7 @@ bssn_class::bssn_class(double Couranti, double StartTimei, double TotalTimei,
xc(0), yc(0), zc(0), xr(0), yr(0), zr(0), trigger(0), dTT(0), dumpid(0), xc(0), yc(0), zc(0), xr(0), yr(0), zr(0), trigger(0), dTT(0), dumpid(0),
#endif #endif
a_lev(a_levi), maxl(maxli), decn(decni), maxrex(maxrexi), drex(drexi), a_lev(a_levi), maxl(maxli), decn(decni), maxrex(maxrexi), drex(drexi),
ConstraintRefreshLevels(0),
CheckPoint(0) CheckPoint(0)
// CheckPoint(0) // CheckPoint(0)
{ {
@@ -101,6 +335,24 @@ bssn_class::bssn_class(double Couranti, double StartTimei, double TotalTimei,
a_stream.str(""); a_stream.str("");
a_stream << setw(15) << "# time Ham Px Py Pz Gx Gy Gz"; a_stream << setw(15) << "# time Ham Px Py Pz Gx Gy Gz";
ConVMonitor = new monitor("bssn_constraint.dat", myrank, a_stream.str()); ConVMonitor = new monitor("bssn_constraint.dat", myrank, a_stream.str());
#if BSSN_FINE_TIMING
a_stream.clear();
a_stream.str("");
a_stream << setw(8) << "# step";
a_stream << setw(14) << "wall";
for (int ib = 0; ib < step_timing::TB_COUNT; ib++)
a_stream << setw(18) << step_timing::bucket_labels[ib];
for (int ib = 0; ib < step_timing::TB_COUNT; ib++)
{
char str_avg[64];
sprintf(str_avg, "avg_%s", step_timing::bucket_labels[ib]);
a_stream << setw(18) << str_avg;
}
TimingMonitor = new monitor("bssn_step_timing.dat", myrank, a_stream.str());
#else
TimingMonitor = 0;
#endif
} }
// setup sphere integration engine // setup sphere integration engine
Waveshell = new surface_integral(Symmetry); Waveshell = new surface_integral(Symmetry);
@@ -696,6 +948,9 @@ void bssn_class::Initialize()
} }
} }
GH = new cgh(0, ngfs, Symmetry, pname, checkrun, ErrorMonitor); GH = new cgh(0, ngfs, Symmetry, pname, checkrun, ErrorMonitor);
ConstraintRefreshLevels = new int[GH->levels];
for (int il = 0; il < GH->levels; il++)
ConstraintRefreshLevels[il] = 0;
if (checkrun) if (checkrun)
CheckPoint->readcheck_cgh(PhysTime, GH, myrank, nprocs, Symmetry); CheckPoint->readcheck_cgh(PhysTime, GH, myrank, nprocs, Symmetry);
else else
@@ -736,6 +991,8 @@ void bssn_class::Initialize()
sync_cache_cor = new Parallel::SyncCache[GH->levels]; sync_cache_cor = new Parallel::SyncCache[GH->levels];
sync_cache_rp_coarse = new Parallel::SyncCache[GH->levels]; sync_cache_rp_coarse = new Parallel::SyncCache[GH->levels];
sync_cache_rp_fine = new Parallel::SyncCache[GH->levels]; sync_cache_rp_fine = new Parallel::SyncCache[GH->levels];
sync_cache_restrict = new Parallel::SyncCache[GH->levels];
sync_cache_outbd = new Parallel::SyncCache[GH->levels];
} }
//================================================================================================ //================================================================================================
@@ -783,6 +1040,8 @@ bssn_class::~bssn_class()
DumpList->clearList(); DumpList->clearList();
ConstraintList->clearList(); ConstraintList->clearList();
delete[] ConstraintRefreshLevels;
delete phio; delete phio;
delete trKo; delete trKo;
delete gxxo; delete gxxo;
@@ -1042,6 +1301,7 @@ bssn_class::~bssn_class()
delete BHMonitor; delete BHMonitor;
delete MAPMonitor; delete MAPMonitor;
delete ConVMonitor; delete ConVMonitor;
delete TimingMonitor;
delete Waveshell; delete Waveshell;
delete CheckPoint; delete CheckPoint;
@@ -2127,8 +2387,10 @@ void bssn_class::Evolve(int Steps)
#endif #endif
*/ */
#if BSSN_ENABLE_MEM_USAGE_LOG
perf bssn_perf; perf bssn_perf;
size_t current_min, current_avg, current_max, peak_min, peak_avg, peak_max; size_t current_min, current_avg, current_max, peak_min, peak_avg, peak_max;
#endif
for (int lev = 0; lev < GH->levels; lev++) for (int lev = 0; lev < GH->levels; lev++)
GH->Lt[lev] = PhysTime; GH->Lt[lev] = PhysTime;
@@ -2137,6 +2399,15 @@ void bssn_class::Evolve(int Steps)
for (int ncount = 1; ncount < Steps + 1; ncount++) for (int ncount = 1; ncount < Steps + 1; ncount++)
{ {
#if BSSN_FINE_TIMING
step_timing::reset();
#endif
#if BSSN_KERNEL_FINE_TIMING
f_bssn_rhs_kernel_timing_reset();
#endif
#if (BSSN_FINE_TIMING || BSSN_KERNEL_FINE_TIMING)
const double step_wall_start = MPI_Wtime();
#endif
// special for large mass ratio consideration // special for large mass ratio consideration
// if(fabs(Porg0[0][0]-Porg0[1][0])+fabs(Porg0[0][1]-Porg0[1][1])+fabs(Porg0[0][2]-Porg0[1][2])<1e-6) // if(fabs(Porg0[0][0]-Porg0[1][0])+fabs(Porg0[0][1]-Porg0[1][1])+fabs(Porg0[0][2]-Porg0[1][2])<1e-6)
// { GH->levels=GH->movls; } // { GH->levels=GH->movls; }
@@ -2163,6 +2434,7 @@ void bssn_class::Evolve(int Steps)
// When LastDump >= DumpTime, output corresponding binary data // When LastDump >= DumpTime, output corresponding binary data
if (LastDump >= DumpTime) if (LastDump >= DumpTime)
{ {
STEP_TIMER_DECL(timer_dump3d);
// misc::tillherecheck("before Dump_Data"); // misc::tillherecheck("before Dump_Data");
for (int lev = 0; lev < GH->levels; lev++) for (int lev = 0; lev < GH->levels; lev++)
@@ -2170,6 +2442,7 @@ void bssn_class::Evolve(int Steps)
#ifdef WithShell #ifdef WithShell
SH->Dump_Data(DumpList, 0, PhysTime, dT_mon); SH->Dump_Data(DumpList, 0, PhysTime, dT_mon);
#endif #endif
STEP_TIMER_ADD(TB_DUMP_3D, timer_dump3d);
LastDump = 0; LastDump = 0;
@@ -2182,10 +2455,12 @@ void bssn_class::Evolve(int Steps)
// When Last2dDump >= d2DumpTime, output corresponding 2D data // When Last2dDump >= d2DumpTime, output corresponding 2D data
if (Last2dDump >= d2DumpTime) if (Last2dDump >= d2DumpTime)
{ {
STEP_TIMER_DECL(timer_dump2d);
// misc::tillherecheck("before 2dDump_Data"); // misc::tillherecheck("before 2dDump_Data");
for (int lev = 0; lev < GH->levels; lev++) for (int lev = 0; lev < GH->levels; lev++)
Parallel::d2Dump_Data(GH->PatL[lev], DumpList, 0, PhysTime, dT_mon); Parallel::d2Dump_Data(GH->PatL[lev], DumpList, 0, PhysTime, dT_mon);
STEP_TIMER_ADD(TB_DUMP_2D, timer_dump2d);
Last2dDump = 0; Last2dDump = 0;
@@ -2210,10 +2485,12 @@ void bssn_class::Evolve(int Steps)
break; break;
#if (REGLEV == 1) #if (REGLEV == 1)
STEP_TIMER_DECL(timer_regrid);
GH->Regrid(Symmetry, BH_num, Porgbr, Porg0, GH->Regrid(Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_mon, StartTime, dT_mon / 2), ErrorMonitor); fgt(PhysTime - dT_mon, StartTime, dT_mon / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
STEP_TIMER_ADD(TB_REGRID, timer_regrid);
#endif #endif
#if (REGLEV == 0 && (PSTR == 1 || PSTR == 2)) #if (REGLEV == 0 && (PSTR == 1 || PSTR == 2))
@@ -2222,6 +2499,7 @@ void bssn_class::Evolve(int Steps)
// fgt(PhysTime-dT_mon,StartTime,dT_mon/2),ErrorMonitor); // fgt(PhysTime-dT_mon,StartTime,dT_mon/2),ErrorMonitor);
#endif #endif
#if BSSN_ENABLE_MEM_USAGE_LOG
// Retrieve memory usage information used during computation; master process prints it // Retrieve memory usage information used during computation; master process prints it
bssn_perf.MemoryUsage(&current_min, &current_avg, &current_max, bssn_perf.MemoryUsage(&current_min, &current_avg, &current_max,
&peak_min, &peak_avg, &peak_max, nprocs); &peak_min, &peak_avg, &peak_max, nprocs);
@@ -2237,6 +2515,7 @@ void bssn_class::Evolve(int Steps)
(double)peak_max / (1024.0 * 1024.0)); (double)peak_max / (1024.0 * 1024.0));
cout << endl; cout << endl;
} }
#endif
// Output puncture positions at each step // Output puncture positions at each step
if (myrank == 0) if (myrank == 0)
@@ -2251,10 +2530,13 @@ void bssn_class::Evolve(int Steps)
<< endl; << endl;
} }
cout << endl; cout << endl;
#if BSSN_ENABLE_STDIN_ABORT_POLL
cout << " If you think the physical evolution time is enough for this simulation, please input 'stop' in the terminal to stop the MPI processes in the next evolution step ! " << endl; cout << " If you think the physical evolution time is enough for this simulation, please input 'stop' in the terminal to stop the MPI processes in the next evolution step ! " << endl;
#endif
// cout << endl; // cout << endl;
} }
#if BSSN_ENABLE_STDIN_ABORT_POLL
//////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////
// If an "abort" command is detected on stdin, terminate MPI processes // If an "abort" command is detected on stdin, terminate MPI processes
//////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////
@@ -2282,10 +2564,12 @@ void bssn_class::Evolve(int Steps)
} }
//////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////
#endif
// When LastCheck >= CheckTime, perform runtime checks and output status data // When LastCheck >= CheckTime, perform runtime checks and output status data
if (LastCheck >= CheckTime) if (LastCheck >= CheckTime)
{ {
STEP_TIMER_DECL(timer_checkpoint);
LastCheck = 0; LastCheck = 0;
CheckPoint->write_Black_Hole_position(BH_num_input, BH_num, Porg0, Porgbr, Mass); CheckPoint->write_Black_Hole_position(BH_num_input, BH_num, Porg0, Porgbr, Mass);
@@ -2294,7 +2578,20 @@ void bssn_class::Evolve(int Steps)
CheckPoint->writecheck_sh(PhysTime, SH); CheckPoint->writecheck_sh(PhysTime, SH);
#endif #endif
CheckPoint->write_bssn(LastDump, Last2dDump, LastAnas); CheckPoint->write_bssn(LastDump, Last2dDump, LastAnas);
STEP_TIMER_ADD(TB_CHECKPOINT, timer_checkpoint);
} }
#if (BSSN_FINE_TIMING || BSSN_KERNEL_FINE_TIMING)
const double step_wall_seconds = MPI_Wtime() - step_wall_start;
#endif
#if BSSN_FINE_TIMING
if (ncount % BSSN_FINE_TIMING_EVERY == 0)
step_timing::report(myrank, nprocs, TimingMonitor, ncount, PhysTime, step_wall_seconds);
#endif
#if BSSN_KERNEL_FINE_TIMING
if (ncount % BSSN_FINE_TIMING_EVERY == 0)
rhs_kernel_timing_report::report(myrank, nprocs, ncount, step_wall_seconds);
#endif
} }
/* /*
#ifdef With_AHF #ifdef With_AHF
@@ -2426,10 +2723,16 @@ void bssn_class::RecursiveStep(int lev)
#endif #endif
#if (REGLEV == 0) #if (REGLEV == 0)
STEP_TIMER_DECL(timer_regrid_onelevel);
if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } {
if (ConstraintRefreshLevels)
ConstraintRefreshLevels[lev] = 1;
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
}
STEP_TIMER_ADD(TB_REGRID, timer_regrid_onelevel);
#endif #endif
} }
@@ -2608,7 +2911,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(GH->mylev, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(GH->mylev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
#endif #endif
} }
@@ -2775,7 +3078,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev + 1, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev + 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_levp1, StartTime, dT_levp1 / 2), ErrorMonitor)) fgt(PhysTime - dT_levp1, StartTime, dT_levp1 / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2790,7 +3093,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2809,7 +3112,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2825,7 +3128,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -3022,6 +3325,7 @@ void bssn_class::Step(int lev, int YN)
// new code 2013-2-15, zjcao // new code 2013-2-15, zjcao
#if (MAPBH == 1) #if (MAPBH == 1)
STEP_TIMER_DECL(timer_bh_predictor);
// for black hole position // for black hole position
if (BH_num > 0 && lev == GH->levels - 1) if (BH_num > 0 && lev == GH->levels - 1)
{ {
@@ -3052,6 +3356,7 @@ void bssn_class::Step(int lev, int YN)
} }
} }
} }
STEP_TIMER_ADD(TB_BH_PREDICTOR, timer_bh_predictor);
// data analysis part // data analysis part
// Warning NOTE: the variables1 are used as temp storege room // Warning NOTE: the variables1 are used as temp storege room
@@ -3074,6 +3379,7 @@ void bssn_class::Step(int lev, int YN)
int ERROR = 0; int ERROR = 0;
MyList<ss_patch> *sPp; MyList<ss_patch> *sPp;
STEP_TIMER_DECL(timer_predictor_rhs);
// Predictor // Predictor
MyList<Patch> *Pp = GH->PatL[lev]; MyList<Patch> *Pp = GH->PatL[lev];
while (Pp) while (Pp)
@@ -3349,6 +3655,9 @@ void bssn_class::Step(int lev, int YN)
} }
#endif #endif
STEP_TIMER_ADD(TB_PREDICTOR_RHS, timer_predictor_rhs);
STEP_TIMER_DECL(timer_predictor_sync);
Parallel::AsyncSyncState async_pre; Parallel::AsyncSyncState async_pre;
Parallel::Sync_start(GH->PatL[lev], SynchList_pre, Symmetry, sync_cache_pre[lev], async_pre); Parallel::Sync_start(GH->PatL[lev], SynchList_pre, Symmetry, sync_cache_pre[lev], async_pre);
@@ -3386,6 +3695,7 @@ void bssn_class::Step(int lev, int YN)
} }
} }
#endif #endif
STEP_TIMER_ADD(TB_PREDICTOR_SYNC, timer_predictor_sync);
#if (MAPBH == 0) #if (MAPBH == 0)
// for black hole position // for black hole position
@@ -3430,6 +3740,7 @@ void bssn_class::Step(int lev, int YN)
// corrector // corrector
for (iter_count = 1; iter_count < 4; iter_count++) for (iter_count = 1; iter_count < 4; iter_count++)
{ {
STEP_TIMER_DECL(timer_corrector_rhs);
// for RK4: t0, t0+dt/2, t0+dt/2, t0+dt; // for RK4: t0, t0+dt/2, t0+dt/2, t0+dt;
if (iter_count == 1 || iter_count == 3) if (iter_count == 1 || iter_count == 3)
TRK4 += dT_lev / 2; TRK4 += dT_lev / 2;
@@ -3709,6 +4020,9 @@ void bssn_class::Step(int lev, int YN)
} }
#endif #endif
STEP_TIMER_ADD(TB_CORRECTOR_RHS, timer_corrector_rhs);
STEP_TIMER_DECL(timer_corrector_sync);
Parallel::AsyncSyncState async_cor; Parallel::AsyncSyncState async_cor;
Parallel::Sync_start(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_cor[lev], async_cor); Parallel::Sync_start(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_cor[lev], async_cor);
@@ -3748,8 +4062,10 @@ void bssn_class::Step(int lev, int YN)
} }
} }
#endif #endif
STEP_TIMER_ADD(TB_CORRECTOR_SYNC, timer_corrector_sync);
#if (MAPBH == 0) #if (MAPBH == 0)
STEP_TIMER_DECL(timer_bh_corrector);
// for black hole position // for black hole position
if (BH_num > 0 && lev == GH->levels - 1) if (BH_num > 0 && lev == GH->levels - 1)
{ {
@@ -3782,11 +4098,13 @@ void bssn_class::Step(int lev, int YN)
} }
} }
} }
STEP_TIMER_ADD(TB_BH_CORRECTOR, timer_bh_corrector);
#endif #endif
// swap time level // swap time level
if (iter_count < 3) if (iter_count < 3)
{ {
STEP_TIMER_DECL(timer_state_swap);
Pp = GH->PatL[lev]; Pp = GH->PatL[lev];
while (Pp) while (Pp)
{ {
@@ -3833,9 +4151,11 @@ void bssn_class::Step(int lev, int YN)
} }
} }
#endif #endif
STEP_TIMER_ADD(TB_STATE_SWAP, timer_state_swap);
} }
} }
#if (RPS == 0) #if (RPS == 0)
STEP_TIMER_DECL(timer_restrict_prolong);
// mesh refinement boundary part // mesh refinement boundary part
RestrictProlong(lev, YN, BB); RestrictProlong(lev, YN, BB);
@@ -3856,6 +4176,7 @@ void bssn_class::Step(int lev, int YN)
} }
#endif #endif
STEP_TIMER_ADD(TB_RESTRICT_PROLONG, timer_restrict_prolong);
#endif #endif
// note the data structure before update // note the data structure before update
// SynchList_cor 1 ----------- // SynchList_cor 1 -----------
@@ -3864,6 +4185,7 @@ void bssn_class::Step(int lev, int YN)
// //
// OldStateList old ----------- // OldStateList old -----------
// update // update
STEP_TIMER_DECL(timer_state_commit);
Pp = GH->PatL[lev]; Pp = GH->PatL[lev];
while (Pp) while (Pp)
{ {
@@ -3920,6 +4242,7 @@ void bssn_class::Step(int lev, int YN)
Porg0[ithBH][2] = Porg1[ithBH][2]; Porg0[ithBH][2] = Porg1[ithBH][2];
} }
} }
STEP_TIMER_ADD(TB_STATE_SWAP, timer_state_commit);
} }
//================================================================================================ //================================================================================================
@@ -4246,7 +4569,9 @@ void bssn_class::Step(int lev, int YN)
} }
} }
#endif #endif
STEP_TIMER_ADD(TB_PREDICTOR_SYNC, timer_predictor_sync);
STEP_TIMER_DECL(timer_bh_predictor);
// for black hole position // for black hole position
if (BH_num > 0 && lev == GH->levels - 1) if (BH_num > 0 && lev == GH->levels - 1)
{ {
@@ -4285,6 +4610,7 @@ void bssn_class::Step(int lev, int YN)
{ {
AnalysisStuff(lev, dT_lev); AnalysisStuff(lev, dT_lev);
} }
STEP_TIMER_ADD(TB_BH_PREDICTOR, timer_bh_predictor);
// corrector // corrector
for (iter_count = 1; iter_count < 3; iter_count++) for (iter_count = 1; iter_count < 3; iter_count++)
{ {
@@ -5755,6 +6081,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
// //
// SynchList_cor old ----------- // SynchList_cor old -----------
{ {
STEP_TIMER_DECL(timer_restrict_prolong);
#if (PSTR == 1 || PSTR == 2) #if (PSTR == 1 || PSTR == 2)
// stringstream a_stream; // stringstream a_stream;
// a_stream.setf(ios::left); // a_stream.setf(ios::left);
@@ -5796,7 +6123,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry); Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry, sync_cache_restrict[lev]);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry);
@@ -5820,7 +6147,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif #endif
@@ -5847,7 +6174,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_restrict[lev]);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry);
@@ -5871,7 +6198,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif #endif
@@ -5897,6 +6224,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
// misc::tillherecheck(GH->Commlev[GH->mylev],GH->start_rank[GH->mylev],a_stream.str()); // misc::tillherecheck(GH->Commlev[GH->mylev],GH->start_rank[GH->mylev],a_stream.str());
#endif #endif
} }
STEP_TIMER_ADD(TB_RESTRICT_PROLONG, timer_restrict_prolong);
} }
//================================================================================================ //================================================================================================
@@ -5916,6 +6244,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
// //
// SynchList_cor old ----------- // SynchList_cor old -----------
{ {
STEP_TIMER_DECL(timer_restrict_prolong);
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"starting RestrictProlong_aux"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"starting RestrictProlong_aux");
if (lev >= GH->levels - 1) if (lev >= GH->levels - 1)
@@ -5940,7 +6269,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
} }
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry); Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry, sync_cache_restrict[lev]);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry);
@@ -5950,7 +6279,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif #endif
@@ -5962,7 +6291,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
else // no time refinement levels and for all same time levels else // no time refinement levels and for all same time levels
{ {
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_restrict[lev]);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry);
@@ -5972,7 +6301,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif #endif
@@ -5984,6 +6313,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
Parallel::Sync_cached(GH->PatL[lev], SL, Symmetry, sync_cache_rp_fine[lev]); Parallel::Sync_cached(GH->PatL[lev], SL, Symmetry, sync_cache_rp_fine[lev]);
} }
STEP_TIMER_ADD(TB_RESTRICT_PROLONG, timer_restrict_prolong);
} }
//================================================================================================ //================================================================================================
@@ -5994,6 +6324,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
void bssn_class::RestrictProlong(int lev, int YN, bool BB) void bssn_class::RestrictProlong(int lev, int YN, bool BB)
{ {
STEP_TIMER_DECL(timer_restrict_prolong);
double dT_lev = dT * pow(0.5, Mymax(lev, trfls)); double dT_lev = dT * pow(0.5, Mymax(lev, trfls));
// we assume for fine // we assume for fine
// SynchList_cor 1 ----------- // SynchList_cor 1 -----------
@@ -6027,7 +6358,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
} }
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, Symmetry); Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, Symmetry, sync_cache_restrict[lev]);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,SynchList_pre,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,SynchList_pre,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, GH->rsul[lev], Symmetry);
@@ -6037,7 +6368,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif #endif
@@ -6051,7 +6382,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
if (myrank == 0) if (myrank == 0)
cout << "===: " << GH->Lt[lev - 1] << "," << GH->Lt[lev] + dT_lev << endl; cout << "===: " << GH->Lt[lev - 1] << "," << GH->Lt[lev] + dT_lev << endl;
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, Symmetry); Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, Symmetry, sync_cache_restrict[lev]);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,StateList,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,StateList,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, GH->rsul[lev], Symmetry);
@@ -6061,7 +6392,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif #endif
@@ -6073,6 +6404,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
Parallel::Sync_cached(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_rp_fine[lev]); Parallel::Sync_cached(GH->PatL[lev], SynchList_cor, Symmetry, sync_cache_rp_fine[lev]);
} }
STEP_TIMER_ADD(TB_RESTRICT_PROLONG, timer_restrict_prolong);
} }
//================================================================================================ //================================================================================================
@@ -6102,7 +6434,7 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif #endif
@@ -6115,7 +6447,7 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
{ {
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry, sync_cache_outbd[lev]);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif #endif
@@ -6823,18 +7155,15 @@ void bssn_class::compute_Porg_rhs(double **BH_PS,double **BH_RHS,var *forx,var *
void bssn_class::compute_Porg_rhs(double **BH_PS, double **BH_RHS, var *forx, var *fory, var *forz, int ilev) void bssn_class::compute_Porg_rhs(double **BH_PS, double **BH_RHS, var *forx, var *fory, var *forz, int ilev)
{ {
const int InList = 3; MyList<var> DG_List_x(forx);
MyList<var> DG_List_y(fory);
MyList<var> DG_List_z(forz);
DG_List_x.next = &DG_List_y;
DG_List_y.next = &DG_List_z;
MyList<var> *DG_List = new MyList<var>(forx); double shellf[3];
DG_List->insert(fory); double pox_buf[3][1];
DG_List->insert(forz); double *pox[3] = {pox_buf[0], pox_buf[1], pox_buf[2]};
double *x1, *y1, *z1;
double *shellf;
shellf = new double[3];
double *pox[3];
for (int i = 0; i < 3; i++)
pox[i] = new double[1];
for (int n = 0; n < BH_num; n++) for (int n = 0; n < BH_num; n++)
{ {
@@ -6845,9 +7174,9 @@ void bssn_class::compute_Porg_rhs(double **BH_PS, double **BH_RHS, var *forx, va
int lev = ilev; int lev = ilev;
#if (PSTR == 0) #if (PSTR == 0)
while (!Parallel::PatList_Interp_Points(GH->PatL[lev], DG_List, 1, pox, shellf, Symmetry)) while (!Parallel::PatList_Interp_Points(GH->PatL[lev], &DG_List_x, 1, pox, shellf, Symmetry))
#elif (PSTR == 1 || PSTR == 2 || PSTR == 3) #elif (PSTR == 1 || PSTR == 2 || PSTR == 3)
while (!Parallel::PatList_Interp_Points(GH->PatL[lev], DG_List, 1, pox, shellf, Symmetry, GH->Commlev[lev])) while (!Parallel::PatList_Interp_Points(GH->PatL[lev], &DG_List_x, 1, pox, shellf, Symmetry, GH->Commlev[lev]))
#endif #endif
{ {
lev--; lev--;
@@ -6856,7 +7185,7 @@ void bssn_class::compute_Porg_rhs(double **BH_PS, double **BH_RHS, var *forx, va
ErrorMonitor->outfile << "fail to find black holes at t = " << PhysTime << endl; ErrorMonitor->outfile << "fail to find black holes at t = " << PhysTime << endl;
for (n = 0; n < BH_num; n++) for (n = 0; n < BH_num; n++)
ErrorMonitor->outfile << "(x,y,z) = (" ErrorMonitor->outfile << "(x,y,z) = ("
<< pox[0][n] << "," << pox[1][n] << "," << pox[2][n] << BH_PS[n][0] << "," << BH_PS[n][1] << "," << BH_PS[n][2]
<< ")" << endl; << ")" << endl;
break; break;
} }
@@ -6869,11 +7198,6 @@ void bssn_class::compute_Porg_rhs(double **BH_PS, double **BH_RHS, var *forx, va
BH_RHS[n][2] = -shellf[2]; BH_RHS[n][2] = -shellf[2];
} }
} }
DG_List->clearList();
delete[] shellf;
for (int i = 0; i < 3; i++)
delete[] pox[i];
} }
#endif #endif
@@ -7096,6 +7420,10 @@ void bssn_class::AnalysisStuff(int lev, double dT_lev)
IP = new double[NN]; IP = new double[NN];
RoutMAP = new double[7]; RoutMAP = new double[7];
double Rex = maxrex; double Rex = maxrex;
bool patch_mass_prepared = false;
#ifdef WithShell
bool shell_mass_prepared = false;
#endif
for (int i = 0; i < decn; i++) for (int i = 0; i < decn; i++)
{ {
#ifdef Point_Psi4 #ifdef Point_Psi4
@@ -7123,7 +7451,8 @@ void bssn_class::AnalysisStuff(int lev, double dT_lev)
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables
RoutMAP, ErrorMonitor); RoutMAP, ErrorMonitor, !patch_mass_prepared);
patch_mass_prepared = true;
} }
else else
{ {
@@ -7131,44 +7460,52 @@ void bssn_class::AnalysisStuff(int lev, double dT_lev)
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables
RoutMAP, ErrorMonitor); RoutMAP, ErrorMonitor, !shell_mass_prepared);
shell_mass_prepared = true;
} }
#else #else
Waveshell->surf_MassPAng(Rex, lev, GH, phi0, trK0, Waveshell->surf_MassPAng(Rex, lev, GH, phi0, trK0,
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables
RoutMAP, ErrorMonitor); RoutMAP, ErrorMonitor, !patch_mass_prepared);
patch_mass_prepared = true;
#endif #endif
#else #else
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before surface integral"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"before surface integral");
#ifdef WithShell #ifdef WithShell
if (lev > 0 || Rex < GH->bbox[0][0][3]) if (lev > 0 || Rex < GH->bbox[0][0][3])
{ {
Waveshell->surf_Wave(Rex, lev, GH, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP, ErrorMonitor); Waveshell->surf_WaveMassPAng(Rex, lev, GH,
Waveshell->surf_MassPAng(Rex, lev, GH, phi0, trK0, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP,
phi0, trK0,
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1,
RoutMAP, ErrorMonitor); RoutMAP, ErrorMonitor, !patch_mass_prepared);
patch_mass_prepared = true;
} }
else else
{ {
Waveshell->surf_Wave(Rex, lev, SH, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP, ErrorMonitor); Waveshell->surf_WaveMassPAng(Rex, lev, SH,
Waveshell->surf_MassPAng(Rex, lev, SH, phi0, trK0, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP,
phi0, trK0,
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1,
RoutMAP, ErrorMonitor); RoutMAP, ErrorMonitor, !shell_mass_prepared);
shell_mass_prepared = true;
} }
#else #else
#if (PSTR == 0) #if (PSTR == 0)
Waveshell->surf_Wave(Rex, lev, GH, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP, ErrorMonitor); Waveshell->surf_WaveMassPAng(Rex, lev, GH,
Waveshell->surf_MassPAng(Rex, lev, GH, phi0, trK0, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP,
phi0, trK0,
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1,
RoutMAP, ErrorMonitor); RoutMAP, ErrorMonitor, !patch_mass_prepared);
patch_mass_prepared = true;
#elif (PSTR == 1 || PSTR == 2) #elif (PSTR == 1 || PSTR == 2)
Waveshell->surf_Wave(Rex, lev, GH, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP, ErrorMonitor, GH->Commlev[lev]); Waveshell->surf_Wave(Rex, lev, GH, Rpsi4, Ipsi4, 2, maxl, NN, RP, IP, ErrorMonitor, GH->Commlev[lev]);
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after surf_Wave"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"after surf_Wave");
@@ -7176,7 +7513,8 @@ void bssn_class::AnalysisStuff(int lev, double dT_lev)
gxx0, gxy0, gxz0, gyy0, gyz0, gzz0, gxx0, gxy0, gxz0, gyy0, gyz0, gzz0,
Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0, Axx0, Axy0, Axz0, Ayy0, Ayz0, Azz0,
Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables Gmx0, Gmy0, Gmz0, Sfx1, Sfy1, Sfz1, // here we can not touch rhs variables, but 1 variables
RoutMAP, ErrorMonitor, GH->Commlev[lev]); RoutMAP, ErrorMonitor, GH->Commlev[lev], !patch_mass_prepared);
patch_mass_prepared = true;
#endif #endif
#endif #endif
// misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"end surface integral"); // misc::tillherecheck(GH->Commlev[lev],GH->start_rank[lev],"end surface integral");
@@ -7249,7 +7587,7 @@ void bssn_class::Constraint_Out()
for (int lev = 0; lev < GH->levels; lev++) for (int lev = 0; lev < GH->levels; lev++)
{ {
// make sure the data consistent for higher levels // make sure the data consistent for higher levels
if (lev > 0) // if the constrait quantities can be reused from the step rhs calculation if (lev > 0 && ConstraintRefreshLevels && ConstraintRefreshLevels[lev]) // only refresh levels whose grid layout changed after evolution
{ {
double TRK4 = PhysTime; double TRK4 = PhysTime;
double ndeps = numepsb; double ndeps = numepsb;
@@ -7403,35 +7741,18 @@ void bssn_class::Constraint_Out()
#if (PSTR == 1 || PSTR == 2) #if (PSTR == 1 || PSTR == 2)
double ConV_h[7]; double ConV_h[7];
#endif #endif
var *ConstraintVars[7] = {Cons_Ham, Cons_Px, Cons_Py, Cons_Pz, Cons_Gx, Cons_Gy, Cons_Gz};
#ifdef WithShell #ifdef WithShell
ConV[0] = SH->L2Norm(Cons_Ham); SH->L2Norm7(ConstraintVars, ConV);
ConV[1] = SH->L2Norm(Cons_Px);
ConV[2] = SH->L2Norm(Cons_Py);
ConV[3] = SH->L2Norm(Cons_Pz);
ConV[4] = SH->L2Norm(Cons_Gx);
ConV[5] = SH->L2Norm(Cons_Gy);
ConV[6] = SH->L2Norm(Cons_Gz);
ConVMonitor->writefile(PhysTime, 7, ConV); ConVMonitor->writefile(PhysTime, 7, ConV);
#endif #endif
for (int levi = 0; levi < GH->levels; levi++) for (int levi = 0; levi < GH->levels; levi++)
{ {
#if (PSTR == 0) #if (PSTR == 0)
ConV[0] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Ham); Parallel::L2Norm7(GH->PatL[levi]->data, ConstraintVars, ConV);
ConV[1] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Px);
ConV[2] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Py);
ConV[3] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Pz);
ConV[4] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Gx);
ConV[5] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Gy);
ConV[6] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Gz);
#elif (PSTR == 1 || PSTR == 2) #elif (PSTR == 1 || PSTR == 2)
ConV[0] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Ham, GH->Commlev[levi]); Parallel::L2Norm7(GH->PatL[levi]->data, ConstraintVars, ConV, GH->Commlev[levi]);
ConV[1] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Px, GH->Commlev[levi]);
ConV[2] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Py, GH->Commlev[levi]);
ConV[3] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Pz, GH->Commlev[levi]);
ConV[4] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Gx, GH->Commlev[levi]);
ConV[5] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Gy, GH->Commlev[levi]);
ConV[6] = Parallel::L2Norm(GH->PatL[levi]->data, Cons_Gz, GH->Commlev[levi]);
// misc::tillherecheck("before collect data to cpu0"); // misc::tillherecheck("before collect data to cpu0");
// MPI_ALLREDUCE( sendbuf, recvbuf, count, datatype, op, comm), sendbu and recvbuf must be different // MPI_ALLREDUCE( sendbuf, recvbuf, count, datatype, op, comm), sendbu and recvbuf must be different
if (levi > 0) if (levi > 0)
@@ -7462,6 +7783,9 @@ void bssn_class::Constraint_Out()
Interp_Constraint(false); Interp_Constraint(false);
LastConsOut = 0; LastConsOut = 0;
if (ConstraintRefreshLevels)
for (int lev = 0; lev < GH->levels; lev++)
ConstraintRefreshLevels[lev] = 0;
} }
} }

View File

@@ -48,6 +48,7 @@ public:
double StartTime, TotalTime; double StartTime, TotalTime;
double AnasTime, DumpTime, d2DumpTime, CheckTime; double AnasTime, DumpTime, d2DumpTime, CheckTime;
double LastAnas, LastConsOut; double LastAnas, LastConsOut;
int *ConstraintRefreshLevels;
double Courant; double Courant;
double numepss, numepsb, numepsh; double numepss, numepsb, numepsh;
int Symmetry; int Symmetry;
@@ -130,9 +131,11 @@ public:
Parallel::SyncCache *sync_cache_cor; // per-level cache for corrector sync Parallel::SyncCache *sync_cache_cor; // per-level cache for corrector sync
Parallel::SyncCache *sync_cache_rp_coarse; // RestrictProlong sync on PatL[lev-1] Parallel::SyncCache *sync_cache_rp_coarse; // RestrictProlong sync on PatL[lev-1]
Parallel::SyncCache *sync_cache_rp_fine; // RestrictProlong sync on PatL[lev] Parallel::SyncCache *sync_cache_rp_fine; // RestrictProlong sync on PatL[lev]
Parallel::SyncCache *sync_cache_restrict; // cached Restrict in RestrictProlong
Parallel::SyncCache *sync_cache_outbd; // cached OutBdLow2Hi in RestrictProlong
monitor *ErrorMonitor, *Psi4Monitor, *BHMonitor, *MAPMonitor; monitor *ErrorMonitor, *Psi4Monitor, *BHMonitor, *MAPMonitor;
monitor *ConVMonitor; monitor *ConVMonitor, *TimingMonitor;
surface_integral *Waveshell; surface_integral *Waveshell;
checkpoint *CheckPoint; checkpoint *CheckPoint;

View File

@@ -62,6 +62,7 @@
real*8, dimension(ex(1),ex(2),ex(3)),intent(inout) :: Gmx_Res, Gmy_Res, Gmz_Res real*8, dimension(ex(1),ex(2),ex(3)),intent(inout) :: Gmx_Res, Gmy_Res, Gmz_Res
! gont = 0: success; gont = 1: something wrong ! gont = 0: success; gont = 1: something wrong
integer::gont integer::gont
integer :: i,j,k
!~~~~~~> Other variables: !~~~~~~> Other variables:
@@ -85,6 +86,13 @@
real*8,dimension(3) ::SSS,AAS,ASA,SAA,ASS,SAS,SSA real*8,dimension(3) ::SSS,AAS,ASA,SAA,ASS,SAS,SSA
real*8 :: dX, dY, dZ, PI real*8 :: dX, dY, dZ, PI
real*8 :: divb_loc,det_loc
real*8 :: gupxx_loc,gupxy_loc,gupxz_loc,gupyy_loc,gupyz_loc,gupzz_loc
real*8 :: Rxx_loc,Rxy_loc,Rxz_loc,Ryy_loc,Ryz_loc,Rzz_loc
real*8 :: fxx_loc,fxy_loc,fxz_loc
real*8 :: Gamxa_loc,Gamya_loc,Gamza_loc
real*8 :: f_loc,chin_loc
real*8 :: l_fxx,l_fxy,l_fxz,l_fyy,l_fyz,l_fzz,S_loc
real*8, parameter :: ZEO = 0.d0,ONE = 1.D0, TWO = 2.D0, FOUR = 4.D0 real*8, parameter :: ZEO = 0.d0,ONE = 1.D0, TWO = 2.D0, FOUR = 4.D0
real*8, parameter :: EIGHT = 8.D0, HALF = 0.5D0, THR = 3.d0 real*8, parameter :: EIGHT = 8.D0, HALF = 0.5D0, THR = 3.d0
real*8, parameter :: SYM = 1.D0, ANTI= - 1.D0 real*8, parameter :: SYM = 1.D0, ANTI= - 1.D0
@@ -97,7 +105,7 @@
#endif #endif
#if (GAUGE == 6 || GAUGE == 7) #if (GAUGE == 6 || GAUGE == 7)
integer :: BHN,i,j,k integer :: BHN
real*8, dimension(9) :: Porg real*8, dimension(9) :: Porg
real*8, dimension(3) :: Mass real*8, dimension(3) :: Mass
real*8 :: r1,r2,M,A,w1,w2,C1,C2 real*8 :: r1,r2,M,A,w1,w2,C1,C2
@@ -145,22 +153,24 @@
dY = Y(2) - Y(1) dY = Y(2) - Y(1)
dZ = Z(2) - Z(1) dZ = Z(2) - Z(1)
alpn1 = Lap + ONE do k=1,ex(3)
chin1 = chi + ONE do j=1,ex(2)
gxx = dxx + ONE do i=1,ex(1)
gyy = dyy + ONE alpn1(i,j,k) = Lap(i,j,k) + ONE
gzz = dzz + ONE chin1(i,j,k) = chi(i,j,k) + ONE
gxx(i,j,k) = dxx(i,j,k) + ONE
gyy(i,j,k) = dyy(i,j,k) + ONE
gzz(i,j,k) = dzz(i,j,k) + ONE
enddo
enddo
enddo
call fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev) call fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev)
call fderivs(ex,betay,betayx,betayy,betayz,X,Y,Z, SYM,ANTI, SYM,Symmetry,Lev) call fderivs(ex,betay,betayx,betayy,betayz,X,Y,Z, SYM,ANTI, SYM,Symmetry,Lev)
call fderivs(ex,betaz,betazx,betazy,betazz,X,Y,Z, SYM, SYM,ANTI,Symmetry,Lev) call fderivs(ex,betaz,betazx,betazy,betazz,X,Y,Z, SYM, SYM,ANTI,Symmetry,Lev)
div_beta = betaxx + betayy + betazz
call fderivs(ex,chi,chix,chiy,chiz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev) call fderivs(ex,chi,chix,chiy,chiz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev)
chi_rhs = F2o3 *chin1*( alpn1 * trK - div_beta ) !rhs for chi
call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev) call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev)
call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev) call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev)
@@ -168,151 +178,179 @@
call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev) call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev)
call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
gxx_rhs = - TWO * alpn1 * Axx - F2o3 * gxx * div_beta + & do k=1,ex(3)
TWO *( gxx * betaxx + gxy * betayx + gxz * betazx) do j=1,ex(2)
do i=1,ex(1)
divb_loc = betaxx(i,j,k) + betayy(i,j,k) + betazz(i,j,k)
div_beta(i,j,k) = divb_loc
gyy_rhs = - TWO * alpn1 * Ayy - F2o3 * gyy * div_beta + & chi_rhs(i,j,k) = F2o3 * chin1(i,j,k) * (alpn1(i,j,k) * trK(i,j,k) - divb_loc)
TWO *( gxy * betaxy + gyy * betayy + gyz * betazy)
gzz_rhs = - TWO * alpn1 * Azz - F2o3 * gzz * div_beta + & gxx_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Axx(i,j,k) - F2o3 * gxx(i,j,k) * divb_loc + &
TWO *( gxz * betaxz + gyz * betayz + gzz * betazz) TWO * ( gxx(i,j,k) * betaxx(i,j,k) + gxy(i,j,k) * betayx(i,j,k) + gxz(i,j,k) * betazx(i,j,k) )
gxy_rhs = - TWO * alpn1 * Axy + F1o3 * gxy * div_beta + & gyy_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Ayy(i,j,k) - F2o3 * gyy(i,j,k) * divb_loc + &
gxx * betaxy + gxz * betazy + & TWO * ( gxy(i,j,k) * betaxy(i,j,k) + gyy(i,j,k) * betayy(i,j,k) + gyz(i,j,k) * betazy(i,j,k) )
gyy * betayx + gyz * betazx &
- gxy * betazz
gyz_rhs = - TWO * alpn1 * Ayz + F1o3 * gyz * div_beta + & gzz_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Azz(i,j,k) - F2o3 * gzz(i,j,k) * divb_loc + &
gxy * betaxz + gyy * betayz + & TWO * ( gxz(i,j,k) * betaxz(i,j,k) + gyz(i,j,k) * betayz(i,j,k) + gzz(i,j,k) * betazz(i,j,k) )
gxz * betaxy + gzz * betazy &
- gyz * betaxx
gxz_rhs = - TWO * alpn1 * Axz + F1o3 * gxz * div_beta + & gxy_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Axy(i,j,k) + F1o3 * gxy(i,j,k) * divb_loc + &
gxx * betaxz + gxy * betayz + & gxx(i,j,k) * betaxy(i,j,k) + gxz(i,j,k) * betazy(i,j,k) + gyy(i,j,k) * betayx(i,j,k) + &
gyz * betayx + gzz * betazx & gyz(i,j,k) * betazx(i,j,k) - gxy(i,j,k) * betazz(i,j,k)
- gxz * betayy !rhs for gij
! invert tilted metric gyz_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Ayz(i,j,k) + F1o3 * gyz(i,j,k) * divb_loc + &
gupzz = gxx * gyy * gzz + gxy * gyz * gxz + gxz * gxy * gyz - & gxy(i,j,k) * betaxz(i,j,k) + gyy(i,j,k) * betayz(i,j,k) + gxz(i,j,k) * betaxy(i,j,k) + &
gxz * gyy * gxz - gxy * gxy * gzz - gxx * gyz * gyz gzz(i,j,k) * betazy(i,j,k) - gyz(i,j,k) * betaxx(i,j,k)
gupxx = ( gyy * gzz - gyz * gyz ) / gupzz
gupxy = - ( gxy * gzz - gyz * gxz ) / gupzz gxz_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Axz(i,j,k) + F1o3 * gxz(i,j,k) * divb_loc + &
gupxz = ( gxy * gyz - gyy * gxz ) / gupzz gxx(i,j,k) * betaxz(i,j,k) + gxy(i,j,k) * betayz(i,j,k) + gyz(i,j,k) * betayx(i,j,k) + &
gupyy = ( gxx * gzz - gxz * gxz ) / gupzz gzz(i,j,k) * betazx(i,j,k) - gxz(i,j,k) * betayy(i,j,k)
gupyz = - ( gxx * gyz - gxy * gxz ) / gupzz
gupzz = ( gxx * gyy - gxy * gxy ) / gupzz det_loc = gxx(i,j,k) * gyy(i,j,k) * gzz(i,j,k) + gxy(i,j,k) * gyz(i,j,k) * gxz(i,j,k) + &
gxz(i,j,k) * gxy(i,j,k) * gyz(i,j,k) - gxz(i,j,k) * gyy(i,j,k) * gxz(i,j,k) - &
gxy(i,j,k) * gxy(i,j,k) * gzz(i,j,k) - gxx(i,j,k) * gyz(i,j,k) * gyz(i,j,k)
gupxx_loc = ( gyy(i,j,k) * gzz(i,j,k) - gyz(i,j,k) * gyz(i,j,k) ) / det_loc
gupxy_loc = - ( gxy(i,j,k) * gzz(i,j,k) - gyz(i,j,k) * gxz(i,j,k) ) / det_loc
gupxz_loc = ( gxy(i,j,k) * gyz(i,j,k) - gyy(i,j,k) * gxz(i,j,k) ) / det_loc
gupyy_loc = ( gxx(i,j,k) * gzz(i,j,k) - gxz(i,j,k) * gxz(i,j,k) ) / det_loc
gupyz_loc = - ( gxx(i,j,k) * gyz(i,j,k) - gxy(i,j,k) * gxz(i,j,k) ) / det_loc
gupzz_loc = ( gxx(i,j,k) * gyy(i,j,k) - gxy(i,j,k) * gxy(i,j,k) ) / det_loc
gupxx(i,j,k) = gupxx_loc
gupxy(i,j,k) = gupxy_loc
gupxz(i,j,k) = gupxz_loc
gupyy(i,j,k) = gupyy_loc
gupyz(i,j,k) = gupyz_loc
gupzz(i,j,k) = gupzz_loc
if(co == 0)then if(co == 0)then
! Gam^i_Res = Gam^i + gup^ij_,j Gmx_Res(i,j,k) = Gamx(i,j,k) - ( &
Gmx_Res = Gamx - (gupxx*(gupxx*gxxx+gupxy*gxyx+gupxz*gxzx)& gupxx_loc*(gupxx_loc*gxxx(i,j,k)+gupxy_loc*gxyx(i,j,k)+gupxz_loc*gxzx(i,j,k)) + &
+gupxy*(gupxx*gxyx+gupxy*gyyx+gupxz*gyzx)& gupxy_loc*(gupxx_loc*gxyx(i,j,k)+gupxy_loc*gyyx(i,j,k)+gupxz_loc*gyzx(i,j,k)) + &
+gupxz*(gupxx*gxzx+gupxy*gyzx+gupxz*gzzx)& gupxz_loc*(gupxx_loc*gxzx(i,j,k)+gupxy_loc*gyzx(i,j,k)+gupxz_loc*gzzx(i,j,k)) + &
+gupxx*(gupxy*gxxy+gupyy*gxyy+gupyz*gxzy)& gupxx_loc*(gupxy_loc*gxxy(i,j,k)+gupyy_loc*gxyy(i,j,k)+gupyz_loc*gxzy(i,j,k)) + &
+gupxy*(gupxy*gxyy+gupyy*gyyy+gupyz*gyzy)& gupxy_loc*(gupxy_loc*gxyy(i,j,k)+gupyy_loc*gyyy(i,j,k)+gupyz_loc*gyzy(i,j,k)) + &
+gupxz*(gupxy*gxzy+gupyy*gyzy+gupyz*gzzy)& gupxz_loc*(gupxy_loc*gxzy(i,j,k)+gupyy_loc*gyzy(i,j,k)+gupyz_loc*gzzy(i,j,k)) + &
+gupxx*(gupxz*gxxz+gupyz*gxyz+gupzz*gxzz)& gupxx_loc*(gupxz_loc*gxxz(i,j,k)+gupyz_loc*gxyz(i,j,k)+gupzz_loc*gxzz(i,j,k)) + &
+gupxy*(gupxz*gxyz+gupyz*gyyz+gupzz*gyzz)& gupxy_loc*(gupxz_loc*gxyz(i,j,k)+gupyz_loc*gyyz(i,j,k)+gupzz_loc*gyzz(i,j,k)) + &
+gupxz*(gupxz*gxzz+gupyz*gyzz+gupzz*gzzz)) gupxz_loc*(gupxz_loc*gxzz(i,j,k)+gupyz_loc*gyzz(i,j,k)+gupzz_loc*gzzz(i,j,k)))
Gmy_Res = Gamy - (gupxx*(gupxy*gxxx+gupyy*gxyx+gupyz*gxzx)& Gmy_Res(i,j,k) = Gamy(i,j,k) - ( &
+gupxy*(gupxy*gxyx+gupyy*gyyx+gupyz*gyzx)& gupxx_loc*(gupxy_loc*gxxx(i,j,k)+gupyy_loc*gxyx(i,j,k)+gupyz_loc*gxzx(i,j,k)) + &
+gupxz*(gupxy*gxzx+gupyy*gyzx+gupyz*gzzx)& gupxy_loc*(gupxy_loc*gxyx(i,j,k)+gupyy_loc*gyyx(i,j,k)+gupyz_loc*gyzx(i,j,k)) + &
+gupxy*(gupxy*gxxy+gupyy*gxyy+gupyz*gxzy)& gupxz_loc*(gupxy_loc*gxzx(i,j,k)+gupyy_loc*gyzx(i,j,k)+gupyz_loc*gzzx(i,j,k)) + &
+gupyy*(gupxy*gxyy+gupyy*gyyy+gupyz*gyzy)& gupxy_loc*(gupxy_loc*gxxy(i,j,k)+gupyy_loc*gxyy(i,j,k)+gupyz_loc*gxzy(i,j,k)) + &
+gupyz*(gupxy*gxzy+gupyy*gyzy+gupyz*gzzy)& gupyy_loc*(gupxy_loc*gxyy(i,j,k)+gupyy_loc*gyyy(i,j,k)+gupyz_loc*gyzy(i,j,k)) + &
+gupxy*(gupxz*gxxz+gupyz*gxyz+gupzz*gxzz)& gupyz_loc*(gupxy_loc*gxzy(i,j,k)+gupyy_loc*gyzy(i,j,k)+gupyz_loc*gzzy(i,j,k)) + &
+gupyy*(gupxz*gxyz+gupyz*gyyz+gupzz*gyzz)& gupxy_loc*(gupxz_loc*gxxz(i,j,k)+gupyz_loc*gxyz(i,j,k)+gupzz_loc*gxzz(i,j,k)) + &
+gupyz*(gupxz*gxzz+gupyz*gyzz+gupzz*gzzz)) gupyy_loc*(gupxz_loc*gxyz(i,j,k)+gupyz_loc*gyyz(i,j,k)+gupzz_loc*gyzz(i,j,k)) + &
Gmz_Res = Gamz - (gupxx*(gupxz*gxxx+gupyz*gxyx+gupzz*gxzx)& gupyz_loc*(gupxz_loc*gxzz(i,j,k)+gupyz_loc*gyzz(i,j,k)+gupzz_loc*gzzz(i,j,k)))
+gupxy*(gupxz*gxyx+gupyz*gyyx+gupzz*gyzx)& Gmz_Res(i,j,k) = Gamz(i,j,k) - ( &
+gupxz*(gupxz*gxzx+gupyz*gyzx+gupzz*gzzx)& gupxx_loc*(gupxz_loc*gxxx(i,j,k)+gupyz_loc*gxyx(i,j,k)+gupzz_loc*gxzx(i,j,k)) + &
+gupxy*(gupxz*gxxy+gupyz*gxyy+gupzz*gxzy)& gupxy_loc*(gupxz_loc*gxyx(i,j,k)+gupyz_loc*gyyx(i,j,k)+gupzz_loc*gyzx(i,j,k)) + &
+gupyy*(gupxz*gxyy+gupyz*gyyy+gupzz*gyzy)& gupxz_loc*(gupxz_loc*gxzx(i,j,k)+gupyz_loc*gyzx(i,j,k)+gupzz_loc*gzzx(i,j,k)) + &
+gupyz*(gupxz*gxzy+gupyz*gyzy+gupzz*gzzy)& gupxy_loc*(gupxz_loc*gxxy(i,j,k)+gupyz_loc*gxyy(i,j,k)+gupzz_loc*gxzy(i,j,k)) + &
+gupxz*(gupxz*gxxz+gupyz*gxyz+gupzz*gxzz)& gupyy_loc*(gupxz_loc*gxyy(i,j,k)+gupyz_loc*gyyy(i,j,k)+gupzz_loc*gyzy(i,j,k)) + &
+gupyz*(gupxz*gxyz+gupyz*gyyz+gupzz*gyzz)& gupyz_loc*(gupxz_loc*gxzy(i,j,k)+gupyz_loc*gyzy(i,j,k)+gupzz_loc*gzzy(i,j,k)) + &
+gupzz*(gupxz*gxzz+gupyz*gyzz+gupzz*gzzz)) gupxz_loc*(gupxz_loc*gxxz(i,j,k)+gupyz_loc*gxyz(i,j,k)+gupzz_loc*gxzz(i,j,k)) + &
gupyz_loc*(gupxz_loc*gxyz(i,j,k)+gupyz_loc*gyyz(i,j,k)+gupzz_loc*gyzz(i,j,k)) + &
gupzz_loc*(gupxz_loc*gxzz(i,j,k)+gupyz_loc*gyzz(i,j,k)+gupzz_loc*gzzz(i,j,k)))
endif endif
! second kind of connection Gamxxx(i,j,k)=HALF*( gupxx_loc*gxxx(i,j,k) + gupxy_loc*(TWO*gxyx(i,j,k) - gxxy(i,j,k)) + gupxz_loc*(TWO*gxzx(i,j,k) - gxxz(i,j,k)))
Gamxxx =HALF*( gupxx*gxxx + gupxy*(TWO*gxyx - gxxy ) + gupxz*(TWO*gxzx - gxxz )) Gamyxx(i,j,k)=HALF*( gupxy_loc*gxxx(i,j,k) + gupyy_loc*(TWO*gxyx(i,j,k) - gxxy(i,j,k)) + gupyz_loc*(TWO*gxzx(i,j,k) - gxxz(i,j,k)))
Gamyxx =HALF*( gupxy*gxxx + gupyy*(TWO*gxyx - gxxy ) + gupyz*(TWO*gxzx - gxxz )) Gamzxx(i,j,k)=HALF*( gupxz_loc*gxxx(i,j,k) + gupyz_loc*(TWO*gxyx(i,j,k) - gxxy(i,j,k)) + gupzz_loc*(TWO*gxzx(i,j,k) - gxxz(i,j,k)))
Gamzxx =HALF*( gupxz*gxxx + gupyz*(TWO*gxyx - gxxy ) + gupzz*(TWO*gxzx - gxxz ))
Gamxyy =HALF*( gupxx*(TWO*gxyy - gyyx ) + gupxy*gyyy + gupxz*(TWO*gyzy - gyyz )) Gamxyy(i,j,k)=HALF*( gupxx_loc*(TWO*gxyy(i,j,k) - gyyx(i,j,k)) + gupxy_loc*gyyy(i,j,k) + gupxz_loc*(TWO*gyzy(i,j,k) - gyyz(i,j,k)))
Gamyyy =HALF*( gupxy*(TWO*gxyy - gyyx ) + gupyy*gyyy + gupyz*(TWO*gyzy - gyyz )) Gamyyy(i,j,k)=HALF*( gupxy_loc*(TWO*gxyy(i,j,k) - gyyx(i,j,k)) + gupyy_loc*gyyy(i,j,k) + gupyz_loc*(TWO*gyzy(i,j,k) - gyyz(i,j,k)))
Gamzyy =HALF*( gupxz*(TWO*gxyy - gyyx ) + gupyz*gyyy + gupzz*(TWO*gyzy - gyyz )) Gamzyy(i,j,k)=HALF*( gupxz_loc*(TWO*gxyy(i,j,k) - gyyx(i,j,k)) + gupyz_loc*gyyy(i,j,k) + gupzz_loc*(TWO*gyzy(i,j,k) - gyyz(i,j,k)))
Gamxzz =HALF*( gupxx*(TWO*gxzz - gzzx ) + gupxy*(TWO*gyzz - gzzy ) + gupxz*gzzz) Gamxzz(i,j,k)=HALF*( gupxx_loc*(TWO*gxzz(i,j,k) - gzzx(i,j,k)) + gupxy_loc*(TWO*gyzz(i,j,k) - gzzy(i,j,k)) + gupxz_loc*gzzz(i,j,k))
Gamyzz =HALF*( gupxy*(TWO*gxzz - gzzx ) + gupyy*(TWO*gyzz - gzzy ) + gupyz*gzzz) Gamyzz(i,j,k)=HALF*( gupxy_loc*(TWO*gxzz(i,j,k) - gzzx(i,j,k)) + gupyy_loc*(TWO*gyzz(i,j,k) - gzzy(i,j,k)) + gupyz_loc*gzzz(i,j,k))
Gamzzz =HALF*( gupxz*(TWO*gxzz - gzzx ) + gupyz*(TWO*gyzz - gzzy ) + gupzz*gzzz) Gamzzz(i,j,k)=HALF*( gupxz_loc*(TWO*gxzz(i,j,k) - gzzx(i,j,k)) + gupyz_loc*(TWO*gyzz(i,j,k) - gzzy(i,j,k)) + gupzz_loc*gzzz(i,j,k))
Gamxxy =HALF*( gupxx*gxxy + gupxy*gyyx + gupxz*( gxzy + gyzx - gxyz ) ) Gamxxy(i,j,k)=HALF*( gupxx_loc*gxxy(i,j,k) + gupxy_loc*gyyx(i,j,k) + gupxz_loc*(gxzy(i,j,k) + gyzx(i,j,k) - gxyz(i,j,k)) )
Gamyxy =HALF*( gupxy*gxxy + gupyy*gyyx + gupyz*( gxzy + gyzx - gxyz ) ) Gamyxy(i,j,k)=HALF*( gupxy_loc*gxxy(i,j,k) + gupyy_loc*gyyx(i,j,k) + gupyz_loc*(gxzy(i,j,k) + gyzx(i,j,k) - gxyz(i,j,k)) )
Gamzxy =HALF*( gupxz*gxxy + gupyz*gyyx + gupzz*( gxzy + gyzx - gxyz ) ) Gamzxy(i,j,k)=HALF*( gupxz_loc*gxxy(i,j,k) + gupyz_loc*gyyx(i,j,k) + gupzz_loc*(gxzy(i,j,k) + gyzx(i,j,k) - gxyz(i,j,k)) )
Gamxxz =HALF*( gupxx*gxxz + gupxy*( gxyz + gyzx - gxzy ) + gupxz*gzzx ) Gamxxz(i,j,k)=HALF*( gupxx_loc*gxxz(i,j,k) + gupxy_loc*(gxyz(i,j,k) + gyzx(i,j,k) - gxzy(i,j,k)) + gupxz_loc*gzzx(i,j,k) )
Gamyxz =HALF*( gupxy*gxxz + gupyy*( gxyz + gyzx - gxzy ) + gupyz*gzzx ) Gamyxz(i,j,k)=HALF*( gupxy_loc*gxxz(i,j,k) + gupyy_loc*(gxyz(i,j,k) + gyzx(i,j,k) - gxzy(i,j,k)) + gupyz_loc*gzzx(i,j,k) )
Gamzxz =HALF*( gupxz*gxxz + gupyz*( gxyz + gyzx - gxzy ) + gupzz*gzzx ) Gamzxz(i,j,k)=HALF*( gupxz_loc*gxxz(i,j,k) + gupyz_loc*(gxyz(i,j,k) + gyzx(i,j,k) - gxzy(i,j,k)) + gupzz_loc*gzzx(i,j,k) )
Gamxyz =HALF*( gupxx*( gxyz + gxzy - gyzx ) + gupxy*gyyz + gupxz*gzzy ) Gamxyz(i,j,k)=HALF*( gupxx_loc*(gxyz(i,j,k) + gxzy(i,j,k) - gyzx(i,j,k)) + gupxy_loc*gyyz(i,j,k) + gupxz_loc*gzzy(i,j,k) )
Gamyyz =HALF*( gupxy*( gxyz + gxzy - gyzx ) + gupyy*gyyz + gupyz*gzzy ) Gamyyz(i,j,k)=HALF*( gupxy_loc*(gxyz(i,j,k) + gxzy(i,j,k) - gyzx(i,j,k)) + gupyy_loc*gyyz(i,j,k) + gupyz_loc*gzzy(i,j,k) )
Gamzyz =HALF*( gupxz*( gxyz + gxzy - gyzx ) + gupyz*gyyz + gupzz*gzzy ) Gamzyz(i,j,k)=HALF*( gupxz_loc*(gxyz(i,j,k) + gxzy(i,j,k) - gyzx(i,j,k)) + gupyz_loc*gyyz(i,j,k) + gupzz_loc*gzzy(i,j,k) )
enddo
enddo
enddo
! Raise indices of \tilde A_{ij} and store in R_ij ! Raise indices of \tilde A_{ij} and store in R_ij
Rxx = gupxx * gupxx * Axx + gupxy * gupxy * Ayy + gupxz * gupxz * Azz + &
TWO*(gupxx * gupxy * Axy + gupxx * gupxz * Axz + gupxy * gupxz * Ayz)
Ryy = gupxy * gupxy * Axx + gupyy * gupyy * Ayy + gupyz * gupyz * Azz + &
TWO*(gupxy * gupyy * Axy + gupxy * gupyz * Axz + gupyy * gupyz * Ayz)
Rzz = gupxz * gupxz * Axx + gupyz * gupyz * Ayy + gupzz * gupzz * Azz + &
TWO*(gupxz * gupyz * Axy + gupxz * gupzz * Axz + gupyz * gupzz * Ayz)
Rxy = gupxx * gupxy * Axx + gupxy * gupyy * Ayy + gupxz * gupyz * Azz + &
(gupxx * gupyy + gupxy * gupxy)* Axy + &
(gupxx * gupyz + gupxz * gupxy)* Axz + &
(gupxy * gupyz + gupxz * gupyy)* Ayz
Rxz = gupxx * gupxz * Axx + gupxy * gupyz * Ayy + gupxz * gupzz * Azz + &
(gupxx * gupyz + gupxy * gupxz)* Axy + &
(gupxx * gupzz + gupxz * gupxz)* Axz + &
(gupxy * gupzz + gupxz * gupyz)* Ayz
Ryz = gupxy * gupxz * Axx + gupyy * gupyz * Ayy + gupyz * gupzz * Azz + &
(gupxy * gupyz + gupyy * gupxz)* Axy + &
(gupxy * gupzz + gupyz * gupxz)* Axz + &
(gupyy * gupzz + gupyz * gupyz)* Ayz
! Right hand side for Gam^i without shift terms... ! Right hand side for Gam^i without shift terms...
call fderivs(ex,Lap,Lapx,Lapy,Lapz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fderivs(ex,Lap,Lapx,Lapy,Lapz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev)
call fderivs(ex,trK,Kx,Ky,Kz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev) call fderivs(ex,trK,Kx,Ky,Kz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
gupxx_loc = gupxx(i,j,k)
gupxy_loc = gupxy(i,j,k)
gupxz_loc = gupxz(i,j,k)
gupyy_loc = gupyy(i,j,k)
gupyz_loc = gupyz(i,j,k)
gupzz_loc = gupzz(i,j,k)
Gamx_rhs = - TWO * ( Lapx * Rxx + Lapy * Rxy + Lapz * Rxz ) + & Rxx_loc = gupxx_loc * gupxx_loc * Axx(i,j,k) + gupxy_loc * gupxy_loc * Ayy(i,j,k) + gupxz_loc * gupxz_loc * Azz(i,j,k) + &
TWO * alpn1 * ( & TWO * (gupxx_loc * gupxy_loc * Axy(i,j,k) + gupxx_loc * gupxz_loc * Axz(i,j,k) + gupxy_loc * gupxz_loc * Ayz(i,j,k))
-F3o2/chin1 * ( chix * Rxx + chiy * Rxy + chiz * Rxz ) - & Ryy_loc = gupxy_loc * gupxy_loc * Axx(i,j,k) + gupyy_loc * gupyy_loc * Ayy(i,j,k) + gupyz_loc * gupyz_loc * Azz(i,j,k) + &
gupxx * ( F2o3 * Kx + EIGHT * PI * Sx ) - & TWO * (gupxy_loc * gupyy_loc * Axy(i,j,k) + gupxy_loc * gupyz_loc * Axz(i,j,k) + gupyy_loc * gupyz_loc * Ayz(i,j,k))
gupxy * ( F2o3 * Ky + EIGHT * PI * Sy ) - & Rzz_loc = gupxz_loc * gupxz_loc * Axx(i,j,k) + gupyz_loc * gupyz_loc * Ayy(i,j,k) + gupzz_loc * gupzz_loc * Azz(i,j,k) + &
gupxz * ( F2o3 * Kz + EIGHT * PI * Sz ) + & TWO * (gupxz_loc * gupyz_loc * Axy(i,j,k) + gupxz_loc * gupzz_loc * Axz(i,j,k) + gupyz_loc * gupzz_loc * Ayz(i,j,k))
Gamxxx * Rxx + Gamxyy * Ryy + Gamxzz * Rzz + & Rxy_loc = gupxx_loc * gupxy_loc * Axx(i,j,k) + gupxy_loc * gupyy_loc * Ayy(i,j,k) + gupxz_loc * gupyz_loc * Azz(i,j,k) + &
TWO * ( Gamxxy * Rxy + Gamxxz * Rxz + Gamxyz * Ryz ) ) (gupxx_loc * gupyy_loc + gupxy_loc * gupxy_loc) * Axy(i,j,k) + &
(gupxx_loc * gupyz_loc + gupxz_loc * gupxy_loc) * Axz(i,j,k) + &
(gupxy_loc * gupyz_loc + gupxz_loc * gupyy_loc) * Ayz(i,j,k)
Rxz_loc = gupxx_loc * gupxz_loc * Axx(i,j,k) + gupxy_loc * gupyz_loc * Ayy(i,j,k) + gupxz_loc * gupzz_loc * Azz(i,j,k) + &
(gupxx_loc * gupyz_loc + gupxy_loc * gupxz_loc) * Axy(i,j,k) + &
(gupxx_loc * gupzz_loc + gupxz_loc * gupxz_loc) * Axz(i,j,k) + &
(gupxy_loc * gupzz_loc + gupxz_loc * gupyz_loc) * Ayz(i,j,k)
Ryz_loc = gupxy_loc * gupxz_loc * Axx(i,j,k) + gupyy_loc * gupyz_loc * Ayy(i,j,k) + gupyz_loc * gupzz_loc * Azz(i,j,k) + &
(gupxy_loc * gupyz_loc + gupyy_loc * gupxz_loc) * Axy(i,j,k) + &
(gupxy_loc * gupzz_loc + gupyz_loc * gupxz_loc) * Axz(i,j,k) + &
(gupyy_loc * gupzz_loc + gupyz_loc * gupyz_loc) * Ayz(i,j,k)
Rxx(i,j,k) = Rxx_loc
Ryy(i,j,k) = Ryy_loc
Rzz(i,j,k) = Rzz_loc
Rxy(i,j,k) = Rxy_loc
Rxz(i,j,k) = Rxz_loc
Ryz(i,j,k) = Ryz_loc
Gamy_rhs = - TWO * ( Lapx * Rxy + Lapy * Ryy + Lapz * Ryz ) + & Gamx_rhs(i,j,k) = - TWO * (Lapx(i,j,k) * Rxx_loc + Lapy(i,j,k) * Rxy_loc + Lapz(i,j,k) * Rxz_loc) + &
TWO * alpn1 * ( & TWO * alpn1(i,j,k) * ( &
-F3o2/chin1 * ( chix * Rxy + chiy * Ryy + chiz * Ryz ) - & -F3o2/chin1(i,j,k) * (chix(i,j,k) * Rxx_loc + chiy(i,j,k) * Rxy_loc + chiz(i,j,k) * Rxz_loc) - &
gupxy * ( F2o3 * Kx + EIGHT * PI * Sx ) - & gupxx_loc * (F2o3 * Kx(i,j,k) + EIGHT * PI * Sx(i,j,k)) - &
gupyy * ( F2o3 * Ky + EIGHT * PI * Sy ) - & gupxy_loc * (F2o3 * Ky(i,j,k) + EIGHT * PI * Sy(i,j,k)) - &
gupyz * ( F2o3 * Kz + EIGHT * PI * Sz ) + & gupxz_loc * (F2o3 * Kz(i,j,k) + EIGHT * PI * Sz(i,j,k)) + &
Gamyxx * Rxx + Gamyyy * Ryy + Gamyzz * Rzz + & Gamxxx(i,j,k) * Rxx_loc + Gamxyy(i,j,k) * Ryy_loc + Gamxzz(i,j,k) * Rzz_loc + &
TWO * ( Gamyxy * Rxy + Gamyxz * Rxz + Gamyyz * Ryz ) ) TWO * (Gamxxy(i,j,k) * Rxy_loc + Gamxxz(i,j,k) * Rxz_loc + Gamxyz(i,j,k) * Ryz_loc))
Gamz_rhs = - TWO * ( Lapx * Rxz + Lapy * Ryz + Lapz * Rzz ) + & Gamy_rhs(i,j,k) = - TWO * (Lapx(i,j,k) * Rxy_loc + Lapy(i,j,k) * Ryy_loc + Lapz(i,j,k) * Ryz_loc) + &
TWO * alpn1 * ( & TWO * alpn1(i,j,k) * ( &
-F3o2/chin1 * ( chix * Rxz + chiy * Ryz + chiz * Rzz ) - & -F3o2/chin1(i,j,k) * (chix(i,j,k) * Rxy_loc + chiy(i,j,k) * Ryy_loc + chiz(i,j,k) * Ryz_loc) - &
gupxz * ( F2o3 * Kx + EIGHT * PI * Sx ) - & gupxy_loc * (F2o3 * Kx(i,j,k) + EIGHT * PI * Sx(i,j,k)) - &
gupyz * ( F2o3 * Ky + EIGHT * PI * Sy ) - & gupyy_loc * (F2o3 * Ky(i,j,k) + EIGHT * PI * Sy(i,j,k)) - &
gupzz * ( F2o3 * Kz + EIGHT * PI * Sz ) + & gupyz_loc * (F2o3 * Kz(i,j,k) + EIGHT * PI * Sz(i,j,k)) + &
Gamzxx * Rxx + Gamzyy * Ryy + Gamzzz * Rzz + & Gamyxx(i,j,k) * Rxx_loc + Gamyyy(i,j,k) * Ryy_loc + Gamyzz(i,j,k) * Rzz_loc + &
TWO * ( Gamzxy * Rxy + Gamzxz * Rxz + Gamzyz * Ryz ) ) TWO * (Gamyxy(i,j,k) * Rxy_loc + Gamyxz(i,j,k) * Rxz_loc + Gamyyz(i,j,k) * Ryz_loc))
Gamz_rhs(i,j,k) = - TWO * (Lapx(i,j,k) * Rxz_loc + Lapy(i,j,k) * Ryz_loc + Lapz(i,j,k) * Rzz_loc) + &
TWO * alpn1(i,j,k) * ( &
-F3o2/chin1(i,j,k) * (chix(i,j,k) * Rxz_loc + chiy(i,j,k) * Ryz_loc + chiz(i,j,k) * Rzz_loc) - &
gupxz_loc * (F2o3 * Kx(i,j,k) + EIGHT * PI * Sx(i,j,k)) - &
gupyz_loc * (F2o3 * Ky(i,j,k) + EIGHT * PI * Sy(i,j,k)) - &
gupzz_loc * (F2o3 * Kz(i,j,k) + EIGHT * PI * Sz(i,j,k)) + &
Gamzxx(i,j,k) * Rxx_loc + Gamzyy(i,j,k) * Ryy_loc + Gamzzz(i,j,k) * Rzz_loc + &
TWO * (Gamzxy(i,j,k) * Rxy_loc + Gamzxz(i,j,k) * Rxz_loc + Gamzyz(i,j,k) * Ryz_loc))
enddo
enddo
enddo
call fdderivs(ex,betax,gxxx,gxyx,gxzx,gyyx,gyzx,gzzx,& call fdderivs(ex,betax,gxxx,gxyx,gxzx,gyyx,gyzx,gzzx,&
X,Y,Z,ANTI,SYM, SYM ,Symmetry,Lev) X,Y,Z,ANTI,SYM, SYM ,Symmetry,Lev)
@@ -321,38 +359,54 @@
call fdderivs(ex,betaz,gxxz,gxyz,gxzz,gyyz,gyzz,gzzz,& call fdderivs(ex,betaz,gxxz,gxyz,gxzz,gyyz,gyzz,gzzz,&
X,Y,Z,SYM ,SYM, ANTI,Symmetry,Lev) X,Y,Z,SYM ,SYM, ANTI,Symmetry,Lev)
fxx = gxxx + gxyy + gxzz
fxy = gxyx + gyyy + gyzz
fxz = gxzx + gyzy + gzzz
Gamxa = gupxx * Gamxxx + gupyy * Gamxyy + gupzz * Gamxzz + &
TWO*( gupxy * Gamxxy + gupxz * Gamxxz + gupyz * Gamxyz )
Gamya = gupxx * Gamyxx + gupyy * Gamyyy + gupzz * Gamyzz + &
TWO*( gupxy * Gamyxy + gupxz * Gamyxz + gupyz * Gamyyz )
Gamza = gupxx * Gamzxx + gupyy * Gamzyy + gupzz * Gamzzz + &
TWO*( gupxy * Gamzxy + gupxz * Gamzxz + gupyz * Gamzyz )
call fderivs(ex,Gamx,Gamxx,Gamxy,Gamxz,X,Y,Z,ANTI,SYM ,SYM ,Symmetry,Lev) call fderivs(ex,Gamx,Gamxx,Gamxy,Gamxz,X,Y,Z,ANTI,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev) call fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev)
call fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev) call fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
divb_loc = div_beta(i,j,k)
fxx_loc = gxxx(i,j,k) + gxyy(i,j,k) + gxzz(i,j,k)
fxy_loc = gxyx(i,j,k) + gyyy(i,j,k) + gyzz(i,j,k)
fxz_loc = gxzx(i,j,k) + gyzy(i,j,k) + gzzz(i,j,k)
Gamx_rhs = Gamx_rhs + F2o3 * Gamxa * div_beta - & gupxx_loc = gupxx(i,j,k)
Gamxa * betaxx - Gamya * betaxy - Gamza * betaxz + & gupxy_loc = gupxy(i,j,k)
F1o3 * (gupxx * fxx + gupxy * fxy + gupxz * fxz ) + & gupxz_loc = gupxz(i,j,k)
gupxx * gxxx + gupyy * gyyx + gupzz * gzzx + & gupyy_loc = gupyy(i,j,k)
TWO * (gupxy * gxyx + gupxz * gxzx + gupyz * gyzx ) gupyz_loc = gupyz(i,j,k)
gupzz_loc = gupzz(i,j,k)
Gamy_rhs = Gamy_rhs + F2o3 * Gamya * div_beta - & Gamxa_loc = gupxx_loc * Gamxxx(i,j,k) + gupyy_loc * Gamxyy(i,j,k) + gupzz_loc * Gamxzz(i,j,k) + &
Gamxa * betayx - Gamya * betayy - Gamza * betayz + & TWO * (gupxy_loc * Gamxxy(i,j,k) + gupxz_loc * Gamxxz(i,j,k) + gupyz_loc * Gamxyz(i,j,k))
F1o3 * (gupxy * fxx + gupyy * fxy + gupyz * fxz ) + & Gamya_loc = gupxx_loc * Gamyxx(i,j,k) + gupyy_loc * Gamyyy(i,j,k) + gupzz_loc * Gamyzz(i,j,k) + &
gupxx * gxxy + gupyy * gyyy + gupzz * gzzy + & TWO * (gupxy_loc * Gamyxy(i,j,k) + gupxz_loc * Gamyxz(i,j,k) + gupyz_loc * Gamyyz(i,j,k))
TWO * (gupxy * gxyy + gupxz * gxzy + gupyz * gyzy ) Gamza_loc = gupxx_loc * Gamzxx(i,j,k) + gupyy_loc * Gamzyy(i,j,k) + gupzz_loc * Gamzzz(i,j,k) + &
TWO * (gupxy_loc * Gamzxy(i,j,k) + gupxz_loc * Gamzxz(i,j,k) + gupyz_loc * Gamzyz(i,j,k))
Gamxa(i,j,k) = Gamxa_loc
Gamya(i,j,k) = Gamya_loc
Gamza(i,j,k) = Gamza_loc
Gamz_rhs = Gamz_rhs + F2o3 * Gamza * div_beta - & Gamx_rhs(i,j,k) = Gamx_rhs(i,j,k) + F2o3 * Gamxa_loc * divb_loc - &
Gamxa * betazx - Gamya * betazy - Gamza * betazz + & Gamxa_loc * betaxx(i,j,k) - Gamya_loc * betaxy(i,j,k) - Gamza_loc * betaxz(i,j,k) + &
F1o3 * (gupxz * fxx + gupyz * fxy + gupzz * fxz ) + & F1o3 * (gupxx_loc * fxx_loc + gupxy_loc * fxy_loc + gupxz_loc * fxz_loc) + &
gupxx * gxxz + gupyy * gyyz + gupzz * gzzz + & gupxx_loc * gxxx(i,j,k) + gupyy_loc * gyyx(i,j,k) + gupzz_loc * gzzx(i,j,k) + &
TWO * (gupxy * gxyz + gupxz * gxzz + gupyz * gyzz ) !rhs for Gam^i TWO * (gupxy_loc * gxyx(i,j,k) + gupxz_loc * gxzx(i,j,k) + gupyz_loc * gyzx(i,j,k))
Gamy_rhs(i,j,k) = Gamy_rhs(i,j,k) + F2o3 * Gamya_loc * divb_loc - &
Gamxa_loc * betayx(i,j,k) - Gamya_loc * betayy(i,j,k) - Gamza_loc * betayz(i,j,k) + &
F1o3 * (gupxy_loc * fxx_loc + gupyy_loc * fxy_loc + gupyz_loc * fxz_loc) + &
gupxx_loc * gxxy(i,j,k) + gupyy_loc * gyyy(i,j,k) + gupzz_loc * gzzy(i,j,k) + &
TWO * (gupxy_loc * gxyy(i,j,k) + gupxz_loc * gxzy(i,j,k) + gupyz_loc * gyzy(i,j,k))
Gamz_rhs(i,j,k) = Gamz_rhs(i,j,k) + F2o3 * Gamza_loc * divb_loc - &
Gamxa_loc * betazx(i,j,k) - Gamya_loc * betazy(i,j,k) - Gamza_loc * betazz(i,j,k) + &
F1o3 * (gupxz_loc * fxx_loc + gupyz_loc * fxy_loc + gupzz_loc * fxz_loc) + &
gupxx_loc * gxxz(i,j,k) + gupyy_loc * gyyz(i,j,k) + gupzz_loc * gzzz(i,j,k) + &
TWO * (gupxy_loc * gxyz(i,j,k) + gupxz_loc * gxzz(i,j,k) + gupyz_loc * gyzz(i,j,k))
enddo
enddo
enddo
!first kind of connection stored in gij,k !first kind of connection stored in gij,k
gxxx = gxx * Gamxxx + gxy * Gamyxx + gxz * Gamzxx gxxx = gxx * Gamxxx + gxy * Gamyxx + gxz * Gamzxx
@@ -604,189 +658,187 @@
!covariant second derivative of chi respect to tilted metric !covariant second derivative of chi respect to tilted metric
call fdderivs(ex,chi,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fdderivs(ex,chi,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev)
fxx = fxx - Gamxxx * chix - Gamyxx * chiy - Gamzxx * chiz do k=1,ex(3)
fxy = fxy - Gamxxy * chix - Gamyxy * chiy - Gamzxy * chiz do j=1,ex(2)
fxz = fxz - Gamxxz * chix - Gamyxz * chiy - Gamzxz * chiz do i=1,ex(1)
fyy = fyy - Gamxyy * chix - Gamyyy * chiy - Gamzyy * chiz fxx(i,j,k) = fxx(i,j,k) - Gamxxx(i,j,k) * chix(i,j,k) - Gamyxx(i,j,k) * chiy(i,j,k) - Gamzxx(i,j,k) * chiz(i,j,k)
fyz = fyz - Gamxyz * chix - Gamyyz * chiy - Gamzyz * chiz fxy(i,j,k) = fxy(i,j,k) - Gamxxy(i,j,k) * chix(i,j,k) - Gamyxy(i,j,k) * chiy(i,j,k) - Gamzxy(i,j,k) * chiz(i,j,k)
fzz = fzz - Gamxzz * chix - Gamyzz * chiy - Gamzzz * chiz fxz(i,j,k) = fxz(i,j,k) - Gamxxz(i,j,k) * chix(i,j,k) - Gamyxz(i,j,k) * chiy(i,j,k) - Gamzxz(i,j,k) * chiz(i,j,k)
! Store D^l D_l chi - 3/(2*chi) D^l chi D_l chi in f fyy(i,j,k) = fyy(i,j,k) - Gamxyy(i,j,k) * chix(i,j,k) - Gamyyy(i,j,k) * chiy(i,j,k) - Gamzyy(i,j,k) * chiz(i,j,k)
fyz(i,j,k) = fyz(i,j,k) - Gamxyz(i,j,k) * chix(i,j,k) - Gamyyz(i,j,k) * chiy(i,j,k) - Gamzyz(i,j,k) * chiz(i,j,k)
fzz(i,j,k) = fzz(i,j,k) - Gamxzz(i,j,k) * chix(i,j,k) - Gamyzz(i,j,k) * chiy(i,j,k) - Gamzzz(i,j,k) * chiz(i,j,k)
f = gupxx * ( fxx - F3o2/chin1 * chix * chix ) + & chin_loc = chin1(i,j,k)
gupyy * ( fyy - F3o2/chin1 * chiy * chiy ) + & f_loc = gupxx(i,j,k) * (fxx(i,j,k) - F3o2/chin_loc * chix(i,j,k) * chix(i,j,k)) + &
gupzz * ( fzz - F3o2/chin1 * chiz * chiz ) + & gupyy(i,j,k) * (fyy(i,j,k) - F3o2/chin_loc * chiy(i,j,k) * chiy(i,j,k)) + &
TWO * gupxy * ( fxy - F3o2/chin1 * chix * chiy ) + & gupzz(i,j,k) * (fzz(i,j,k) - F3o2/chin_loc * chiz(i,j,k) * chiz(i,j,k)) + &
TWO * gupxz * ( fxz - F3o2/chin1 * chix * chiz ) + & TWO * gupxy(i,j,k) * (fxy(i,j,k) - F3o2/chin_loc * chix(i,j,k) * chiy(i,j,k)) + &
TWO * gupyz * ( fyz - F3o2/chin1 * chiy * chiz ) TWO * gupxz(i,j,k) * (fxz(i,j,k) - F3o2/chin_loc * chix(i,j,k) * chiz(i,j,k)) + &
! Add chi part to Ricci tensor: TWO * gupyz(i,j,k) * (fyz(i,j,k) - F3o2/chin_loc * chiy(i,j,k) * chiz(i,j,k))
f(i,j,k) = f_loc
Rxx = Rxx + (fxx - chix*chix/chin1/TWO + gxx * f)/chin1/TWO Rxx(i,j,k) = Rxx(i,j,k) + (fxx(i,j,k) - chix(i,j,k)*chix(i,j,k)/chin_loc/TWO + gxx(i,j,k) * f_loc)/chin_loc/TWO
Ryy = Ryy + (fyy - chiy*chiy/chin1/TWO + gyy * f)/chin1/TWO Ryy(i,j,k) = Ryy(i,j,k) + (fyy(i,j,k) - chiy(i,j,k)*chiy(i,j,k)/chin_loc/TWO + gyy(i,j,k) * f_loc)/chin_loc/TWO
Rzz = Rzz + (fzz - chiz*chiz/chin1/TWO + gzz * f)/chin1/TWO Rzz(i,j,k) = Rzz(i,j,k) + (fzz(i,j,k) - chiz(i,j,k)*chiz(i,j,k)/chin_loc/TWO + gzz(i,j,k) * f_loc)/chin_loc/TWO
Rxy = Rxy + (fxy - chix*chiy/chin1/TWO + gxy * f)/chin1/TWO Rxy(i,j,k) = Rxy(i,j,k) + (fxy(i,j,k) - chix(i,j,k)*chiy(i,j,k)/chin_loc/TWO + gxy(i,j,k) * f_loc)/chin_loc/TWO
Rxz = Rxz + (fxz - chix*chiz/chin1/TWO + gxz * f)/chin1/TWO Rxz(i,j,k) = Rxz(i,j,k) + (fxz(i,j,k) - chix(i,j,k)*chiz(i,j,k)/chin_loc/TWO + gxz(i,j,k) * f_loc)/chin_loc/TWO
Ryz = Ryz + (fyz - chiy*chiz/chin1/TWO + gyz * f)/chin1/TWO Ryz(i,j,k) = Ryz(i,j,k) + (fyz(i,j,k) - chiy(i,j,k)*chiz(i,j,k)/chin_loc/TWO + gyz(i,j,k) * f_loc)/chin_loc/TWO
enddo
enddo
enddo
! covariant second derivatives of the lapse respect to physical metric ! covariant second derivatives of the lapse respect to physical metric
call fdderivs(ex,Lap,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, & call fdderivs(ex,Lap,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, &
SYM,SYM,SYM,symmetry,Lev) SYM,SYM,SYM,symmetry,Lev)
gxxx = (gupxx * chix + gupxy * chiy + gupxz * chiz)/chin1 do k=1,ex(3)
gxxy = (gupxy * chix + gupyy * chiy + gupyz * chiz)/chin1 do j=1,ex(2)
gxxz = (gupxz * chix + gupyz * chiy + gupzz * chiz)/chin1 do i=1,ex(1)
! now get physical second kind of connection chin_loc = chin1(i,j,k)
Gamxxx = Gamxxx - ( (chix + chix)/chin1 - gxx * gxxx )*HALF gxxx(i,j,k) = (gupxx(i,j,k) * chix(i,j,k) + gupxy(i,j,k) * chiy(i,j,k) + gupxz(i,j,k) * chiz(i,j,k)) / chin_loc
Gamyxx = Gamyxx - ( - gxx * gxxy )*HALF gxxy(i,j,k) = (gupxy(i,j,k) * chix(i,j,k) + gupyy(i,j,k) * chiy(i,j,k) + gupyz(i,j,k) * chiz(i,j,k)) / chin_loc
Gamzxx = Gamzxx - ( - gxx * gxxz )*HALF gxxz(i,j,k) = (gupxz(i,j,k) * chix(i,j,k) + gupyz(i,j,k) * chiy(i,j,k) + gupzz(i,j,k) * chiz(i,j,k)) / chin_loc
Gamxyy = Gamxyy - ( - gyy * gxxx )*HALF
Gamyyy = Gamyyy - ( (chiy + chiy)/chin1 - gyy * gxxy )*HALF
Gamzyy = Gamzyy - ( - gyy * gxxz )*HALF
Gamxzz = Gamxzz - ( - gzz * gxxx )*HALF
Gamyzz = Gamyzz - ( - gzz * gxxy )*HALF
Gamzzz = Gamzzz - ( (chiz + chiz)/chin1 - gzz * gxxz )*HALF
Gamxxy = Gamxxy - ( chiy /chin1 - gxy * gxxx )*HALF
Gamyxy = Gamyxy - ( chix /chin1 - gxy * gxxy )*HALF
Gamzxy = Gamzxy - ( - gxy * gxxz )*HALF
Gamxxz = Gamxxz - ( chiz /chin1 - gxz * gxxx )*HALF
Gamyxz = Gamyxz - ( - gxz * gxxy )*HALF
Gamzxz = Gamzxz - ( chix /chin1 - gxz * gxxz )*HALF
Gamxyz = Gamxyz - ( - gyz * gxxx )*HALF
Gamyyz = Gamyyz - ( chiz /chin1 - gyz * gxxy )*HALF
Gamzyz = Gamzyz - ( chiy /chin1 - gyz * gxxz )*HALF
fxx = fxx - Gamxxx*Lapx - Gamyxx*Lapy - Gamzxx*Lapz Gamxxx(i,j,k) = Gamxxx(i,j,k) - ( (chix(i,j,k) + chix(i,j,k))/chin_loc - gxx(i,j,k) * gxxx(i,j,k) )*HALF
fyy = fyy - Gamxyy*Lapx - Gamyyy*Lapy - Gamzyy*Lapz Gamyxx(i,j,k) = Gamyxx(i,j,k) - ( - gxx(i,j,k) * gxxy(i,j,k) )*HALF
fzz = fzz - Gamxzz*Lapx - Gamyzz*Lapy - Gamzzz*Lapz Gamzxx(i,j,k) = Gamzxx(i,j,k) - ( - gxx(i,j,k) * gxxz(i,j,k) )*HALF
fxy = fxy - Gamxxy*Lapx - Gamyxy*Lapy - Gamzxy*Lapz Gamxyy(i,j,k) = Gamxyy(i,j,k) - ( - gyy(i,j,k) * gxxx(i,j,k) )*HALF
fxz = fxz - Gamxxz*Lapx - Gamyxz*Lapy - Gamzxz*Lapz Gamyyy(i,j,k) = Gamyyy(i,j,k) - ( (chiy(i,j,k) + chiy(i,j,k))/chin_loc - gyy(i,j,k) * gxxy(i,j,k) )*HALF
fyz = fyz - Gamxyz*Lapx - Gamyyz*Lapy - Gamzyz*Lapz Gamzyy(i,j,k) = Gamzyy(i,j,k) - ( - gyy(i,j,k) * gxxz(i,j,k) )*HALF
Gamxzz(i,j,k) = Gamxzz(i,j,k) - ( - gzz(i,j,k) * gxxx(i,j,k) )*HALF
Gamyzz(i,j,k) = Gamyzz(i,j,k) - ( - gzz(i,j,k) * gxxy(i,j,k) )*HALF
Gamzzz(i,j,k) = Gamzzz(i,j,k) - ( (chiz(i,j,k) + chiz(i,j,k))/chin_loc - gzz(i,j,k) * gxxz(i,j,k) )*HALF
Gamxxy(i,j,k) = Gamxxy(i,j,k) - ( chiy(i,j,k) /chin_loc - gxy(i,j,k) * gxxx(i,j,k) )*HALF
Gamyxy(i,j,k) = Gamyxy(i,j,k) - ( chix(i,j,k) /chin_loc - gxy(i,j,k) * gxxy(i,j,k) )*HALF
Gamzxy(i,j,k) = Gamzxy(i,j,k) - ( - gxy(i,j,k) * gxxz(i,j,k) )*HALF
Gamxxz(i,j,k) = Gamxxz(i,j,k) - ( chiz(i,j,k) /chin_loc - gxz(i,j,k) * gxxx(i,j,k) )*HALF
Gamyxz(i,j,k) = Gamyxz(i,j,k) - ( - gxz(i,j,k) * gxxy(i,j,k) )*HALF
Gamzxz(i,j,k) = Gamzxz(i,j,k) - ( chix(i,j,k) /chin_loc - gxz(i,j,k) * gxxz(i,j,k) )*HALF
Gamxyz(i,j,k) = Gamxyz(i,j,k) - ( - gyz(i,j,k) * gxxx(i,j,k) )*HALF
Gamyyz(i,j,k) = Gamyyz(i,j,k) - ( chiz(i,j,k) /chin_loc - gyz(i,j,k) * gxxy(i,j,k) )*HALF
Gamzyz(i,j,k) = Gamzyz(i,j,k) - ( chiy(i,j,k) /chin_loc - gyz(i,j,k) * gxxz(i,j,k) )*HALF
! store D^i D_i Lap in trK_rhs upto chi fxx(i,j,k) = fxx(i,j,k) - Gamxxx(i,j,k)*Lapx(i,j,k) - Gamyxx(i,j,k)*Lapy(i,j,k) - Gamzxx(i,j,k)*Lapz(i,j,k)
trK_rhs = gupxx * fxx + gupyy * fyy + gupzz * fzz + & fyy(i,j,k) = fyy(i,j,k) - Gamxyy(i,j,k)*Lapx(i,j,k) - Gamyyy(i,j,k)*Lapy(i,j,k) - Gamzyy(i,j,k)*Lapz(i,j,k)
TWO* ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) fzz(i,j,k) = fzz(i,j,k) - Gamxzz(i,j,k)*Lapx(i,j,k) - Gamyzz(i,j,k)*Lapy(i,j,k) - Gamzzz(i,j,k)*Lapz(i,j,k)
#if 1 fxy(i,j,k) = fxy(i,j,k) - Gamxxy(i,j,k)*Lapx(i,j,k) - Gamyxy(i,j,k)*Lapy(i,j,k) - Gamzxy(i,j,k)*Lapz(i,j,k)
!! follow bam code fxz(i,j,k) = fxz(i,j,k) - Gamxxz(i,j,k)*Lapx(i,j,k) - Gamyxz(i,j,k)*Lapy(i,j,k) - Gamzxz(i,j,k)*Lapz(i,j,k)
S = chin1 * ( gupxx * Sxx + gupyy * Syy + gupzz * Szz + & fyz(i,j,k) = fyz(i,j,k) - Gamxyz(i,j,k)*Lapx(i,j,k) - Gamyyz(i,j,k)*Lapy(i,j,k) - Gamzyz(i,j,k)*Lapz(i,j,k)
TWO * ( gupxy * Sxy + gupxz * Sxz + gupyz * Syz ) )
f = F2o3 * trK * trK -(&
gupxx * ( &
gupxx * Axx * Axx + gupyy * Axy * Axy + gupzz * Axz * Axz + &
TWO * (gupxy * Axx * Axy + gupxz * Axx * Axz + gupyz * Axy * Axz) ) + &
gupyy * ( &
gupxx * Axy * Axy + gupyy * Ayy * Ayy + gupzz * Ayz * Ayz + &
TWO * (gupxy * Axy * Ayy + gupxz * Axy * Ayz + gupyz * Ayy * Ayz) ) + &
gupzz * ( &
gupxx * Axz * Axz + gupyy * Ayz * Ayz + gupzz * Azz * Azz + &
TWO * (gupxy * Axz * Ayz + gupxz * Axz * Azz + gupyz * Ayz * Azz) ) + &
TWO * ( &
gupxy * ( &
gupxx * Axx * Axy + gupyy * Axy * Ayy + gupzz * Axz * Ayz + &
gupxy * (Axx * Ayy + Axy * Axy) + &
gupxz * (Axx * Ayz + Axz * Axy) + &
gupyz * (Axy * Ayz + Axz * Ayy) ) + &
gupxz * ( &
gupxx * Axx * Axz + gupyy * Axy * Ayz + gupzz * Axz * Azz + &
gupxy * (Axx * Ayz + Axy * Axz) + &
gupxz * (Axx * Azz + Axz * Axz) + &
gupyz * (Axy * Azz + Axz * Ayz) ) + &
gupyz * ( &
gupxx * Axy * Axz + gupyy * Ayy * Ayz + gupzz * Ayz * Azz + &
gupxy * (Axy * Ayz + Ayy * Axz) + &
gupxz * (Axy * Azz + Ayz * Axz) + &
gupyz * (Ayy * Azz + Ayz * Ayz) ) )) -1.6d1*PI*rho + EIGHT * PI * S
f = - F1o3 *( gupxx * fxx + gupyy * fyy + gupzz * fzz + &
TWO* ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) + alpn1/chin1*f)
fxx = alpn1 * (Rxx - EIGHT * PI * Sxx) - fxx trK_rhs(i,j,k) = gupxx(i,j,k) * fxx(i,j,k) + gupyy(i,j,k) * fyy(i,j,k) + gupzz(i,j,k) * fzz(i,j,k) + &
fxy = alpn1 * (Rxy - EIGHT * PI * Sxy) - fxy TWO * (gupxy(i,j,k) * fxy(i,j,k) + gupxz(i,j,k) * fxz(i,j,k) + gupyz(i,j,k) * fyz(i,j,k))
fxz = alpn1 * (Rxz - EIGHT * PI * Sxz) - fxz enddo
fyy = alpn1 * (Ryy - EIGHT * PI * Syy) - fyy enddo
fyz = alpn1 * (Ryz - EIGHT * PI * Syz) - fyz enddo
fzz = alpn1 * (Rzz - EIGHT * PI * Szz) - fzz do k=1,ex(3)
#else do j=1,ex(2)
! Add lapse and S_ij parts to Ricci tensor: do i=1,ex(1)
divb_loc = div_beta(i,j,k)
chin_loc = chin1(i,j,k)
fxx = alpn1 * (Rxx - EIGHT * PI * Sxx) - fxx S_loc = chin_loc * ( gupxx(i,j,k) * Sxx(i,j,k) + gupyy(i,j,k) * Syy(i,j,k) + gupzz(i,j,k) * Szz(i,j,k) + &
fxy = alpn1 * (Rxy - EIGHT * PI * Sxy) - fxy TWO * (gupxy(i,j,k) * Sxy(i,j,k) + gupxz(i,j,k) * Sxz(i,j,k) + gupyz(i,j,k) * Syz(i,j,k)) )
fxz = alpn1 * (Rxz - EIGHT * PI * Sxz) - fxz S(i,j,k) = S_loc
fyy = alpn1 * (Ryy - EIGHT * PI * Syy) - fyy
fyz = alpn1 * (Ryz - EIGHT * PI * Syz) - fyz
fzz = alpn1 * (Rzz - EIGHT * PI * Szz) - fzz
! Compute trace-free part (note: chi^-1 and chi cancel!): f_loc = F2o3 * trK(i,j,k) * trK(i,j,k) - ( &
gupxx(i,j,k) * ( gupxx(i,j,k) * Axx(i,j,k) * Axx(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + &
gupzz(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + &
TWO * (gupxy(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + gupxz(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + &
gupyz(i,j,k) * Axy(i,j,k) * Axz(i,j,k)) ) + &
gupyy(i,j,k) * ( gupxx(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayy(i,j,k) + &
gupzz(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + &
TWO * (gupxy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + gupxz(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + &
gupyz(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k)) ) + &
gupzz(i,j,k) * ( gupxx(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + &
gupzz(i,j,k) * Azz(i,j,k) * Azz(i,j,k) + &
TWO * (gupxy(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + gupxz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + &
gupyz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k)) ) + &
TWO * ( gupxy(i,j,k) * ( gupxx(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + &
gupzz(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + &
gupxy(i,j,k) * (Axx(i,j,k) * Ayy(i,j,k) + Axy(i,j,k) * Axy(i,j,k)) + &
gupxz(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Axy(i,j,k)) + &
gupyz(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Ayy(i,j,k)) ) + &
gupxz(i,j,k) * ( gupxx(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + &
gupzz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + &
gupxy(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axy(i,j,k) * Axz(i,j,k)) + &
gupxz(i,j,k) * (Axx(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Axz(i,j,k)) + &
gupyz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Ayz(i,j,k)) ) + &
gupyz(i,j,k) * ( gupxx(i,j,k) * Axy(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k) + &
gupzz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k) + &
gupxy(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Ayy(i,j,k) * Axz(i,j,k)) + &
gupxz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Axz(i,j,k)) + &
gupyz(i,j,k) * (Ayy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Ayz(i,j,k)) ) ) ) - &
F16 * PI * rho(i,j,k) + EIGHT * PI * S_loc
f = F1o3 *( gupxx * fxx + gupyy * fyy + gupzz * fzz + & f_loc = -F1o3 * ( gupxx(i,j,k) * fxx(i,j,k) + gupyy(i,j,k) * fyy(i,j,k) + gupzz(i,j,k) * fzz(i,j,k) + &
TWO* ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) ) TWO * (gupxy(i,j,k) * fxy(i,j,k) + gupxz(i,j,k) * fxz(i,j,k) + gupyz(i,j,k) * fyz(i,j,k)) + &
#endif alpn1(i,j,k)/chin_loc * f_loc )
f(i,j,k) = f_loc
Axx_rhs = fxx - gxx * f l_fxx = alpn1(i,j,k) * (Rxx(i,j,k) - EIGHT * PI * Sxx(i,j,k)) - fxx(i,j,k)
Ayy_rhs = fyy - gyy * f l_fxy = alpn1(i,j,k) * (Rxy(i,j,k) - EIGHT * PI * Sxy(i,j,k)) - fxy(i,j,k)
Azz_rhs = fzz - gzz * f l_fxz = alpn1(i,j,k) * (Rxz(i,j,k) - EIGHT * PI * Sxz(i,j,k)) - fxz(i,j,k)
Axy_rhs = fxy - gxy * f l_fyy = alpn1(i,j,k) * (Ryy(i,j,k) - EIGHT * PI * Syy(i,j,k)) - fyy(i,j,k)
Axz_rhs = fxz - gxz * f l_fyz = alpn1(i,j,k) * (Ryz(i,j,k) - EIGHT * PI * Syz(i,j,k)) - fyz(i,j,k)
Ayz_rhs = fyz - gyz * f l_fzz = alpn1(i,j,k) * (Rzz(i,j,k) - EIGHT * PI * Szz(i,j,k)) - fzz(i,j,k)
! Now: store A_il A^l_j into fij: Axx_rhs(i,j,k) = l_fxx - gxx(i,j,k) * f_loc
Ayy_rhs(i,j,k) = l_fyy - gyy(i,j,k) * f_loc
Azz_rhs(i,j,k) = l_fzz - gzz(i,j,k) * f_loc
Axy_rhs(i,j,k) = l_fxy - gxy(i,j,k) * f_loc
Axz_rhs(i,j,k) = l_fxz - gxz(i,j,k) * f_loc
Ayz_rhs(i,j,k) = l_fyz - gyz(i,j,k) * f_loc
fxx = gupxx * Axx * Axx + gupyy * Axy * Axy + gupzz * Axz * Axz + & fxx(i,j,k) = gupxx(i,j,k) * Axx(i,j,k) * Axx(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + &
TWO * (gupxy * Axx * Axy + gupxz * Axx * Axz + gupyz * Axy * Axz) gupzz(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + TWO * (gupxy(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + &
fyy = gupxx * Axy * Axy + gupyy * Ayy * Ayy + gupzz * Ayz * Ayz + & gupxz(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + gupyz(i,j,k) * Axy(i,j,k) * Axz(i,j,k))
TWO * (gupxy * Axy * Ayy + gupxz * Axy * Ayz + gupyz * Ayy * Ayz) fyy(i,j,k) = gupxx(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayy(i,j,k) + &
fzz = gupxx * Axz * Axz + gupyy * Ayz * Ayz + gupzz * Azz * Azz + & gupzz(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + TWO * (gupxy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + &
TWO * (gupxy * Axz * Ayz + gupxz * Axz * Azz + gupyz * Ayz * Azz) gupxz(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + gupyz(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k))
fxy = gupxx * Axx * Axy + gupyy * Axy * Ayy + gupzz * Axz * Ayz + & fzz(i,j,k) = gupxx(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + &
gupxy *(Axx * Ayy + Axy * Axy) + & gupzz(i,j,k) * Azz(i,j,k) * Azz(i,j,k) + TWO * (gupxy(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + &
gupxz *(Axx * Ayz + Axz * Axy) + & gupxz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + gupyz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k))
gupyz *(Axy * Ayz + Axz * Ayy) fxy(i,j,k) = gupxx(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + &
fxz = gupxx * Axx * Axz + gupyy * Axy * Ayz + gupzz * Axz * Azz + & gupzz(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + gupxy(i,j,k) * (Axx(i,j,k) * Ayy(i,j,k) + Axy(i,j,k) * Axy(i,j,k)) + &
gupxy *(Axx * Ayz + Axy * Axz) + & gupxz(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Axy(i,j,k)) + &
gupxz *(Axx * Azz + Axz * Axz) + & gupyz(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Ayy(i,j,k))
gupyz *(Axy * Azz + Axz * Ayz) fxz(i,j,k) = gupxx(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + &
fyz = gupxx * Axy * Axz + gupyy * Ayy * Ayz + gupzz * Ayz * Azz + & gupzz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + gupxy(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axy(i,j,k) * Axz(i,j,k)) + &
gupxy *(Axy * Ayz + Ayy * Axz) + & gupxz(i,j,k) * (Axx(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Axz(i,j,k)) + &
gupxz *(Axy * Azz + Ayz * Axz) + & gupyz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Ayz(i,j,k))
gupyz *(Ayy * Azz + Ayz * Ayz) fyz(i,j,k) = gupxx(i,j,k) * Axy(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k) + &
gupzz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k) + gupxy(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Ayy(i,j,k) * Axz(i,j,k)) + &
gupxz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Axz(i,j,k)) + &
gupyz(i,j,k) * (Ayy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Ayz(i,j,k))
f = chin1 trK_rhs(i,j,k) = chin_loc * trK_rhs(i,j,k)
! store D^i D_i Lap in trK_rhs
trK_rhs = f*trK_rhs
Axx_rhs = f * Axx_rhs+ alpn1 * (trK * Axx - TWO * fxx) + & Axx_rhs(i,j,k) = chin_loc * Axx_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Axx(i,j,k) - TWO * fxx(i,j,k)) + &
TWO * ( Axx * betaxx + Axy * betayx + Axz * betazx )- & TWO * (Axx(i,j,k) * betaxx(i,j,k) + Axy(i,j,k) * betayx(i,j,k) + Axz(i,j,k) * betazx(i,j,k)) - &
F2o3 * Axx * div_beta F2o3 * Axx(i,j,k) * divb_loc
Ayy_rhs(i,j,k) = chin_loc * Ayy_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Ayy(i,j,k) - TWO * fyy(i,j,k)) + &
TWO * (Axy(i,j,k) * betaxy(i,j,k) + Ayy(i,j,k) * betayy(i,j,k) + Ayz(i,j,k) * betazy(i,j,k)) - &
F2o3 * Ayy(i,j,k) * divb_loc
Azz_rhs(i,j,k) = chin_loc * Azz_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Azz(i,j,k) - TWO * fzz(i,j,k)) + &
TWO * (Axz(i,j,k) * betaxz(i,j,k) + Ayz(i,j,k) * betayz(i,j,k) + Azz(i,j,k) * betazz(i,j,k)) - &
F2o3 * Azz(i,j,k) * divb_loc
Axy_rhs(i,j,k) = chin_loc * Axy_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Axy(i,j,k) - TWO * fxy(i,j,k)) + &
Axx(i,j,k) * betaxy(i,j,k) + Axz(i,j,k) * betazy(i,j,k) + Ayy(i,j,k) * betayx(i,j,k) + &
Ayz(i,j,k) * betazx(i,j,k) + F1o3 * Axy(i,j,k) * divb_loc - Axy(i,j,k) * betazz(i,j,k)
Ayz_rhs(i,j,k) = chin_loc * Ayz_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Ayz(i,j,k) - TWO * fyz(i,j,k)) + &
Axy(i,j,k) * betaxz(i,j,k) + Ayy(i,j,k) * betayz(i,j,k) + Axz(i,j,k) * betaxy(i,j,k) + &
Azz(i,j,k) * betazy(i,j,k) + F1o3 * Ayz(i,j,k) * divb_loc - Ayz(i,j,k) * betaxx(i,j,k)
Axz_rhs(i,j,k) = chin_loc * Axz_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Axz(i,j,k) - TWO * fxz(i,j,k)) + &
Axx(i,j,k) * betaxz(i,j,k) + Axy(i,j,k) * betayz(i,j,k) + Ayz(i,j,k) * betayx(i,j,k) + &
Azz(i,j,k) * betazx(i,j,k) + F1o3 * Axz(i,j,k) * divb_loc - Axz(i,j,k) * betayy(i,j,k)
Ayy_rhs = f * Ayy_rhs+ alpn1 * (trK * Ayy - TWO * fyy) + & trK_rhs(i,j,k) = - trK_rhs(i,j,k) + alpn1(i,j,k) * ( F1o3 * trK(i,j,k) * trK(i,j,k) + &
TWO * ( Axy * betaxy + Ayy * betayy + Ayz * betazy )- & gupxx(i,j,k) * fxx(i,j,k) + gupyy(i,j,k) * fyy(i,j,k) + gupzz(i,j,k) * fzz(i,j,k) + &
F2o3 * Ayy * div_beta TWO * (gupxy(i,j,k) * fxy(i,j,k) + gupxz(i,j,k) * fxz(i,j,k) + gupyz(i,j,k) * fyz(i,j,k)) + &
FOUR * PI * (rho(i,j,k) + S_loc) )
Azz_rhs = f * Azz_rhs+ alpn1 * (trK * Azz - TWO * fzz) + & enddo
TWO * ( Axz * betaxz + Ayz * betayz + Azz * betazz )- & enddo
F2o3 * Azz * div_beta enddo
Axy_rhs = f * Axy_rhs+ alpn1 *( trK * Axy - TWO * fxy )+ &
Axx * betaxy + Axz * betazy + &
Ayy * betayx + Ayz * betazx + &
F1o3 * Axy * div_beta - Axy * betazz
Ayz_rhs = f * Ayz_rhs+ alpn1 *( trK * Ayz - TWO * fyz )+ &
Axy * betaxz + Ayy * betayz + &
Axz * betaxy + Azz * betazy + &
F1o3 * Ayz * div_beta - Ayz * betaxx
Axz_rhs = f * Axz_rhs+ alpn1 *( trK * Axz - TWO * fxz )+ &
Axx * betaxz + Axy * betayz + &
Ayz * betayx + Azz * betazx + &
F1o3 * Axz * div_beta - Axz * betayy !rhs for Aij
! Compute trace of S_ij
S = f * ( gupxx * Sxx + gupyy * Syy + gupzz * Szz + &
TWO * ( gupxy * Sxy + gupxz * Sxz + gupyz * Syz ) )
trK_rhs = - trK_rhs + alpn1 *( F1o3 * trK * trK + &
gupxx * fxx + gupyy * fyy + gupzz * fzz + &
TWO * ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) + &
FOUR * PI * ( rho + S )) !rhs for trK
!!!! gauge variable part !!!! gauge variable part
@@ -948,15 +1000,15 @@
!!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency) !!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency)
! lopsided_kodis shares the symmetry_bd buffer between advection and ! lopsided_kodis shares the symmetry_bd buffer between advection and
! dissipation, eliminating redundant full-grid copies. For metric variables ! dissipation, eliminating redundant full-grid copies. For metric variables
! gxx/gyy/gzz (=dxx/dyy/dzz+1): kodis stencil coefficients sum to zero, ! gxx/gyy/gzz (=dxx/dyy/dzz+1): stencil coefficients sum to zero,
! so the constant offset has no effect on dissipation. ! so the constant offset has no effect on dissipation.
call lopsided_kodis(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,dxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps) call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps) call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided_kodis(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,dyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps) call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided_kodis(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,dzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps) call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps)

View File

@@ -32,6 +32,19 @@
#define f_compute_rhs_Z4c_ss compute_rhs_z4c_ss_ #define f_compute_rhs_Z4c_ss compute_rhs_z4c_ss_
#define f_compute_constraint_fr compute_constraint_fr_ #define f_compute_constraint_fr compute_constraint_fr_
#endif #endif
#ifdef __cplusplus
extern "C"
{
#endif
void f_bssn_rhs_kernel_timing_reset();
int f_bssn_rhs_kernel_timing_bucket_count();
const double *f_bssn_rhs_kernel_timing_local_seconds();
const char *f_bssn_rhs_kernel_timing_label(int);
#ifdef __cplusplus
}
#endif
extern "C" extern "C"
{ {
int f_compute_rhs_bssn(int *, double &, double *, double *, double *, // ex,T,X,Y,Z int f_compute_rhs_bssn(int *, double &, double *, double *, double *, // ex,T,X,Y,Z

View File

@@ -0,0 +1,248 @@
#include "tool.h"
/*
* Combined advection (lopsided) + KO dissipation (kodis).
* Uses one shared symmetry_bd buffer per call.
*/
void lopsided_kodis(const int ex[3],
const double *X, const double *Y, const double *Z,
const double *f, double *f_rhs,
const double *Sfx, const double *Sfy, const double *Sfz,
int Symmetry, const double SoA[3], double eps)
{
const double ZEO = 0.0, ONE = 1.0, F3 = 3.0;
const double F6 = 6.0, F18 = 18.0;
const double F12 = 12.0, F10 = 10.0, EIT = 8.0;
const double SIX = 6.0, FIT = 15.0, TWT = 20.0;
const double cof = 64.0; // 2^6
const int NO_SYMM = 0, EQ_SYMM = 1;
const int ex1 = ex[0], ex2 = ex[1], ex3 = ex[2];
const double dX = X[1] - X[0];
const double dY = Y[1] - Y[0];
const double dZ = Z[1] - Z[0];
const double d12dx = ONE / F12 / dX;
const double d12dy = ONE / F12 / dY;
const double d12dz = ONE / F12 / dZ;
const int imaxF = ex1;
const int jmaxF = ex2;
const int kmaxF = ex3;
int iminF = 1, jminF = 1, kminF = 1;
if (Symmetry > NO_SYMM && fabs(Z[0]) < dZ) kminF = -2;
if (Symmetry > EQ_SYMM && fabs(X[0]) < dX) iminF = -2;
if (Symmetry > EQ_SYMM && fabs(Y[0]) < dY) jminF = -2;
// fh for Fortran-style domain (-2:ex1,-2:ex2,-2:ex3)
const size_t nx = (size_t)ex1 + 3;
const size_t ny = (size_t)ex2 + 3;
const size_t nz = (size_t)ex3 + 3;
const size_t fh_size = nx * ny * nz;
double *fh = (double*)malloc(fh_size * sizeof(double));
if (!fh) return;
symmetry_bd(3, ex, f, fh, SoA);
// Advection (same stencil logic as lopsided_c.C)
for (int k0 = 0; k0 <= ex3 - 2; ++k0) {
const int kF = k0 + 1;
for (int j0 = 0; j0 <= ex2 - 2; ++j0) {
const int jF = j0 + 1;
for (int i0 = 0; i0 <= ex1 - 2; ++i0) {
const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
const double sfx = Sfx[p];
if (sfx > ZEO) {
if (i0 <= ex1 - 4) {
f_rhs[p] += sfx * d12dx *
(-F3 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF + 2, jF, kF, ex)]
+ fh[idx_fh_F(iF + 3, jF, kF, ex)]);
} else if (i0 <= ex1 - 3) {
f_rhs[p] += sfx * d12dx *
( fh[idx_fh_F(iF - 2, jF, kF, ex)]
-EIT * fh[idx_fh_F(iF - 1, jF, kF, ex)]
+EIT * fh[idx_fh_F(iF + 1, jF, kF, ex)]
- fh[idx_fh_F(iF + 2, jF, kF, ex)]);
} else if (i0 <= ex1 - 2) {
f_rhs[p] -= sfx * d12dx *
(-F3 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF - 2, jF, kF, ex)]
+ fh[idx_fh_F(iF - 3, jF, kF, ex)]);
}
} else if (sfx < ZEO) {
if ((i0 - 2) >= iminF) {
f_rhs[p] -= sfx * d12dx *
(-F3 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF - 2, jF, kF, ex)]
+ fh[idx_fh_F(iF - 3, jF, kF, ex)]);
} else if ((i0 - 1) >= iminF) {
f_rhs[p] += sfx * d12dx *
( fh[idx_fh_F(iF - 2, jF, kF, ex)]
-EIT * fh[idx_fh_F(iF - 1, jF, kF, ex)]
+EIT * fh[idx_fh_F(iF + 1, jF, kF, ex)]
- fh[idx_fh_F(iF + 2, jF, kF, ex)]);
} else if (i0 >= iminF) {
f_rhs[p] += sfx * d12dx *
(-F3 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF + 2, jF, kF, ex)]
+ fh[idx_fh_F(iF + 3, jF, kF, ex)]);
}
}
const double sfy = Sfy[p];
if (sfy > ZEO) {
if (j0 <= ex2 - 4) {
f_rhs[p] += sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF + 2, kF, ex)]
+ fh[idx_fh_F(iF, jF + 3, kF, ex)]);
} else if (j0 <= ex2 - 3) {
f_rhs[p] += sfy * d12dy *
( fh[idx_fh_F(iF, jF - 2, kF, ex)]
-EIT * fh[idx_fh_F(iF, jF - 1, kF, ex)]
+EIT * fh[idx_fh_F(iF, jF + 1, kF, ex)]
- fh[idx_fh_F(iF, jF + 2, kF, ex)]);
} else if (j0 <= ex2 - 2) {
f_rhs[p] -= sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF - 2, kF, ex)]
+ fh[idx_fh_F(iF, jF - 3, kF, ex)]);
}
} else if (sfy < ZEO) {
if ((j0 - 2) >= jminF) {
f_rhs[p] -= sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF - 2, kF, ex)]
+ fh[idx_fh_F(iF, jF - 3, kF, ex)]);
} else if ((j0 - 1) >= jminF) {
f_rhs[p] += sfy * d12dy *
( fh[idx_fh_F(iF, jF - 2, kF, ex)]
-EIT * fh[idx_fh_F(iF, jF - 1, kF, ex)]
+EIT * fh[idx_fh_F(iF, jF + 1, kF, ex)]
- fh[idx_fh_F(iF, jF + 2, kF, ex)]);
} else if (j0 >= jminF) {
f_rhs[p] += sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF + 2, kF, ex)]
+ fh[idx_fh_F(iF, jF + 3, kF, ex)]);
}
}
const double sfz = Sfz[p];
if (sfz > ZEO) {
if (k0 <= ex3 - 4) {
f_rhs[p] += sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF + 2, ex)]
+ fh[idx_fh_F(iF, jF, kF + 3, ex)]);
} else if (k0 <= ex3 - 3) {
f_rhs[p] += sfz * d12dz *
( fh[idx_fh_F(iF, jF, kF - 2, ex)]
-EIT * fh[idx_fh_F(iF, jF, kF - 1, ex)]
+EIT * fh[idx_fh_F(iF, jF, kF + 1, ex)]
- fh[idx_fh_F(iF, jF, kF + 2, ex)]);
} else if (k0 <= ex3 - 2) {
f_rhs[p] -= sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF - 2, ex)]
+ fh[idx_fh_F(iF, jF, kF - 3, ex)]);
}
} else if (sfz < ZEO) {
if ((k0 - 2) >= kminF) {
f_rhs[p] -= sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF - 2, ex)]
+ fh[idx_fh_F(iF, jF, kF - 3, ex)]);
} else if ((k0 - 1) >= kminF) {
f_rhs[p] += sfz * d12dz *
( fh[idx_fh_F(iF, jF, kF - 2, ex)]
-EIT * fh[idx_fh_F(iF, jF, kF - 1, ex)]
+EIT * fh[idx_fh_F(iF, jF, kF + 1, ex)]
- fh[idx_fh_F(iF, jF, kF + 2, ex)]);
} else if (k0 >= kminF) {
f_rhs[p] += sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF + 2, ex)]
+ fh[idx_fh_F(iF, jF, kF + 3, ex)]);
}
}
}
}
}
// KO dissipation (same domain restriction as kodiss_c.C)
if (eps > ZEO) {
const int i0_lo = (iminF + 2 > 0) ? iminF + 2 : 0;
const int j0_lo = (jminF + 2 > 0) ? jminF + 2 : 0;
const int k0_lo = (kminF + 2 > 0) ? kminF + 2 : 0;
const int i0_hi = imaxF - 4; // inclusive
const int j0_hi = jmaxF - 4;
const int k0_hi = kmaxF - 4;
if (!(i0_lo > i0_hi || j0_lo > j0_hi || k0_lo > k0_hi)) {
for (int k0 = k0_lo; k0 <= k0_hi; ++k0) {
const int kF = k0 + 1;
for (int j0 = j0_lo; j0 <= j0_hi; ++j0) {
const int jF = j0 + 1;
for (int i0 = i0_lo; i0 <= i0_hi; ++i0) {
const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
const double Dx_term =
((fh[idx_fh_F(iF - 3, jF, kF, ex)] + fh[idx_fh_F(iF + 3, jF, kF, ex)]) -
SIX * (fh[idx_fh_F(iF - 2, jF, kF, ex)] + fh[idx_fh_F(iF + 2, jF, kF, ex)]) +
FIT * (fh[idx_fh_F(iF - 1, jF, kF, ex)] + fh[idx_fh_F(iF + 1, jF, kF, ex)]) -
TWT * fh[idx_fh_F(iF, jF, kF, ex)]) / dX;
const double Dy_term =
((fh[idx_fh_F(iF, jF - 3, kF, ex)] + fh[idx_fh_F(iF, jF + 3, kF, ex)]) -
SIX * (fh[idx_fh_F(iF, jF - 2, kF, ex)] + fh[idx_fh_F(iF, jF + 2, kF, ex)]) +
FIT * (fh[idx_fh_F(iF, jF - 1, kF, ex)] + fh[idx_fh_F(iF, jF + 1, kF, ex)]) -
TWT * fh[idx_fh_F(iF, jF, kF, ex)]) / dY;
const double Dz_term =
((fh[idx_fh_F(iF, jF, kF - 3, ex)] + fh[idx_fh_F(iF, jF, kF + 3, ex)]) -
SIX * (fh[idx_fh_F(iF, jF, kF - 2, ex)] + fh[idx_fh_F(iF, jF, kF + 2, ex)]) +
FIT * (fh[idx_fh_F(iF, jF, kF - 1, ex)] + fh[idx_fh_F(iF, jF, kF + 1, ex)]) -
TWT * fh[idx_fh_F(iF, jF, kF, ex)]) / dZ;
f_rhs[p] += (eps / cof) * (Dx_term + Dy_term + Dz_term);
}
}
}
}
}
free(fh);
}

View File

@@ -1934,18 +1934,35 @@
! when if=1 -> ic=0, this is different to vertex center grid ! when if=1 -> ic=0, this is different to vertex center grid
real*8, dimension(-2:extc(1),-2:extc(2),-2:extc(3)) :: funcc real*8, dimension(-2:extc(1),-2:extc(2),-2:extc(3)) :: funcc
integer,dimension(3) :: cxI integer,dimension(3) :: cxI
integer :: i,j,k,ii,jj,kk integer :: i,j,k,ii,jj,kk,px,py,pz
real*8, dimension(6,6) :: tmp2 real*8, dimension(6,6) :: tmp2
real*8, dimension(6) :: tmp1 real*8, dimension(6) :: tmp1
integer, dimension(extf(1)) :: cix
integer, dimension(extf(2)) :: ciy
integer, dimension(extf(3)) :: ciz
integer, dimension(extf(1)) :: pix
integer, dimension(extf(2)) :: piy
integer, dimension(extf(3)) :: piz
real*8, parameter :: C1=7.7d1/8.192d3,C2=-6.93d2/8.192d3,C3=3.465d3/4.096d3 real*8, parameter :: C1=7.7d1/8.192d3,C2=-6.93d2/8.192d3,C3=3.465d3/4.096d3
real*8, parameter :: C6=6.3d1/8.192d3,C5=-4.95d2/8.192d3,C4=1.155d3/4.096d3 real*8, parameter :: C6=6.3d1/8.192d3,C5=-4.95d2/8.192d3,C4=1.155d3/4.096d3
real*8, dimension(6,2), parameter :: WC = reshape((/&
C1,C2,C3,C4,C5,C6,&
C6,C5,C4,C3,C2,C1/), (/6,2/))
integer::imini,imaxi,jmini,jmaxi,kmini,kmaxi integer::imini,imaxi,jmini,jmaxi,kmini,kmaxi
integer::imino,imaxo,jmino,jmaxo,kmino,kmaxo integer::imino,imaxo,jmino,jmaxo,kmino,kmaxo
integer::maxcx,maxcy,maxcz
real*8,dimension(3) :: CD,FD real*8,dimension(3) :: CD,FD
real*8 :: tmp_yz(extc(1), 6) ! X 线 6 Y Z
real*8 :: tmp_xyz_line(-2:extc(1)) ! X 6 访
real*8 :: v1, v2, v3, v4, v5, v6
integer :: ic, jc, kc, ix_offset,ix,iy,iz,jc_min,jc_max,ic_min,ic_max,kc_min,kc_max
integer :: i_lo, i_hi, j_lo, j_hi, k_lo, k_hi
logical :: need_full_symmetry
real*8 :: res_line
real*8 :: tmp_z_slab(-2:extc(1), -2:extc(2)) ! Y/X 访
if(wei.ne.3)then if(wei.ne.3)then
write(*,*)"prolongrestrict.f90::prolong3: this routine only surport 3 dimension" write(*,*)"prolongrestrict.f90::prolong3: this routine only surport 3 dimension"
write(*,*)"dim = ",wei write(*,*)"dim = ",wei
@@ -2020,145 +2037,140 @@
return return
endif endif
do i = imino,imaxo
ii = i + lbf(1) - 1
cix(i) = ii/2 - lbc(1) + 1
if(ii/2*2 == ii)then
pix(i) = 1
else
pix(i) = 2
endif
enddo
do j = jmino,jmaxo
jj = j + lbf(2) - 1
ciy(j) = jj/2 - lbc(2) + 1
if(jj/2*2 == jj)then
piy(j) = 1
else
piy(j) = 2
endif
enddo
do k = kmino,kmaxo
kk = k + lbf(3) - 1
ciz(k) = kk/2 - lbc(3) + 1
if(kk/2*2 == kk)then
piz(k) = 1
else
piz(k) = 2
endif
enddo
ic_min = minval(cix(imino:imaxo))
ic_max = maxval(cix(imino:imaxo))
jc_min = minval(ciy(jmino:jmaxo))
jc_max = maxval(ciy(jmino:jmaxo))
kc_min = minval(ciz(kmino:kmaxo))
kc_max = maxval(ciz(kmino:kmaxo))
maxcx = ic_max
maxcy = jc_max
maxcz = kc_max
if(maxcx+3 > extc(1) .or. maxcy+3 > extc(2) .or. maxcz+3 > extc(3))then
write(*,*)"error in prolong"
return
endif
i_lo = ic_min - 2
i_hi = ic_max + 3
j_lo = jc_min - 2
j_hi = jc_max + 3
k_lo = kc_min - 2
k_hi = kc_max + 3
need_full_symmetry = (i_lo < 1) .or. (j_lo < 1) .or. (k_lo < 1)
if(need_full_symmetry)then
call symmetry_bd(3,extc,func,funcc,SoA) call symmetry_bd(3,extc,func,funcc,SoA)
else
funcc(i_lo:i_hi,j_lo:j_hi,k_lo:k_hi) = func(i_lo:i_hi,j_lo:j_hi,k_lo:k_hi)
endif
! kpz, kc Z 2D
do k = kmino, kmaxo
pz = piz(k); kc = ciz(k)
! --- Pass 1: Z ---
do iy = jc_min-2, jc_max+3 ! iy jc-2:jc+3
do ii = ic_min-2, ic_max+3 ! ii cix-2:cix+3
tmp_z_slab(ii, iy) = sum(WC(:,pz) * funcc(ii, iy, kc-2:kc+3))
end do
end do
do j = jmino, jmaxo
py = piy(j); jc = ciy(j)
! --- Pass 2: Y ---
do ii = ic_min-2, ic_max+3
tmp_xyz_line(ii) = sum(WC(:,py) * tmp_z_slab(ii, jc-2:jc+3))
end do
! --- Pass 3: X ---
do i = imino, imaxo
funf(i,j,k) = sum(WC(:,pix(i)) * tmp_xyz_line(cix(i)-2:cix(i)+3))
end do
end do
end do
!~~~~~~> prolongation start... !~~~~~~> prolongation start...
do k = kmino,kmaxo
do j = jmino,jmaxo
do i = imino,imaxo
cxI(1) = i
cxI(2) = j
cxI(3) = k
! change to coarse level reference
!|---*--- ---*--- ---*--- ---*--- ---*--- ---*--- ---*--- ---*---|
!|=======x===============x===============x===============x=======|
cxI = (cxI+lbf-1)/2
! change to array index
cxI = cxI - lbc + 1
if(any(cxI+3 > extc)) write(*,*)"error in prolong"
ii=i+lbf(1)-1
jj=j+lbf(2)-1
kk=k+lbf(3)-1
#if 0 #if 0
if(ii/2*2==ii)then do k = kmino, kmaxo
if(jj/2*2==jj)then pz = piz(k)
if(kk/2*2==kk)then kc = ciz(k)
tmp2= C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C1*tmp2(:,1)+C2*tmp2(:,2)+C3*tmp2(:,3)+C4*tmp2(:,4)+C5*tmp2(:,5)+C6*tmp2(:,6)
funf(i,j,k)= C1*tmp1(1)+C2*tmp1(2)+C3*tmp1(3)+C4*tmp1(4)+C5*tmp1(5)+C6*tmp1(6)
else
tmp2= C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C1*tmp2(:,1)+C2*tmp2(:,2)+C3*tmp2(:,3)+C4*tmp2(:,4)+C5*tmp2(:,5)+C6*tmp2(:,6)
funf(i,j,k)= C1*tmp1(1)+C2*tmp1(2)+C3*tmp1(3)+C4*tmp1(4)+C5*tmp1(5)+C6*tmp1(6)
endif
else
if(kk/2*2==kk)then
tmp2= C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C6*tmp2(:,1)+C5*tmp2(:,2)+C4*tmp2(:,3)+C3*tmp2(:,4)+C2*tmp2(:,5)+C1*tmp2(:,6)
funf(i,j,k)= C1*tmp1(1)+C2*tmp1(2)+C3*tmp1(3)+C4*tmp1(4)+C5*tmp1(5)+C6*tmp1(6)
else
tmp2= C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C6*tmp2(:,1)+C5*tmp2(:,2)+C4*tmp2(:,3)+C3*tmp2(:,4)+C2*tmp2(:,5)+C1*tmp2(:,6)
funf(i,j,k)= C1*tmp1(1)+C2*tmp1(2)+C3*tmp1(3)+C4*tmp1(4)+C5*tmp1(5)+C6*tmp1(6)
endif
endif
else
if(jj/2*2==jj)then
if(kk/2*2==kk)then
tmp2= C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C1*tmp2(:,1)+C2*tmp2(:,2)+C3*tmp2(:,3)+C4*tmp2(:,4)+C5*tmp2(:,5)+C6*tmp2(:,6)
funf(i,j,k)= C6*tmp1(1)+C5*tmp1(2)+C4*tmp1(3)+C3*tmp1(4)+C2*tmp1(5)+C1*tmp1(6)
else
tmp2= C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C1*tmp2(:,1)+C2*tmp2(:,2)+C3*tmp2(:,3)+C4*tmp2(:,4)+C5*tmp2(:,5)+C6*tmp2(:,6)
funf(i,j,k)= C6*tmp1(1)+C5*tmp1(2)+C4*tmp1(3)+C3*tmp1(4)+C2*tmp1(5)+C1*tmp1(6)
endif
else
if(kk/2*2==kk)then
tmp2= C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C6*tmp2(:,1)+C5*tmp2(:,2)+C4*tmp2(:,3)+C3*tmp2(:,4)+C2*tmp2(:,5)+C1*tmp2(:,6)
funf(i,j,k)= C6*tmp1(1)+C5*tmp1(2)+C4*tmp1(3)+C3*tmp1(4)+C2*tmp1(5)+C1*tmp1(6)
else
tmp2= C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
tmp1= C6*tmp2(:,1)+C5*tmp2(:,2)+C4*tmp2(:,3)+C3*tmp2(:,4)+C2*tmp2(:,5)+C1*tmp2(:,6)
funf(i,j,k)= C6*tmp1(1)+C5*tmp1(2)+C4*tmp1(3)+C3*tmp1(4)+C2*tmp1(5)+C1*tmp1(6)
endif
endif
endif
#else
if(kk/2*2==kk)then
tmp2= C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
else
tmp2= C6*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-2)+&
C5*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)-1)+&
C4*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3) )+&
C3*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+1)+&
C2*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+2)+&
C1*funcc(cxI(1)-2:cxI(1)+3,cxI(2)-2:cxI(2)+3,cxI(3)+3)
endif
if(jj/2*2==jj)then do j = jmino, jmaxo
tmp1= C1*tmp2(:,1)+C2*tmp2(:,2)+C3*tmp2(:,3)+C4*tmp2(:,4)+C5*tmp2(:,5)+C6*tmp2(:,6) py = piy(j)
else jc = ciy(j)
tmp1= C6*tmp2(:,1)+C5*tmp2(:,2)+C4*tmp2(:,3)+C3*tmp2(:,4)+C2*tmp2(:,5)+C1*tmp2(:,6)
endif
if(ii/2*2==ii)then ! --- 1 & 2 X Cache ---
funf(i,j,k)= C1*tmp1(1)+C2*tmp1(2)+C3*tmp1(3)+C4*tmp1(4)+C5*tmp1(5)+C6*tmp1(6) ! ii funcc 访
else do ii = 1, extc(1)
funf(i,j,k)= C6*tmp1(1)+C5*tmp1(2)+C4*tmp1(3)+C3*tmp1(4)+C2*tmp1(5)+C1*tmp1(6) ! 1. Z 6 线 ii 6 iy
endif ! Y tmp_yz
! funcc
res_line = 0.0d0
do jj = 1, 6
iy = jc - 3 + jj
! Z Y
! WC(jj, py)
res_line = res_line + WC(jj, py) * ( &
WC(1, pz) * funcc(ii, iy, kc-2) + &
WC(2, pz) * funcc(ii, iy, kc-1) + &
WC(3, pz) * funcc(ii, iy, kc ) + &
WC(4, pz) * funcc(ii, iy, kc+1) + &
WC(5, pz) * funcc(ii, iy, kc+2) + &
WC(6, pz) * funcc(ii, iy, kc+3) )
end do
tmp_xyz_line(ii) = res_line
end do
! 3. X X 6
! 200+ 6
do i = imino, imaxo
px = pix(i)
ic = cix(i)
! line 6
! ic-2 ic+3 6
funf(i,j,k) = WC(1,px)*tmp_xyz_line(ic-2) + &
WC(2,px)*tmp_xyz_line(ic-1) + &
WC(3,px)*tmp_xyz_line(ic ) + &
WC(4,px)*tmp_xyz_line(ic+1) + &
WC(5,px)*tmp_xyz_line(ic+2) + &
WC(6,px)*tmp_xyz_line(ic+3)
end do
end do
end do
#endif #endif
enddo
enddo
enddo
return return
end subroutine prolong3 end subroutine prolong3
@@ -2358,6 +2370,13 @@
real*8,dimension(3) :: CD,FD real*8,dimension(3) :: CD,FD
real*8 :: tmp_xz_plane(-1:extf(1), 6)
real*8 :: tmp_x_line(-1:extf(1))
integer :: fi, fj, fk, ii, jj, kk
integer :: fi_min, fi_max, ii_lo, ii_hi
integer :: fj_min, fj_max, fk_min, fk_max, jj_lo, jj_hi, kk_lo, kk_hi
logical :: need_full_symmetry
if(wei.ne.3)then if(wei.ne.3)then
write(*,*)"prolongrestrict.f90::restrict3: this routine only surport 3 dimension" write(*,*)"prolongrestrict.f90::restrict3: this routine only surport 3 dimension"
write(*,*)"dim = ",wei write(*,*)"dim = ",wei
@@ -2436,9 +2455,86 @@
stop stop
endif endif
! X
! func(i,j,k) 访 tmp_x_line(fi-2:fi+3)
fi_min = 2*(imino + lbc(1) - 1) - 1 - lbf(1) + 1
fi_max = 2*(imaxo + lbc(1) - 1) - 1 - lbf(1) + 1
fj_min = 2*(jmino + lbc(2) - 1) - 1 - lbf(2) + 1
fj_max = 2*(jmaxo + lbc(2) - 1) - 1 - lbf(2) + 1
fk_min = 2*(kmino + lbc(3) - 1) - 1 - lbf(3) + 1
fk_max = 2*(kmaxo + lbc(3) - 1) - 1 - lbf(3) + 1
ii_lo = fi_min - 2
ii_hi = fi_max + 3
jj_lo = fj_min - 2
jj_hi = fj_max + 3
kk_lo = fk_min - 2
kk_hi = fk_max + 3
if(ii_lo < -1 .or. ii_hi > extf(1) .or. &
jj_lo < -1 .or. jj_hi > extf(2) .or. &
kk_lo < -1 .or. kk_hi > extf(3))then
write(*,*)"restrict3: invalid stencil window"
write(*,*)"ii=",ii_lo,ii_hi," jj=",jj_lo,jj_hi," kk=",kk_lo,kk_hi
write(*,*)"extf=",extf
stop
endif
need_full_symmetry = (ii_lo < 1) .or. (jj_lo < 1) .or. (kk_lo < 1)
if(need_full_symmetry)then
call symmetry_bd(2,extf,funf,funff,SoA) call symmetry_bd(2,extf,funf,funff,SoA)
else
funff(ii_lo:ii_hi,jj_lo:jj_hi,kk_lo:kk_hi) = funf(ii_lo:ii_hi,jj_lo:jj_hi,kk_lo:kk_hi)
endif
!~~~~~~> restriction start... !~~~~~~> restriction start...
do k = kmino, kmaxo
fk = 2*(k + lbc(3) - 1) - 1 - lbf(3) + 1
do j = jmino, jmaxo
fj = 2*(j + lbc(2) - 1) - 1 - lbf(2) + 1
! 1: Z
! ii 访
!DIR$ VECTOR ALWAYS
do ii = ii_lo, ii_hi
! j 6 Z
! jj
tmp_xz_plane(ii, 1) = C1*(funff(ii,fj-2,fk-2)+funff(ii,fj-2,fk+3)) + &
C2*(funff(ii,fj-2,fk-1)+funff(ii,fj-2,fk+2)) + &
C3*(funff(ii,fj-2,fk )+funff(ii,fj-2,fk+1))
tmp_xz_plane(ii, 2) = C1*(funff(ii,fj-1,fk-2)+funff(ii,fj-1,fk+3)) + &
C2*(funff(ii,fj-1,fk-1)+funff(ii,fj-1,fk+2)) + &
C3*(funff(ii,fj-1,fk )+funff(ii,fj-1,fk+1))
tmp_xz_plane(ii, 3) = C1*(funff(ii,fj ,fk-2)+funff(ii,fj ,fk+3)) + &
C2*(funff(ii,fj ,fk-1)+funff(ii,fj ,fk+2)) + &
C3*(funff(ii,fj ,fk )+funff(ii,fj ,fk+1))
tmp_xz_plane(ii, 4) = C1*(funff(ii,fj+1,fk-2)+funff(ii,fj+1,fk+3)) + &
C2*(funff(ii,fj+1,fk-1)+funff(ii,fj+1,fk+2)) + &
C3*(funff(ii,fj+1,fk )+funff(ii,fj+1,fk+1))
tmp_xz_plane(ii, 5) = C1*(funff(ii,fj+2,fk-2)+funff(ii,fj+2,fk+3)) + &
C2*(funff(ii,fj+2,fk-1)+funff(ii,fj+2,fk+2)) + &
C3*(funff(ii,fj+2,fk )+funff(ii,fj+2,fk+1))
tmp_xz_plane(ii, 6) = C1*(funff(ii,fj+3,fk-2)+funff(ii,fj+3,fk+3)) + &
C2*(funff(ii,fj+3,fk-1)+funff(ii,fj+3,fk+2)) + &
C3*(funff(ii,fj+3,fk )+funff(ii,fj+3,fk+1))
end do
! 2: Y
!DIR$ VECTOR ALWAYS
do ii = ii_lo, ii_hi
tmp_x_line(ii) = C1*(tmp_xz_plane(ii, 1) + tmp_xz_plane(ii, 6)) + &
C2*(tmp_xz_plane(ii, 2) + tmp_xz_plane(ii, 5)) + &
C3*(tmp_xz_plane(ii, 3) + tmp_xz_plane(ii, 4))
end do
! 3: tmp_x_line
do i = imino, imaxo
fi = 2*(i + lbc(1) - 1) - 1 - lbf(1) + 1
func(i, j, k) = C1*(tmp_x_line(fi-2) + tmp_x_line(fi+3)) + &
C2*(tmp_x_line(fi-1) + tmp_x_line(fi+2)) + &
C3*(tmp_x_line(fi ) + tmp_x_line(fi+1))
end do
end do
end do
#if 0
do k = kmino,kmaxo do k = kmino,kmaxo
do j = jmino,jmaxo do j = jmino,jmaxo
do i = imino,imaxo do i = imino,imaxo
@@ -2462,7 +2558,7 @@
enddo enddo
enddo enddo
enddo enddo
#endif
return return
end subroutine restrict3 end subroutine restrict3

Some files were not shown because too many files have changed in this diff Show More