Compare commits

..

34 Commits

Author SHA1 Message Date
43975017eb prolong3 改为先算实际 stencil 窗口;只有窗口触及对称边界时才走全域 symmetry_bd,否则只复制必需窗口。restrict3 同样改成窗口判定,无触边时仅填 ii/jj/kk 必需窗口。 2026-03-02 18:10:38 +08:00
485667ef4c perf(restrict3): shrink X-pass ii sweep to required overlap window
- compute fi_min/fi_max from output i-range and derive ii_lo/ii_hi
 - replace full ii sweep (-1:extf(1)) with windowed sweep in Z/Y precompute passes
 - keep stencil math unchanged; add bounds sanity check for ii window
2026-03-02 18:10:38 +08:00
2a977ce82e perf(MPatch): 用空间 bin 索引加速 Interp_Points 的 block 归属查找
- 为 Patch::Interp_Points 三个重载引入 BlockBinIndex(候选筛选 + 全扫回退)
  - 保持原 point-in-block 判定与后续插值/通信流程不变
  - 将逐点线性扫块从 O(N_points*N_blocks) 降为近似 O(N_points*k)
  - 测试:bin 上限如果太大,会引入不必要的索引构建开销。将 bins 上限设为 16。

Co-authored-by: gpt-5.3-codex
2026-03-02 18:10:35 +08:00
160e2a0369 fix prolong/restrict index bounds after cherry-pick 12e1f63 2026-03-02 14:11:52 +08:00
01410de05a refactor(Parallel): streamline MPI communication by consolidating request handling and memory management 2026-03-02 14:08:27 +08:00
83c826eb49 prolong3: 减少Z-pass 冗余计算 2026-03-02 14:08:13 +08:00
43ddaab903 fix: add C RK4 kernel to CFILES_CUDA 2026-03-02 12:19:52 +08:00
5839755c2f compute div_beta on-the-fly to remove temp array 2026-03-02 12:12:58 +08:00
a893b4007c merge lopsided+kodis 2026-03-02 12:12:26 +08:00
ad5ff03615 build: switch allocator option to oneTBB tbbmalloc
(cherry picked from commit e29ca2dca9)
2026-03-02 11:53:30 +08:00
b4bc0ef269 先关闭绑核心,发现速度对比:不绑定核心+SCX>绑核心+SCX
(cherry picked from commit 1eba73acbe)
2026-03-02 11:53:13 +08:00
b185f84cce Add switchable C RK4 kernel and build toggle
(cherry picked from commit b91cfff301)
2026-03-02 11:53:00 +08:00
71f6eb7b44 Remove profiling code 2026-03-02 11:29:48 +08:00
90620c2aec Optimize fdderivs: skip redundant 2nd-order work in 4th-order overlap 2026-03-02 11:04:04 +08:00
f561522d89 prolong3:提升cache命中率 2026-03-02 11:02:19 +08:00
jaunatisblue
3f4715b8cc 修改prolong 2026-03-02 11:02:17 +08:00
jaunatisblue
710ea8f76b 对prolong3做访存优化 2026-03-02 11:02:12 +08:00
5cf891359d Optimize symmetry_bd with stride-based fast paths
(cherry picked from commit 16013081e0)
2026-03-02 11:01:49 +08:00
222747449a Optimize average2: use DO CONCURRENT loop form
(cherry picked from commit 1a518cd3f6)
2026-03-02 11:01:45 +08:00
14de4d535e Optimize average2: replace array expression with explicit loops
(cherry picked from commit 1dc622e516)
2026-03-02 11:01:42 +08:00
787295692a Optimize prolong3: hoist bounds check out of inner loop
(cherry picked from commit 3046a0ccde)
2026-03-02 11:01:39 +08:00
335f2f23fe Optimize prolong3: replace parity branches with coefficient lookup
(cherry picked from commit d4ec69c98a)
2026-03-02 11:01:37 +08:00
7109474a14 Optimize prolong3: precompute coarse index/parity maps
(cherry picked from commit 2c0a3055d4)
2026-03-02 11:01:31 +08:00
e7a02e8f72 perf(polint): add uniform-grid fast path for barycentric n=6 2026-03-01 14:13:51 +08:00
8dad910c6c perf(polint): add switchable barycentric ordn=6 path 2026-03-01 14:13:51 +08:00
01b4cf71d1 perf(polin3): switch to lagrange-weight tensor contraction 2026-03-01 14:13:04 +08:00
66dabe8cc4 perf(polint): add ordn=6 specialized neville path 2026-03-01 14:12:22 +08:00
abf2f640e4 add fused symmetry packing kernels for orders 2 and 3 in BSSN RHS 2026-02-28 15:35:14 +08:00
94f40627aa refine GPU dispatch initialization and optimize H2D/D2H data transfers 2026-02-28 15:23:41 +08:00
d94c31c5c4 [WIP]Implement multi-GPU support in BSSN RHS and add profiling for H2D/D2H transfers 2026-02-28 11:12:14 +08:00
724e9cd415 [WIP]Add CUDA support for BSSN RHS with new kernel and update makefiles 2026-02-28 11:12:13 +08:00
c001939461 Add Lagrange interpolation subroutine and update calls in prolongrestrict modules 2026-02-28 11:12:13 +08:00
94d236385d Revert "skip redundant MPI ghost cell syncs for stages 0, 1 & 2"
This reverts commit f7ada421cf.
2026-02-28 11:12:12 +08:00
780f1c80d0 skip redundant MPI ghost cell syncs for stages 0, 1 & 2
BSSN 每个 RK4 时间步执行 4 次 MPI ghost zone 同步:
Stage 0(预测)结束后:Parallel::Sync(SynchList_pre)
Stage 1(校正 1)结束后:Parallel::Sync(SynchList_cor)
Stage 2(校正 2)结束后:Parallel::Sync(SynchList_cor)
Stage 3(校正 3)结束后:Parallel::Sync(SynchList_cor) ← 必要(为下一步提供 ghost)

bssnEM_class.C、Z4c_class.C 结构相同,一起修改了
2026-02-28 11:12:09 +08:00
31 changed files with 4148 additions and 1853 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -108,9 +108,6 @@ namespace Parallel
MPI_Status *stats; MPI_Status *stats;
int max_reqs; int max_reqs;
bool lengths_valid; bool lengths_valid;
int *tc_req_node;
int *tc_req_is_recv;
int *tc_completed;
SyncCache(); SyncCache();
void invalidate(); void invalidate();
void destroy(); void destroy();
@@ -124,10 +121,7 @@ namespace Parallel
struct AsyncSyncState { struct AsyncSyncState {
int req_no; int req_no;
bool active; bool active;
int *req_node; AsyncSyncState() : req_no(0), active(false) {}
int *req_is_recv;
int pending_recv;
AsyncSyncState() : req_no(0), active(false), req_node(0), req_is_recv(0), pending_recv(0) {}
}; };
void Sync_start(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry, void Sync_start(MyList<Patch> *PatL, MyList<var> *VarList, int Symmetry,

View File

@@ -36,18 +36,12 @@ using namespace std;
#include "myglobal.h" #include "myglobal.h"
#endif #endif
#include "perf.h" #include "perf.h"
#include "derivatives.h" #include "derivatives.h"
#include "ricci_gamma.h" #include "ricci_gamma.h"
// Compile-time switch for per-timestep memory usage collection/printing. //================================================================================================
// Default is OFF to reduce overhead in production runs.
#ifndef BSSN_ENABLE_MEM_USAGE_LOG
#define BSSN_ENABLE_MEM_USAGE_LOG 0
#endif
//================================================================================================
// define bssn_class // define bssn_class
@@ -742,8 +736,6 @@ void bssn_class::Initialize()
sync_cache_cor = new Parallel::SyncCache[GH->levels]; sync_cache_cor = new Parallel::SyncCache[GH->levels];
sync_cache_rp_coarse = new Parallel::SyncCache[GH->levels]; sync_cache_rp_coarse = new Parallel::SyncCache[GH->levels];
sync_cache_rp_fine = new Parallel::SyncCache[GH->levels]; sync_cache_rp_fine = new Parallel::SyncCache[GH->levels];
sync_cache_restrict = new Parallel::SyncCache[GH->levels];
sync_cache_outbd = new Parallel::SyncCache[GH->levels];
} }
//================================================================================================ //================================================================================================
@@ -2135,10 +2127,8 @@ void bssn_class::Evolve(int Steps)
#endif #endif
*/ */
#if BSSN_ENABLE_MEM_USAGE_LOG perf bssn_perf;
perf bssn_perf; size_t current_min, current_avg, current_max, peak_min, peak_avg, peak_max;
size_t current_min, current_avg, current_max, peak_min, peak_avg, peak_max;
#endif
for (int lev = 0; lev < GH->levels; lev++) for (int lev = 0; lev < GH->levels; lev++)
GH->Lt[lev] = PhysTime; GH->Lt[lev] = PhysTime;
@@ -2223,7 +2213,7 @@ void bssn_class::Evolve(int Steps)
GH->Regrid(Symmetry, BH_num, Porgbr, Porg0, GH->Regrid(Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_mon, StartTime, dT_mon / 2), ErrorMonitor); fgt(PhysTime - dT_mon, StartTime, dT_mon / 2), ErrorMonitor);
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
#endif #endif
#if (REGLEV == 0 && (PSTR == 1 || PSTR == 2)) #if (REGLEV == 0 && (PSTR == 1 || PSTR == 2))
@@ -2232,23 +2222,21 @@ void bssn_class::Evolve(int Steps)
// fgt(PhysTime-dT_mon,StartTime,dT_mon/2),ErrorMonitor); // fgt(PhysTime-dT_mon,StartTime,dT_mon/2),ErrorMonitor);
#endif #endif
#if BSSN_ENABLE_MEM_USAGE_LOG // Retrieve memory usage information used during computation; master process prints it
// Retrieve memory usage information used during computation; master process prints it bssn_perf.MemoryUsage(&current_min, &current_avg, &current_max,
bssn_perf.MemoryUsage(&current_min, &current_avg, &current_max, &peak_min, &peak_avg, &peak_max, nprocs);
&peak_min, &peak_avg, &peak_max, nprocs); if (myrank == 0)
if (myrank == 0) {
{ printf(" Memory usage: current %0.4lg/%0.4lg/%0.4lgMB, "
printf(" Memory usage: current %0.4lg/%0.4lg/%0.4lgMB, " "peak %0.4lg/%0.4lg/%0.4lgMB\n",
"peak %0.4lg/%0.4lg/%0.4lgMB\n", (double)current_min / (1024.0 * 1024.0),
(double)current_min / (1024.0 * 1024.0), (double)current_avg / (1024.0 * 1024.0),
(double)current_avg / (1024.0 * 1024.0), (double)current_max / (1024.0 * 1024.0),
(double)current_max / (1024.0 * 1024.0), (double)peak_min / (1024.0 * 1024.0),
(double)peak_min / (1024.0 * 1024.0), (double)peak_avg / (1024.0 * 1024.0),
(double)peak_avg / (1024.0 * 1024.0), (double)peak_max / (1024.0 * 1024.0));
(double)peak_max / (1024.0 * 1024.0)); cout << endl;
cout << endl; }
}
#endif
// Output puncture positions at each step // Output puncture positions at each step
if (myrank == 0) if (myrank == 0)
@@ -2441,7 +2429,7 @@ void bssn_class::RecursiveStep(int lev)
if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
#endif #endif
} }
@@ -2620,7 +2608,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(GH->mylev, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(GH->mylev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
#endif #endif
} }
@@ -2787,7 +2775,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev + 1, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev + 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_levp1, StartTime, dT_levp1 / 2), ErrorMonitor)) fgt(PhysTime - dT_levp1, StartTime, dT_levp1 / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2802,7 +2790,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_lev / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2821,7 +2809,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -2837,7 +2825,7 @@ void bssn_class::ParallelStep()
if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0, if (GH->Regrid_Onelevel(lev - 1, Symmetry, BH_num, Porgbr, Porg0,
SynchList_cor, OldStateList, StateList, SynchList_pre, SynchList_cor, OldStateList, StateList, SynchList_pre,
fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor)) fgt(PhysTime - dT_lev, StartTime, dT_levm1 / 2), ErrorMonitor))
for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); sync_cache_restrict[il].invalidate(); sync_cache_outbd[il].invalidate(); } for (int il = 0; il < GH->levels; il++) { sync_cache_pre[il].invalidate(); sync_cache_cor[il].invalidate(); sync_cache_rp_coarse[il].invalidate(); sync_cache_rp_fine[il].invalidate(); }
// a_stream.clear(); // a_stream.clear();
// a_stream.str(""); // a_stream.str("");
@@ -5808,7 +5796,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry, sync_cache_restrict[lev]); Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry);
@@ -5832,7 +5820,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif #endif
@@ -5859,7 +5847,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#endif #endif
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_restrict[lev]); Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry);
@@ -5883,7 +5871,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif #endif
@@ -5952,7 +5940,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
} }
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry, sync_cache_restrict[lev]); Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, Symmetry);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SynchList_pre,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SynchList_pre, GH->rsul[lev], Symmetry);
@@ -5962,7 +5950,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SL, Symmetry);
#endif #endif
@@ -5974,7 +5962,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
else // no time refinement levels and for all same time levels else // no time refinement levels and for all same time levels
{ {
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_restrict[lev]); Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SL,SL,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, GH->rsul[lev], Symmetry);
@@ -5984,7 +5972,7 @@ void bssn_class::RestrictProlong_aux(int lev, int YN, bool BB,
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SL, SL, Symmetry);
#endif #endif
@@ -6039,7 +6027,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
} }
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, Symmetry, sync_cache_restrict[lev]); Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, Symmetry);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,SynchList_pre,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,SynchList_pre,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, SynchList_pre, GH->rsul[lev], Symmetry);
@@ -6049,7 +6037,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif #endif
@@ -6063,7 +6051,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
if (myrank == 0) if (myrank == 0)
cout << "===: " << GH->Lt[lev - 1] << "," << GH->Lt[lev] + dT_lev << endl; cout << "===: " << GH->Lt[lev - 1] << "," << GH->Lt[lev] + dT_lev << endl;
#if (RPB == 0) #if (RPB == 0)
Parallel::Restrict_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, Symmetry, sync_cache_restrict[lev]); Parallel::Restrict(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, Symmetry);
#elif (RPB == 1) #elif (RPB == 1)
// Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,StateList,Symmetry); // Parallel::Restrict_bam(GH->PatL[lev-1],GH->PatL[lev],SynchList_cor,StateList,Symmetry);
Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, GH->rsul[lev], Symmetry); Parallel::Restrict_bam(GH->PatL[lev - 1], GH->PatL[lev], SynchList_cor, StateList, GH->rsul[lev], Symmetry);
@@ -6073,7 +6061,7 @@ void bssn_class::RestrictProlong(int lev, int YN, bool BB)
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif #endif
@@ -6114,7 +6102,7 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], SynchList_pre, SynchList_cor, Symmetry);
#endif #endif
@@ -6127,7 +6115,7 @@ void bssn_class::ProlongRestrict(int lev, int YN, bool BB)
{ {
#if (RPB == 0) #if (RPB == 0)
#if (MIXOUTB == 0) #if (MIXOUTB == 0)
Parallel::OutBdLow2Hi_cached(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry, sync_cache_outbd[lev]); Parallel::OutBdLow2Hi(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#elif (MIXOUTB == 1) #elif (MIXOUTB == 1)
Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry); Parallel::OutBdLow2Himix(GH->PatL[lev - 1], GH->PatL[lev], StateList, SynchList_cor, Symmetry);
#endif #endif

View File

@@ -130,8 +130,6 @@ public:
Parallel::SyncCache *sync_cache_cor; // per-level cache for corrector sync Parallel::SyncCache *sync_cache_cor; // per-level cache for corrector sync
Parallel::SyncCache *sync_cache_rp_coarse; // RestrictProlong sync on PatL[lev-1] Parallel::SyncCache *sync_cache_rp_coarse; // RestrictProlong sync on PatL[lev-1]
Parallel::SyncCache *sync_cache_rp_fine; // RestrictProlong sync on PatL[lev] Parallel::SyncCache *sync_cache_rp_fine; // RestrictProlong sync on PatL[lev]
Parallel::SyncCache *sync_cache_restrict; // cached Restrict in RestrictProlong
Parallel::SyncCache *sync_cache_outbd; // cached OutBdLow2Hi in RestrictProlong
monitor *ErrorMonitor, *Psi4Monitor, *BHMonitor, *MAPMonitor; monitor *ErrorMonitor, *Psi4Monitor, *BHMonitor, *MAPMonitor;
monitor *ConVMonitor; monitor *ConVMonitor;

View File

@@ -59,10 +59,9 @@
real*8, dimension(ex(1),ex(2),ex(3)),intent(out) :: Rxx,Rxy,Rxz,Ryy,Ryz,Rzz real*8, dimension(ex(1),ex(2),ex(3)),intent(out) :: Rxx,Rxy,Rxz,Ryy,Ryz,Rzz
real*8,intent(in) :: eps real*8,intent(in) :: eps
real*8, dimension(ex(1),ex(2),ex(3)),intent(inout) :: ham_Res, movx_Res, movy_Res, movz_Res real*8, dimension(ex(1),ex(2),ex(3)),intent(inout) :: ham_Res, movx_Res, movy_Res, movz_Res
real*8, dimension(ex(1),ex(2),ex(3)),intent(inout) :: Gmx_Res, Gmy_Res, Gmz_Res real*8, dimension(ex(1),ex(2),ex(3)),intent(inout) :: Gmx_Res, Gmy_Res, Gmz_Res
! gont = 0: success; gont = 1: something wrong ! gont = 0: success; gont = 1: something wrong
integer::gont integer::gont
integer :: i,j,k
!~~~~~~> Other variables: !~~~~~~> Other variables:
@@ -84,18 +83,11 @@
real*8, dimension(ex(1),ex(2),ex(3)) :: gupxx,gupxy,gupxz real*8, dimension(ex(1),ex(2),ex(3)) :: gupxx,gupxy,gupxz
real*8, dimension(ex(1),ex(2),ex(3)) :: gupyy,gupyz,gupzz real*8, dimension(ex(1),ex(2),ex(3)) :: gupyy,gupyz,gupzz
real*8,dimension(3) ::SSS,AAS,ASA,SAA,ASS,SAS,SSA real*8,dimension(3) ::SSS,AAS,ASA,SAA,ASS,SAS,SSA
real*8 :: dX, dY, dZ, PI real*8 :: dX, dY, dZ, PI
real*8 :: divb_loc,det_loc real*8, parameter :: ZEO = 0.d0,ONE = 1.D0, TWO = 2.D0, FOUR = 4.D0
real*8 :: gupxx_loc,gupxy_loc,gupxz_loc,gupyy_loc,gupyz_loc,gupzz_loc real*8, parameter :: EIGHT = 8.D0, HALF = 0.5D0, THR = 3.d0
real*8 :: Rxx_loc,Rxy_loc,Rxz_loc,Ryy_loc,Ryz_loc,Rzz_loc real*8, parameter :: SYM = 1.D0, ANTI= - 1.D0
real*8 :: fxx_loc,fxy_loc,fxz_loc
real*8 :: Gamxa_loc,Gamya_loc,Gamza_loc
real*8 :: f_loc,chin_loc
real*8 :: l_fxx,l_fxy,l_fxz,l_fyy,l_fyz,l_fzz,S_loc
real*8, parameter :: ZEO = 0.d0,ONE = 1.D0, TWO = 2.D0, FOUR = 4.D0
real*8, parameter :: EIGHT = 8.D0, HALF = 0.5D0, THR = 3.d0
real*8, parameter :: SYM = 1.D0, ANTI= - 1.D0
double precision,parameter::FF = 0.75d0,eta=2.d0 double precision,parameter::FF = 0.75d0,eta=2.d0
real*8, parameter :: F1o3 = 1.D0/3.D0, F2o3 = 2.D0/3.D0,F3o2=1.5d0, F1o6 = 1.D0/6.D0 real*8, parameter :: F1o3 = 1.D0/3.D0, F2o3 = 2.D0/3.D0,F3o2=1.5d0, F1o6 = 1.D0/6.D0
real*8, parameter :: F16=1.6d1,F8=8.d0 real*8, parameter :: F16=1.6d1,F8=8.d0
@@ -104,11 +96,11 @@
real*8, dimension(ex(1),ex(2),ex(3)) :: reta real*8, dimension(ex(1),ex(2),ex(3)) :: reta
#endif #endif
#if (GAUGE == 6 || GAUGE == 7) #if (GAUGE == 6 || GAUGE == 7)
integer :: BHN integer :: BHN,i,j,k
real*8, dimension(9) :: Porg real*8, dimension(9) :: Porg
real*8, dimension(3) :: Mass real*8, dimension(3) :: Mass
real*8 :: r1,r2,M,A,w1,w2,C1,C2 real*8 :: r1,r2,M,A,w1,w2,C1,C2
real*8, dimension(ex(1),ex(2),ex(3)) :: reta real*8, dimension(ex(1),ex(2),ex(3)) :: reta
call getpbh(BHN,Porg,Mass) call getpbh(BHN,Porg,Mass)
@@ -153,204 +145,174 @@
dY = Y(2) - Y(1) dY = Y(2) - Y(1)
dZ = Z(2) - Z(1) dZ = Z(2) - Z(1)
do k=1,ex(3) alpn1 = Lap + ONE
do j=1,ex(2) chin1 = chi + ONE
do i=1,ex(1) gxx = dxx + ONE
alpn1(i,j,k) = Lap(i,j,k) + ONE gyy = dyy + ONE
chin1(i,j,k) = chi(i,j,k) + ONE gzz = dzz + ONE
gxx(i,j,k) = dxx(i,j,k) + ONE
gyy(i,j,k) = dyy(i,j,k) + ONE
gzz(i,j,k) = dzz(i,j,k) + ONE
enddo
enddo
enddo
call fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev) call fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev)
call fderivs(ex,betay,betayx,betayy,betayz,X,Y,Z, SYM,ANTI, SYM,Symmetry,Lev) call fderivs(ex,betay,betayx,betayy,betayz,X,Y,Z, SYM,ANTI, SYM,Symmetry,Lev)
call fderivs(ex,betaz,betazx,betazy,betazz,X,Y,Z, SYM, SYM,ANTI,Symmetry,Lev) call fderivs(ex,betaz,betazx,betazy,betazz,X,Y,Z, SYM, SYM,ANTI,Symmetry,Lev)
call fderivs(ex,chi,chix,chiy,chiz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev) div_beta = betaxx + betayy + betazz
call fderivs(ex,chi,chix,chiy,chiz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev)
call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) chi_rhs = F2o3 *chin1*( alpn1 * trK - div_beta ) !rhs for chi
call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev)
call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev) call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,dyy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev)
call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev) call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev)
call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs(ex,dyy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev)
do k=1,ex(3) call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev)
do j=1,ex(2)
do i=1,ex(1) gxx_rhs = - TWO * alpn1 * Axx - F2o3 * gxx * div_beta + &
divb_loc = betaxx(i,j,k) + betayy(i,j,k) + betazz(i,j,k) TWO *( gxx * betaxx + gxy * betayx + gxz * betazx)
div_beta(i,j,k) = divb_loc
gyy_rhs = - TWO * alpn1 * Ayy - F2o3 * gyy * div_beta + &
chi_rhs(i,j,k) = F2o3 * chin1(i,j,k) * (alpn1(i,j,k) * trK(i,j,k) - divb_loc) TWO *( gxy * betaxy + gyy * betayy + gyz * betazy)
gxx_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Axx(i,j,k) - F2o3 * gxx(i,j,k) * divb_loc + & gzz_rhs = - TWO * alpn1 * Azz - F2o3 * gzz * div_beta + &
TWO * ( gxx(i,j,k) * betaxx(i,j,k) + gxy(i,j,k) * betayx(i,j,k) + gxz(i,j,k) * betazx(i,j,k) ) TWO *( gxz * betaxz + gyz * betayz + gzz * betazz)
gyy_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Ayy(i,j,k) - F2o3 * gyy(i,j,k) * divb_loc + & gxy_rhs = - TWO * alpn1 * Axy + F1o3 * gxy * div_beta + &
TWO * ( gxy(i,j,k) * betaxy(i,j,k) + gyy(i,j,k) * betayy(i,j,k) + gyz(i,j,k) * betazy(i,j,k) ) gxx * betaxy + gxz * betazy + &
gyy * betayx + gyz * betazx &
gzz_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Azz(i,j,k) - F2o3 * gzz(i,j,k) * divb_loc + & - gxy * betazz
TWO * ( gxz(i,j,k) * betaxz(i,j,k) + gyz(i,j,k) * betayz(i,j,k) + gzz(i,j,k) * betazz(i,j,k) )
gyz_rhs = - TWO * alpn1 * Ayz + F1o3 * gyz * div_beta + &
gxy_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Axy(i,j,k) + F1o3 * gxy(i,j,k) * divb_loc + & gxy * betaxz + gyy * betayz + &
gxx(i,j,k) * betaxy(i,j,k) + gxz(i,j,k) * betazy(i,j,k) + gyy(i,j,k) * betayx(i,j,k) + & gxz * betaxy + gzz * betazy &
gyz(i,j,k) * betazx(i,j,k) - gxy(i,j,k) * betazz(i,j,k) - gyz * betaxx
gyz_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Ayz(i,j,k) + F1o3 * gyz(i,j,k) * divb_loc + & gxz_rhs = - TWO * alpn1 * Axz + F1o3 * gxz * div_beta + &
gxy(i,j,k) * betaxz(i,j,k) + gyy(i,j,k) * betayz(i,j,k) + gxz(i,j,k) * betaxy(i,j,k) + & gxx * betaxz + gxy * betayz + &
gzz(i,j,k) * betazy(i,j,k) - gyz(i,j,k) * betaxx(i,j,k) gyz * betayx + gzz * betazx &
- gxz * betayy !rhs for gij
gxz_rhs(i,j,k) = - TWO * alpn1(i,j,k) * Axz(i,j,k) + F1o3 * gxz(i,j,k) * divb_loc + &
gxx(i,j,k) * betaxz(i,j,k) + gxy(i,j,k) * betayz(i,j,k) + gyz(i,j,k) * betayx(i,j,k) + & ! invert tilted metric
gzz(i,j,k) * betazx(i,j,k) - gxz(i,j,k) * betayy(i,j,k) gupzz = gxx * gyy * gzz + gxy * gyz * gxz + gxz * gxy * gyz - &
gxz * gyy * gxz - gxy * gxy * gzz - gxx * gyz * gyz
det_loc = gxx(i,j,k) * gyy(i,j,k) * gzz(i,j,k) + gxy(i,j,k) * gyz(i,j,k) * gxz(i,j,k) + & gupxx = ( gyy * gzz - gyz * gyz ) / gupzz
gxz(i,j,k) * gxy(i,j,k) * gyz(i,j,k) - gxz(i,j,k) * gyy(i,j,k) * gxz(i,j,k) - & gupxy = - ( gxy * gzz - gyz * gxz ) / gupzz
gxy(i,j,k) * gxy(i,j,k) * gzz(i,j,k) - gxx(i,j,k) * gyz(i,j,k) * gyz(i,j,k) gupxz = ( gxy * gyz - gyy * gxz ) / gupzz
gupxx_loc = ( gyy(i,j,k) * gzz(i,j,k) - gyz(i,j,k) * gyz(i,j,k) ) / det_loc gupyy = ( gxx * gzz - gxz * gxz ) / gupzz
gupxy_loc = - ( gxy(i,j,k) * gzz(i,j,k) - gyz(i,j,k) * gxz(i,j,k) ) / det_loc gupyz = - ( gxx * gyz - gxy * gxz ) / gupzz
gupxz_loc = ( gxy(i,j,k) * gyz(i,j,k) - gyy(i,j,k) * gxz(i,j,k) ) / det_loc gupzz = ( gxx * gyy - gxy * gxy ) / gupzz
gupyy_loc = ( gxx(i,j,k) * gzz(i,j,k) - gxz(i,j,k) * gxz(i,j,k) ) / det_loc
gupyz_loc = - ( gxx(i,j,k) * gyz(i,j,k) - gxy(i,j,k) * gxz(i,j,k) ) / det_loc if(co == 0)then
gupzz_loc = ( gxx(i,j,k) * gyy(i,j,k) - gxy(i,j,k) * gxy(i,j,k) ) / det_loc ! Gam^i_Res = Gam^i + gup^ij_,j
gupxx(i,j,k) = gupxx_loc Gmx_Res = Gamx - (gupxx*(gupxx*gxxx+gupxy*gxyx+gupxz*gxzx)&
gupxy(i,j,k) = gupxy_loc +gupxy*(gupxx*gxyx+gupxy*gyyx+gupxz*gyzx)&
gupxz(i,j,k) = gupxz_loc +gupxz*(gupxx*gxzx+gupxy*gyzx+gupxz*gzzx)&
gupyy(i,j,k) = gupyy_loc +gupxx*(gupxy*gxxy+gupyy*gxyy+gupyz*gxzy)&
gupyz(i,j,k) = gupyz_loc +gupxy*(gupxy*gxyy+gupyy*gyyy+gupyz*gyzy)&
gupzz(i,j,k) = gupzz_loc +gupxz*(gupxy*gxzy+gupyy*gyzy+gupyz*gzzy)&
+gupxx*(gupxz*gxxz+gupyz*gxyz+gupzz*gxzz)&
if(co == 0)then +gupxy*(gupxz*gxyz+gupyz*gyyz+gupzz*gyzz)&
Gmx_Res(i,j,k) = Gamx(i,j,k) - ( & +gupxz*(gupxz*gxzz+gupyz*gyzz+gupzz*gzzz))
gupxx_loc*(gupxx_loc*gxxx(i,j,k)+gupxy_loc*gxyx(i,j,k)+gupxz_loc*gxzx(i,j,k)) + & Gmy_Res = Gamy - (gupxx*(gupxy*gxxx+gupyy*gxyx+gupyz*gxzx)&
gupxy_loc*(gupxx_loc*gxyx(i,j,k)+gupxy_loc*gyyx(i,j,k)+gupxz_loc*gyzx(i,j,k)) + & +gupxy*(gupxy*gxyx+gupyy*gyyx+gupyz*gyzx)&
gupxz_loc*(gupxx_loc*gxzx(i,j,k)+gupxy_loc*gyzx(i,j,k)+gupxz_loc*gzzx(i,j,k)) + & +gupxz*(gupxy*gxzx+gupyy*gyzx+gupyz*gzzx)&
gupxx_loc*(gupxy_loc*gxxy(i,j,k)+gupyy_loc*gxyy(i,j,k)+gupyz_loc*gxzy(i,j,k)) + & +gupxy*(gupxy*gxxy+gupyy*gxyy+gupyz*gxzy)&
gupxy_loc*(gupxy_loc*gxyy(i,j,k)+gupyy_loc*gyyy(i,j,k)+gupyz_loc*gyzy(i,j,k)) + & +gupyy*(gupxy*gxyy+gupyy*gyyy+gupyz*gyzy)&
gupxz_loc*(gupxy_loc*gxzy(i,j,k)+gupyy_loc*gyzy(i,j,k)+gupyz_loc*gzzy(i,j,k)) + & +gupyz*(gupxy*gxzy+gupyy*gyzy+gupyz*gzzy)&
gupxx_loc*(gupxz_loc*gxxz(i,j,k)+gupyz_loc*gxyz(i,j,k)+gupzz_loc*gxzz(i,j,k)) + & +gupxy*(gupxz*gxxz+gupyz*gxyz+gupzz*gxzz)&
gupxy_loc*(gupxz_loc*gxyz(i,j,k)+gupyz_loc*gyyz(i,j,k)+gupzz_loc*gyzz(i,j,k)) + & +gupyy*(gupxz*gxyz+gupyz*gyyz+gupzz*gyzz)&
gupxz_loc*(gupxz_loc*gxzz(i,j,k)+gupyz_loc*gyzz(i,j,k)+gupzz_loc*gzzz(i,j,k))) +gupyz*(gupxz*gxzz+gupyz*gyzz+gupzz*gzzz))
Gmy_Res(i,j,k) = Gamy(i,j,k) - ( & Gmz_Res = Gamz - (gupxx*(gupxz*gxxx+gupyz*gxyx+gupzz*gxzx)&
gupxx_loc*(gupxy_loc*gxxx(i,j,k)+gupyy_loc*gxyx(i,j,k)+gupyz_loc*gxzx(i,j,k)) + & +gupxy*(gupxz*gxyx+gupyz*gyyx+gupzz*gyzx)&
gupxy_loc*(gupxy_loc*gxyx(i,j,k)+gupyy_loc*gyyx(i,j,k)+gupyz_loc*gyzx(i,j,k)) + & +gupxz*(gupxz*gxzx+gupyz*gyzx+gupzz*gzzx)&
gupxz_loc*(gupxy_loc*gxzx(i,j,k)+gupyy_loc*gyzx(i,j,k)+gupyz_loc*gzzx(i,j,k)) + & +gupxy*(gupxz*gxxy+gupyz*gxyy+gupzz*gxzy)&
gupxy_loc*(gupxy_loc*gxxy(i,j,k)+gupyy_loc*gxyy(i,j,k)+gupyz_loc*gxzy(i,j,k)) + & +gupyy*(gupxz*gxyy+gupyz*gyyy+gupzz*gyzy)&
gupyy_loc*(gupxy_loc*gxyy(i,j,k)+gupyy_loc*gyyy(i,j,k)+gupyz_loc*gyzy(i,j,k)) + & +gupyz*(gupxz*gxzy+gupyz*gyzy+gupzz*gzzy)&
gupyz_loc*(gupxy_loc*gxzy(i,j,k)+gupyy_loc*gyzy(i,j,k)+gupyz_loc*gzzy(i,j,k)) + & +gupxz*(gupxz*gxxz+gupyz*gxyz+gupzz*gxzz)&
gupxy_loc*(gupxz_loc*gxxz(i,j,k)+gupyz_loc*gxyz(i,j,k)+gupzz_loc*gxzz(i,j,k)) + & +gupyz*(gupxz*gxyz+gupyz*gyyz+gupzz*gyzz)&
gupyy_loc*(gupxz_loc*gxyz(i,j,k)+gupyz_loc*gyyz(i,j,k)+gupzz_loc*gyzz(i,j,k)) + & +gupzz*(gupxz*gxzz+gupyz*gyzz+gupzz*gzzz))
gupyz_loc*(gupxz_loc*gxzz(i,j,k)+gupyz_loc*gyzz(i,j,k)+gupzz_loc*gzzz(i,j,k))) endif
Gmz_Res(i,j,k) = Gamz(i,j,k) - ( &
gupxx_loc*(gupxz_loc*gxxx(i,j,k)+gupyz_loc*gxyx(i,j,k)+gupzz_loc*gxzx(i,j,k)) + & ! second kind of connection
gupxy_loc*(gupxz_loc*gxyx(i,j,k)+gupyz_loc*gyyx(i,j,k)+gupzz_loc*gyzx(i,j,k)) + & Gamxxx =HALF*( gupxx*gxxx + gupxy*(TWO*gxyx - gxxy ) + gupxz*(TWO*gxzx - gxxz ))
gupxz_loc*(gupxz_loc*gxzx(i,j,k)+gupyz_loc*gyzx(i,j,k)+gupzz_loc*gzzx(i,j,k)) + & Gamyxx =HALF*( gupxy*gxxx + gupyy*(TWO*gxyx - gxxy ) + gupyz*(TWO*gxzx - gxxz ))
gupxy_loc*(gupxz_loc*gxxy(i,j,k)+gupyz_loc*gxyy(i,j,k)+gupzz_loc*gxzy(i,j,k)) + & Gamzxx =HALF*( gupxz*gxxx + gupyz*(TWO*gxyx - gxxy ) + gupzz*(TWO*gxzx - gxxz ))
gupyy_loc*(gupxz_loc*gxyy(i,j,k)+gupyz_loc*gyyy(i,j,k)+gupzz_loc*gyzy(i,j,k)) + &
gupyz_loc*(gupxz_loc*gxzy(i,j,k)+gupyz_loc*gyzy(i,j,k)+gupzz_loc*gzzy(i,j,k)) + & Gamxyy =HALF*( gupxx*(TWO*gxyy - gyyx ) + gupxy*gyyy + gupxz*(TWO*gyzy - gyyz ))
gupxz_loc*(gupxz_loc*gxxz(i,j,k)+gupyz_loc*gxyz(i,j,k)+gupzz_loc*gxzz(i,j,k)) + & Gamyyy =HALF*( gupxy*(TWO*gxyy - gyyx ) + gupyy*gyyy + gupyz*(TWO*gyzy - gyyz ))
gupyz_loc*(gupxz_loc*gxyz(i,j,k)+gupyz_loc*gyyz(i,j,k)+gupzz_loc*gyzz(i,j,k)) + & Gamzyy =HALF*( gupxz*(TWO*gxyy - gyyx ) + gupyz*gyyy + gupzz*(TWO*gyzy - gyyz ))
gupzz_loc*(gupxz_loc*gxzz(i,j,k)+gupyz_loc*gyzz(i,j,k)+gupzz_loc*gzzz(i,j,k)))
endif Gamxzz =HALF*( gupxx*(TWO*gxzz - gzzx ) + gupxy*(TWO*gyzz - gzzy ) + gupxz*gzzz)
Gamyzz =HALF*( gupxy*(TWO*gxzz - gzzx ) + gupyy*(TWO*gyzz - gzzy ) + gupyz*gzzz)
Gamxxx(i,j,k)=HALF*( gupxx_loc*gxxx(i,j,k) + gupxy_loc*(TWO*gxyx(i,j,k) - gxxy(i,j,k)) + gupxz_loc*(TWO*gxzx(i,j,k) - gxxz(i,j,k))) Gamzzz =HALF*( gupxz*(TWO*gxzz - gzzx ) + gupyz*(TWO*gyzz - gzzy ) + gupzz*gzzz)
Gamyxx(i,j,k)=HALF*( gupxy_loc*gxxx(i,j,k) + gupyy_loc*(TWO*gxyx(i,j,k) - gxxy(i,j,k)) + gupyz_loc*(TWO*gxzx(i,j,k) - gxxz(i,j,k)))
Gamzxx(i,j,k)=HALF*( gupxz_loc*gxxx(i,j,k) + gupyz_loc*(TWO*gxyx(i,j,k) - gxxy(i,j,k)) + gupzz_loc*(TWO*gxzx(i,j,k) - gxxz(i,j,k))) Gamxxy =HALF*( gupxx*gxxy + gupxy*gyyx + gupxz*( gxzy + gyzx - gxyz ) )
Gamyxy =HALF*( gupxy*gxxy + gupyy*gyyx + gupyz*( gxzy + gyzx - gxyz ) )
Gamxyy(i,j,k)=HALF*( gupxx_loc*(TWO*gxyy(i,j,k) - gyyx(i,j,k)) + gupxy_loc*gyyy(i,j,k) + gupxz_loc*(TWO*gyzy(i,j,k) - gyyz(i,j,k))) Gamzxy =HALF*( gupxz*gxxy + gupyz*gyyx + gupzz*( gxzy + gyzx - gxyz ) )
Gamyyy(i,j,k)=HALF*( gupxy_loc*(TWO*gxyy(i,j,k) - gyyx(i,j,k)) + gupyy_loc*gyyy(i,j,k) + gupyz_loc*(TWO*gyzy(i,j,k) - gyyz(i,j,k)))
Gamzyy(i,j,k)=HALF*( gupxz_loc*(TWO*gxyy(i,j,k) - gyyx(i,j,k)) + gupyz_loc*gyyy(i,j,k) + gupzz_loc*(TWO*gyzy(i,j,k) - gyyz(i,j,k))) Gamxxz =HALF*( gupxx*gxxz + gupxy*( gxyz + gyzx - gxzy ) + gupxz*gzzx )
Gamyxz =HALF*( gupxy*gxxz + gupyy*( gxyz + gyzx - gxzy ) + gupyz*gzzx )
Gamxzz(i,j,k)=HALF*( gupxx_loc*(TWO*gxzz(i,j,k) - gzzx(i,j,k)) + gupxy_loc*(TWO*gyzz(i,j,k) - gzzy(i,j,k)) + gupxz_loc*gzzz(i,j,k)) Gamzxz =HALF*( gupxz*gxxz + gupyz*( gxyz + gyzx - gxzy ) + gupzz*gzzx )
Gamyzz(i,j,k)=HALF*( gupxy_loc*(TWO*gxzz(i,j,k) - gzzx(i,j,k)) + gupyy_loc*(TWO*gyzz(i,j,k) - gzzy(i,j,k)) + gupyz_loc*gzzz(i,j,k))
Gamzzz(i,j,k)=HALF*( gupxz_loc*(TWO*gxzz(i,j,k) - gzzx(i,j,k)) + gupyz_loc*(TWO*gyzz(i,j,k) - gzzy(i,j,k)) + gupzz_loc*gzzz(i,j,k)) Gamxyz =HALF*( gupxx*( gxyz + gxzy - gyzx ) + gupxy*gyyz + gupxz*gzzy )
Gamyyz =HALF*( gupxy*( gxyz + gxzy - gyzx ) + gupyy*gyyz + gupyz*gzzy )
Gamxxy(i,j,k)=HALF*( gupxx_loc*gxxy(i,j,k) + gupxy_loc*gyyx(i,j,k) + gupxz_loc*(gxzy(i,j,k) + gyzx(i,j,k) - gxyz(i,j,k)) ) Gamzyz =HALF*( gupxz*( gxyz + gxzy - gyzx ) + gupyz*gyyz + gupzz*gzzy )
Gamyxy(i,j,k)=HALF*( gupxy_loc*gxxy(i,j,k) + gupyy_loc*gyyx(i,j,k) + gupyz_loc*(gxzy(i,j,k) + gyzx(i,j,k) - gxyz(i,j,k)) ) ! Raise indices of \tilde A_{ij} and store in R_ij
Gamzxy(i,j,k)=HALF*( gupxz_loc*gxxy(i,j,k) + gupyz_loc*gyyx(i,j,k) + gupzz_loc*(gxzy(i,j,k) + gyzx(i,j,k) - gxyz(i,j,k)) )
Rxx = gupxx * gupxx * Axx + gupxy * gupxy * Ayy + gupxz * gupxz * Azz + &
Gamxxz(i,j,k)=HALF*( gupxx_loc*gxxz(i,j,k) + gupxy_loc*(gxyz(i,j,k) + gyzx(i,j,k) - gxzy(i,j,k)) + gupxz_loc*gzzx(i,j,k) ) TWO*(gupxx * gupxy * Axy + gupxx * gupxz * Axz + gupxy * gupxz * Ayz)
Gamyxz(i,j,k)=HALF*( gupxy_loc*gxxz(i,j,k) + gupyy_loc*(gxyz(i,j,k) + gyzx(i,j,k) - gxzy(i,j,k)) + gupyz_loc*gzzx(i,j,k) )
Gamzxz(i,j,k)=HALF*( gupxz_loc*gxxz(i,j,k) + gupyz_loc*(gxyz(i,j,k) + gyzx(i,j,k) - gxzy(i,j,k)) + gupzz_loc*gzzx(i,j,k) ) Ryy = gupxy * gupxy * Axx + gupyy * gupyy * Ayy + gupyz * gupyz * Azz + &
TWO*(gupxy * gupyy * Axy + gupxy * gupyz * Axz + gupyy * gupyz * Ayz)
Gamxyz(i,j,k)=HALF*( gupxx_loc*(gxyz(i,j,k) + gxzy(i,j,k) - gyzx(i,j,k)) + gupxy_loc*gyyz(i,j,k) + gupxz_loc*gzzy(i,j,k) )
Gamyyz(i,j,k)=HALF*( gupxy_loc*(gxyz(i,j,k) + gxzy(i,j,k) - gyzx(i,j,k)) + gupyy_loc*gyyz(i,j,k) + gupyz_loc*gzzy(i,j,k) ) Rzz = gupxz * gupxz * Axx + gupyz * gupyz * Ayy + gupzz * gupzz * Azz + &
Gamzyz(i,j,k)=HALF*( gupxz_loc*(gxyz(i,j,k) + gxzy(i,j,k) - gyzx(i,j,k)) + gupyz_loc*gyyz(i,j,k) + gupzz_loc*gzzy(i,j,k) ) TWO*(gupxz * gupyz * Axy + gupxz * gupzz * Axz + gupyz * gupzz * Ayz)
enddo
enddo Rxy = gupxx * gupxy * Axx + gupxy * gupyy * Ayy + gupxz * gupyz * Azz + &
enddo (gupxx * gupyy + gupxy * gupxy)* Axy + &
! Raise indices of \tilde A_{ij} and store in R_ij (gupxx * gupyz + gupxz * gupxy)* Axz + &
(gupxy * gupyz + gupxz * gupyy)* Ayz
! Right hand side for Gam^i without shift terms...
call fderivs(ex,Lap,Lapx,Lapy,Lapz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) Rxz = gupxx * gupxz * Axx + gupxy * gupyz * Ayy + gupxz * gupzz * Azz + &
call fderivs(ex,trK,Kx,Ky,Kz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev) (gupxx * gupyz + gupxy * gupxz)* Axy + &
do k=1,ex(3) (gupxx * gupzz + gupxz * gupxz)* Axz + &
do j=1,ex(2) (gupxy * gupzz + gupxz * gupyz)* Ayz
do i=1,ex(1)
gupxx_loc = gupxx(i,j,k) Ryz = gupxy * gupxz * Axx + gupyy * gupyz * Ayy + gupyz * gupzz * Azz + &
gupxy_loc = gupxy(i,j,k) (gupxy * gupyz + gupyy * gupxz)* Axy + &
gupxz_loc = gupxz(i,j,k) (gupxy * gupzz + gupyz * gupxz)* Axz + &
gupyy_loc = gupyy(i,j,k) (gupyy * gupzz + gupyz * gupyz)* Ayz
gupyz_loc = gupyz(i,j,k)
gupzz_loc = gupzz(i,j,k) ! Right hand side for Gam^i without shift terms...
call fderivs(ex,Lap,Lapx,Lapy,Lapz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev)
Rxx_loc = gupxx_loc * gupxx_loc * Axx(i,j,k) + gupxy_loc * gupxy_loc * Ayy(i,j,k) + gupxz_loc * gupxz_loc * Azz(i,j,k) + & call fderivs(ex,trK,Kx,Ky,Kz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev)
TWO * (gupxx_loc * gupxy_loc * Axy(i,j,k) + gupxx_loc * gupxz_loc * Axz(i,j,k) + gupxy_loc * gupxz_loc * Ayz(i,j,k))
Ryy_loc = gupxy_loc * gupxy_loc * Axx(i,j,k) + gupyy_loc * gupyy_loc * Ayy(i,j,k) + gupyz_loc * gupyz_loc * Azz(i,j,k) + & Gamx_rhs = - TWO * ( Lapx * Rxx + Lapy * Rxy + Lapz * Rxz ) + &
TWO * (gupxy_loc * gupyy_loc * Axy(i,j,k) + gupxy_loc * gupyz_loc * Axz(i,j,k) + gupyy_loc * gupyz_loc * Ayz(i,j,k)) TWO * alpn1 * ( &
Rzz_loc = gupxz_loc * gupxz_loc * Axx(i,j,k) + gupyz_loc * gupyz_loc * Ayy(i,j,k) + gupzz_loc * gupzz_loc * Azz(i,j,k) + & -F3o2/chin1 * ( chix * Rxx + chiy * Rxy + chiz * Rxz ) - &
TWO * (gupxz_loc * gupyz_loc * Axy(i,j,k) + gupxz_loc * gupzz_loc * Axz(i,j,k) + gupyz_loc * gupzz_loc * Ayz(i,j,k)) gupxx * ( F2o3 * Kx + EIGHT * PI * Sx ) - &
Rxy_loc = gupxx_loc * gupxy_loc * Axx(i,j,k) + gupxy_loc * gupyy_loc * Ayy(i,j,k) + gupxz_loc * gupyz_loc * Azz(i,j,k) + & gupxy * ( F2o3 * Ky + EIGHT * PI * Sy ) - &
(gupxx_loc * gupyy_loc + gupxy_loc * gupxy_loc) * Axy(i,j,k) + & gupxz * ( F2o3 * Kz + EIGHT * PI * Sz ) + &
(gupxx_loc * gupyz_loc + gupxz_loc * gupxy_loc) * Axz(i,j,k) + & Gamxxx * Rxx + Gamxyy * Ryy + Gamxzz * Rzz + &
(gupxy_loc * gupyz_loc + gupxz_loc * gupyy_loc) * Ayz(i,j,k) TWO * ( Gamxxy * Rxy + Gamxxz * Rxz + Gamxyz * Ryz ) )
Rxz_loc = gupxx_loc * gupxz_loc * Axx(i,j,k) + gupxy_loc * gupyz_loc * Ayy(i,j,k) + gupxz_loc * gupzz_loc * Azz(i,j,k) + &
(gupxx_loc * gupyz_loc + gupxy_loc * gupxz_loc) * Axy(i,j,k) + & Gamy_rhs = - TWO * ( Lapx * Rxy + Lapy * Ryy + Lapz * Ryz ) + &
(gupxx_loc * gupzz_loc + gupxz_loc * gupxz_loc) * Axz(i,j,k) + & TWO * alpn1 * ( &
(gupxy_loc * gupzz_loc + gupxz_loc * gupyz_loc) * Ayz(i,j,k) -F3o2/chin1 * ( chix * Rxy + chiy * Ryy + chiz * Ryz ) - &
Ryz_loc = gupxy_loc * gupxz_loc * Axx(i,j,k) + gupyy_loc * gupyz_loc * Ayy(i,j,k) + gupyz_loc * gupzz_loc * Azz(i,j,k) + & gupxy * ( F2o3 * Kx + EIGHT * PI * Sx ) - &
(gupxy_loc * gupyz_loc + gupyy_loc * gupxz_loc) * Axy(i,j,k) + & gupyy * ( F2o3 * Ky + EIGHT * PI * Sy ) - &
(gupxy_loc * gupzz_loc + gupyz_loc * gupxz_loc) * Axz(i,j,k) + & gupyz * ( F2o3 * Kz + EIGHT * PI * Sz ) + &
(gupyy_loc * gupzz_loc + gupyz_loc * gupyz_loc) * Ayz(i,j,k) Gamyxx * Rxx + Gamyyy * Ryy + Gamyzz * Rzz + &
Rxx(i,j,k) = Rxx_loc TWO * ( Gamyxy * Rxy + Gamyxz * Rxz + Gamyyz * Ryz ) )
Ryy(i,j,k) = Ryy_loc
Rzz(i,j,k) = Rzz_loc Gamz_rhs = - TWO * ( Lapx * Rxz + Lapy * Ryz + Lapz * Rzz ) + &
Rxy(i,j,k) = Rxy_loc TWO * alpn1 * ( &
Rxz(i,j,k) = Rxz_loc -F3o2/chin1 * ( chix * Rxz + chiy * Ryz + chiz * Rzz ) - &
Ryz(i,j,k) = Ryz_loc gupxz * ( F2o3 * Kx + EIGHT * PI * Sx ) - &
gupyz * ( F2o3 * Ky + EIGHT * PI * Sy ) - &
Gamx_rhs(i,j,k) = - TWO * (Lapx(i,j,k) * Rxx_loc + Lapy(i,j,k) * Rxy_loc + Lapz(i,j,k) * Rxz_loc) + & gupzz * ( F2o3 * Kz + EIGHT * PI * Sz ) + &
TWO * alpn1(i,j,k) * ( & Gamzxx * Rxx + Gamzyy * Ryy + Gamzzz * Rzz + &
-F3o2/chin1(i,j,k) * (chix(i,j,k) * Rxx_loc + chiy(i,j,k) * Rxy_loc + chiz(i,j,k) * Rxz_loc) - & TWO * ( Gamzxy * Rxy + Gamzxz * Rxz + Gamzyz * Ryz ) )
gupxx_loc * (F2o3 * Kx(i,j,k) + EIGHT * PI * Sx(i,j,k)) - &
gupxy_loc * (F2o3 * Ky(i,j,k) + EIGHT * PI * Sy(i,j,k)) - &
gupxz_loc * (F2o3 * Kz(i,j,k) + EIGHT * PI * Sz(i,j,k)) + &
Gamxxx(i,j,k) * Rxx_loc + Gamxyy(i,j,k) * Ryy_loc + Gamxzz(i,j,k) * Rzz_loc + &
TWO * (Gamxxy(i,j,k) * Rxy_loc + Gamxxz(i,j,k) * Rxz_loc + Gamxyz(i,j,k) * Ryz_loc))
Gamy_rhs(i,j,k) = - TWO * (Lapx(i,j,k) * Rxy_loc + Lapy(i,j,k) * Ryy_loc + Lapz(i,j,k) * Ryz_loc) + &
TWO * alpn1(i,j,k) * ( &
-F3o2/chin1(i,j,k) * (chix(i,j,k) * Rxy_loc + chiy(i,j,k) * Ryy_loc + chiz(i,j,k) * Ryz_loc) - &
gupxy_loc * (F2o3 * Kx(i,j,k) + EIGHT * PI * Sx(i,j,k)) - &
gupyy_loc * (F2o3 * Ky(i,j,k) + EIGHT * PI * Sy(i,j,k)) - &
gupyz_loc * (F2o3 * Kz(i,j,k) + EIGHT * PI * Sz(i,j,k)) + &
Gamyxx(i,j,k) * Rxx_loc + Gamyyy(i,j,k) * Ryy_loc + Gamyzz(i,j,k) * Rzz_loc + &
TWO * (Gamyxy(i,j,k) * Rxy_loc + Gamyxz(i,j,k) * Rxz_loc + Gamyyz(i,j,k) * Ryz_loc))
Gamz_rhs(i,j,k) = - TWO * (Lapx(i,j,k) * Rxz_loc + Lapy(i,j,k) * Ryz_loc + Lapz(i,j,k) * Rzz_loc) + &
TWO * alpn1(i,j,k) * ( &
-F3o2/chin1(i,j,k) * (chix(i,j,k) * Rxz_loc + chiy(i,j,k) * Ryz_loc + chiz(i,j,k) * Rzz_loc) - &
gupxz_loc * (F2o3 * Kx(i,j,k) + EIGHT * PI * Sx(i,j,k)) - &
gupyz_loc * (F2o3 * Ky(i,j,k) + EIGHT * PI * Sy(i,j,k)) - &
gupzz_loc * (F2o3 * Kz(i,j,k) + EIGHT * PI * Sz(i,j,k)) + &
Gamzxx(i,j,k) * Rxx_loc + Gamzyy(i,j,k) * Ryy_loc + Gamzzz(i,j,k) * Rzz_loc + &
TWO * (Gamzxy(i,j,k) * Rxy_loc + Gamzxz(i,j,k) * Rxz_loc + Gamzyz(i,j,k) * Ryz_loc))
enddo
enddo
enddo
call fdderivs(ex,betax,gxxx,gxyx,gxzx,gyyx,gyzx,gzzx,& call fdderivs(ex,betax,gxxx,gxyx,gxzx,gyyx,gyzx,gzzx,&
X,Y,Z,ANTI,SYM, SYM ,Symmetry,Lev) X,Y,Z,ANTI,SYM, SYM ,Symmetry,Lev)
@@ -359,54 +321,38 @@
call fdderivs(ex,betaz,gxxz,gxyz,gxzz,gyyz,gyzz,gzzz,& call fdderivs(ex,betaz,gxxz,gxyz,gxzz,gyyz,gyzz,gzzz,&
X,Y,Z,SYM ,SYM, ANTI,Symmetry,Lev) X,Y,Z,SYM ,SYM, ANTI,Symmetry,Lev)
call fderivs(ex,Gamx,Gamxx,Gamxy,Gamxz,X,Y,Z,ANTI,SYM ,SYM ,Symmetry,Lev) fxx = gxxx + gxyy + gxzz
call fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev) fxy = gxyx + gyyy + gyzz
call fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev) fxz = gxzx + gyzy + gzzz
do k=1,ex(3)
do j=1,ex(2) Gamxa = gupxx * Gamxxx + gupyy * Gamxyy + gupzz * Gamxzz + &
do i=1,ex(1) TWO*( gupxy * Gamxxy + gupxz * Gamxxz + gupyz * Gamxyz )
divb_loc = div_beta(i,j,k) Gamya = gupxx * Gamyxx + gupyy * Gamyyy + gupzz * Gamyzz + &
fxx_loc = gxxx(i,j,k) + gxyy(i,j,k) + gxzz(i,j,k) TWO*( gupxy * Gamyxy + gupxz * Gamyxz + gupyz * Gamyyz )
fxy_loc = gxyx(i,j,k) + gyyy(i,j,k) + gyzz(i,j,k) Gamza = gupxx * Gamzxx + gupyy * Gamzyy + gupzz * Gamzzz + &
fxz_loc = gxzx(i,j,k) + gyzy(i,j,k) + gzzz(i,j,k) TWO*( gupxy * Gamzxy + gupxz * Gamzxz + gupyz * Gamzyz )
gupxx_loc = gupxx(i,j,k) call fderivs(ex,Gamx,Gamxx,Gamxy,Gamxz,X,Y,Z,ANTI,SYM ,SYM ,Symmetry,Lev)
gupxy_loc = gupxy(i,j,k) call fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev)
gupxz_loc = gupxz(i,j,k) call fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev)
gupyy_loc = gupyy(i,j,k)
gupyz_loc = gupyz(i,j,k) Gamx_rhs = Gamx_rhs + F2o3 * Gamxa * div_beta - &
gupzz_loc = gupzz(i,j,k) Gamxa * betaxx - Gamya * betaxy - Gamza * betaxz + &
F1o3 * (gupxx * fxx + gupxy * fxy + gupxz * fxz ) + &
Gamxa_loc = gupxx_loc * Gamxxx(i,j,k) + gupyy_loc * Gamxyy(i,j,k) + gupzz_loc * Gamxzz(i,j,k) + & gupxx * gxxx + gupyy * gyyx + gupzz * gzzx + &
TWO * (gupxy_loc * Gamxxy(i,j,k) + gupxz_loc * Gamxxz(i,j,k) + gupyz_loc * Gamxyz(i,j,k)) TWO * (gupxy * gxyx + gupxz * gxzx + gupyz * gyzx )
Gamya_loc = gupxx_loc * Gamyxx(i,j,k) + gupyy_loc * Gamyyy(i,j,k) + gupzz_loc * Gamyzz(i,j,k) + &
TWO * (gupxy_loc * Gamyxy(i,j,k) + gupxz_loc * Gamyxz(i,j,k) + gupyz_loc * Gamyyz(i,j,k)) Gamy_rhs = Gamy_rhs + F2o3 * Gamya * div_beta - &
Gamza_loc = gupxx_loc * Gamzxx(i,j,k) + gupyy_loc * Gamzyy(i,j,k) + gupzz_loc * Gamzzz(i,j,k) + & Gamxa * betayx - Gamya * betayy - Gamza * betayz + &
TWO * (gupxy_loc * Gamzxy(i,j,k) + gupxz_loc * Gamzxz(i,j,k) + gupyz_loc * Gamzyz(i,j,k)) F1o3 * (gupxy * fxx + gupyy * fxy + gupyz * fxz ) + &
Gamxa(i,j,k) = Gamxa_loc gupxx * gxxy + gupyy * gyyy + gupzz * gzzy + &
Gamya(i,j,k) = Gamya_loc TWO * (gupxy * gxyy + gupxz * gxzy + gupyz * gyzy )
Gamza(i,j,k) = Gamza_loc
Gamz_rhs = Gamz_rhs + F2o3 * Gamza * div_beta - &
Gamx_rhs(i,j,k) = Gamx_rhs(i,j,k) + F2o3 * Gamxa_loc * divb_loc - & Gamxa * betazx - Gamya * betazy - Gamza * betazz + &
Gamxa_loc * betaxx(i,j,k) - Gamya_loc * betaxy(i,j,k) - Gamza_loc * betaxz(i,j,k) + & F1o3 * (gupxz * fxx + gupyz * fxy + gupzz * fxz ) + &
F1o3 * (gupxx_loc * fxx_loc + gupxy_loc * fxy_loc + gupxz_loc * fxz_loc) + & gupxx * gxxz + gupyy * gyyz + gupzz * gzzz + &
gupxx_loc * gxxx(i,j,k) + gupyy_loc * gyyx(i,j,k) + gupzz_loc * gzzx(i,j,k) + & TWO * (gupxy * gxyz + gupxz * gxzz + gupyz * gyzz ) !rhs for Gam^i
TWO * (gupxy_loc * gxyx(i,j,k) + gupxz_loc * gxzx(i,j,k) + gupyz_loc * gyzx(i,j,k))
Gamy_rhs(i,j,k) = Gamy_rhs(i,j,k) + F2o3 * Gamya_loc * divb_loc - &
Gamxa_loc * betayx(i,j,k) - Gamya_loc * betayy(i,j,k) - Gamza_loc * betayz(i,j,k) + &
F1o3 * (gupxy_loc * fxx_loc + gupyy_loc * fxy_loc + gupyz_loc * fxz_loc) + &
gupxx_loc * gxxy(i,j,k) + gupyy_loc * gyyy(i,j,k) + gupzz_loc * gzzy(i,j,k) + &
TWO * (gupxy_loc * gxyy(i,j,k) + gupxz_loc * gxzy(i,j,k) + gupyz_loc * gyzy(i,j,k))
Gamz_rhs(i,j,k) = Gamz_rhs(i,j,k) + F2o3 * Gamza_loc * divb_loc - &
Gamxa_loc * betazx(i,j,k) - Gamya_loc * betazy(i,j,k) - Gamza_loc * betazz(i,j,k) + &
F1o3 * (gupxz_loc * fxx_loc + gupyz_loc * fxy_loc + gupzz_loc * fxz_loc) + &
gupxx_loc * gxxz(i,j,k) + gupyy_loc * gyyz(i,j,k) + gupzz_loc * gzzz(i,j,k) + &
TWO * (gupxy_loc * gxyz(i,j,k) + gupxz_loc * gxzz(i,j,k) + gupyz_loc * gyzz(i,j,k))
enddo
enddo
enddo
!first kind of connection stored in gij,k !first kind of connection stored in gij,k
gxxx = gxx * Gamxxx + gxy * Gamyxx + gxz * Gamzxx gxxx = gxx * Gamxxx + gxy * Gamyxx + gxz * Gamzxx
@@ -655,190 +601,192 @@
Gamxyz * gxzz + Gamyyz * gyzz + Gamzyz * gzzz + & Gamxyz * gxzz + Gamyyz * gyzz + Gamzyz * gzzz + &
Gamxzz * gxzy + Gamyzz * gyzy + Gamzzz * gzzy + & Gamxzz * gxzy + Gamyzz * gyzy + Gamzzz * gzzy + &
Gamxyz * gzzx + Gamyyz * gzzy + Gamzyz * gzzz ) Gamxyz * gzzx + Gamyyz * gzzy + Gamzyz * gzzz )
!covariant second derivative of chi respect to tilted metric !covariant second derivative of chi respect to tilted metric
call fdderivs(ex,chi,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fdderivs(ex,chi,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev)
do k=1,ex(3) fxx = fxx - Gamxxx * chix - Gamyxx * chiy - Gamzxx * chiz
do j=1,ex(2) fxy = fxy - Gamxxy * chix - Gamyxy * chiy - Gamzxy * chiz
do i=1,ex(1) fxz = fxz - Gamxxz * chix - Gamyxz * chiy - Gamzxz * chiz
fxx(i,j,k) = fxx(i,j,k) - Gamxxx(i,j,k) * chix(i,j,k) - Gamyxx(i,j,k) * chiy(i,j,k) - Gamzxx(i,j,k) * chiz(i,j,k) fyy = fyy - Gamxyy * chix - Gamyyy * chiy - Gamzyy * chiz
fxy(i,j,k) = fxy(i,j,k) - Gamxxy(i,j,k) * chix(i,j,k) - Gamyxy(i,j,k) * chiy(i,j,k) - Gamzxy(i,j,k) * chiz(i,j,k) fyz = fyz - Gamxyz * chix - Gamyyz * chiy - Gamzyz * chiz
fxz(i,j,k) = fxz(i,j,k) - Gamxxz(i,j,k) * chix(i,j,k) - Gamyxz(i,j,k) * chiy(i,j,k) - Gamzxz(i,j,k) * chiz(i,j,k) fzz = fzz - Gamxzz * chix - Gamyzz * chiy - Gamzzz * chiz
fyy(i,j,k) = fyy(i,j,k) - Gamxyy(i,j,k) * chix(i,j,k) - Gamyyy(i,j,k) * chiy(i,j,k) - Gamzyy(i,j,k) * chiz(i,j,k) ! Store D^l D_l chi - 3/(2*chi) D^l chi D_l chi in f
fyz(i,j,k) = fyz(i,j,k) - Gamxyz(i,j,k) * chix(i,j,k) - Gamyyz(i,j,k) * chiy(i,j,k) - Gamzyz(i,j,k) * chiz(i,j,k)
fzz(i,j,k) = fzz(i,j,k) - Gamxzz(i,j,k) * chix(i,j,k) - Gamyzz(i,j,k) * chiy(i,j,k) - Gamzzz(i,j,k) * chiz(i,j,k) f = gupxx * ( fxx - F3o2/chin1 * chix * chix ) + &
gupyy * ( fyy - F3o2/chin1 * chiy * chiy ) + &
chin_loc = chin1(i,j,k) gupzz * ( fzz - F3o2/chin1 * chiz * chiz ) + &
f_loc = gupxx(i,j,k) * (fxx(i,j,k) - F3o2/chin_loc * chix(i,j,k) * chix(i,j,k)) + & TWO * gupxy * ( fxy - F3o2/chin1 * chix * chiy ) + &
gupyy(i,j,k) * (fyy(i,j,k) - F3o2/chin_loc * chiy(i,j,k) * chiy(i,j,k)) + & TWO * gupxz * ( fxz - F3o2/chin1 * chix * chiz ) + &
gupzz(i,j,k) * (fzz(i,j,k) - F3o2/chin_loc * chiz(i,j,k) * chiz(i,j,k)) + & TWO * gupyz * ( fyz - F3o2/chin1 * chiy * chiz )
TWO * gupxy(i,j,k) * (fxy(i,j,k) - F3o2/chin_loc * chix(i,j,k) * chiy(i,j,k)) + & ! Add chi part to Ricci tensor:
TWO * gupxz(i,j,k) * (fxz(i,j,k) - F3o2/chin_loc * chix(i,j,k) * chiz(i,j,k)) + &
TWO * gupyz(i,j,k) * (fyz(i,j,k) - F3o2/chin_loc * chiy(i,j,k) * chiz(i,j,k)) Rxx = Rxx + (fxx - chix*chix/chin1/TWO + gxx * f)/chin1/TWO
f(i,j,k) = f_loc Ryy = Ryy + (fyy - chiy*chiy/chin1/TWO + gyy * f)/chin1/TWO
Rzz = Rzz + (fzz - chiz*chiz/chin1/TWO + gzz * f)/chin1/TWO
Rxx(i,j,k) = Rxx(i,j,k) + (fxx(i,j,k) - chix(i,j,k)*chix(i,j,k)/chin_loc/TWO + gxx(i,j,k) * f_loc)/chin_loc/TWO Rxy = Rxy + (fxy - chix*chiy/chin1/TWO + gxy * f)/chin1/TWO
Ryy(i,j,k) = Ryy(i,j,k) + (fyy(i,j,k) - chiy(i,j,k)*chiy(i,j,k)/chin_loc/TWO + gyy(i,j,k) * f_loc)/chin_loc/TWO Rxz = Rxz + (fxz - chix*chiz/chin1/TWO + gxz * f)/chin1/TWO
Rzz(i,j,k) = Rzz(i,j,k) + (fzz(i,j,k) - chiz(i,j,k)*chiz(i,j,k)/chin_loc/TWO + gzz(i,j,k) * f_loc)/chin_loc/TWO Ryz = Ryz + (fyz - chiy*chiz/chin1/TWO + gyz * f)/chin1/TWO
Rxy(i,j,k) = Rxy(i,j,k) + (fxy(i,j,k) - chix(i,j,k)*chiy(i,j,k)/chin_loc/TWO + gxy(i,j,k) * f_loc)/chin_loc/TWO
Rxz(i,j,k) = Rxz(i,j,k) + (fxz(i,j,k) - chix(i,j,k)*chiz(i,j,k)/chin_loc/TWO + gxz(i,j,k) * f_loc)/chin_loc/TWO ! covariant second derivatives of the lapse respect to physical metric
Ryz(i,j,k) = Ryz(i,j,k) + (fyz(i,j,k) - chiy(i,j,k)*chiz(i,j,k)/chin_loc/TWO + gyz(i,j,k) * f_loc)/chin_loc/TWO call fdderivs(ex,Lap,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, &
enddo SYM,SYM,SYM,symmetry,Lev)
enddo
enddo gxxx = (gupxx * chix + gupxy * chiy + gupxz * chiz)/chin1
gxxy = (gupxy * chix + gupyy * chiy + gupyz * chiz)/chin1
! covariant second derivatives of the lapse respect to physical metric gxxz = (gupxz * chix + gupyz * chiy + gupzz * chiz)/chin1
call fdderivs(ex,Lap,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, & ! now get physical second kind of connection
SYM,SYM,SYM,symmetry,Lev) Gamxxx = Gamxxx - ( (chix + chix)/chin1 - gxx * gxxx )*HALF
Gamyxx = Gamyxx - ( - gxx * gxxy )*HALF
do k=1,ex(3) Gamzxx = Gamzxx - ( - gxx * gxxz )*HALF
do j=1,ex(2) Gamxyy = Gamxyy - ( - gyy * gxxx )*HALF
do i=1,ex(1) Gamyyy = Gamyyy - ( (chiy + chiy)/chin1 - gyy * gxxy )*HALF
chin_loc = chin1(i,j,k) Gamzyy = Gamzyy - ( - gyy * gxxz )*HALF
gxxx(i,j,k) = (gupxx(i,j,k) * chix(i,j,k) + gupxy(i,j,k) * chiy(i,j,k) + gupxz(i,j,k) * chiz(i,j,k)) / chin_loc Gamxzz = Gamxzz - ( - gzz * gxxx )*HALF
gxxy(i,j,k) = (gupxy(i,j,k) * chix(i,j,k) + gupyy(i,j,k) * chiy(i,j,k) + gupyz(i,j,k) * chiz(i,j,k)) / chin_loc Gamyzz = Gamyzz - ( - gzz * gxxy )*HALF
gxxz(i,j,k) = (gupxz(i,j,k) * chix(i,j,k) + gupyz(i,j,k) * chiy(i,j,k) + gupzz(i,j,k) * chiz(i,j,k)) / chin_loc Gamzzz = Gamzzz - ( (chiz + chiz)/chin1 - gzz * gxxz )*HALF
Gamxxy = Gamxxy - ( chiy /chin1 - gxy * gxxx )*HALF
Gamxxx(i,j,k) = Gamxxx(i,j,k) - ( (chix(i,j,k) + chix(i,j,k))/chin_loc - gxx(i,j,k) * gxxx(i,j,k) )*HALF Gamyxy = Gamyxy - ( chix /chin1 - gxy * gxxy )*HALF
Gamyxx(i,j,k) = Gamyxx(i,j,k) - ( - gxx(i,j,k) * gxxy(i,j,k) )*HALF Gamzxy = Gamzxy - ( - gxy * gxxz )*HALF
Gamzxx(i,j,k) = Gamzxx(i,j,k) - ( - gxx(i,j,k) * gxxz(i,j,k) )*HALF Gamxxz = Gamxxz - ( chiz /chin1 - gxz * gxxx )*HALF
Gamxyy(i,j,k) = Gamxyy(i,j,k) - ( - gyy(i,j,k) * gxxx(i,j,k) )*HALF Gamyxz = Gamyxz - ( - gxz * gxxy )*HALF
Gamyyy(i,j,k) = Gamyyy(i,j,k) - ( (chiy(i,j,k) + chiy(i,j,k))/chin_loc - gyy(i,j,k) * gxxy(i,j,k) )*HALF Gamzxz = Gamzxz - ( chix /chin1 - gxz * gxxz )*HALF
Gamzyy(i,j,k) = Gamzyy(i,j,k) - ( - gyy(i,j,k) * gxxz(i,j,k) )*HALF Gamxyz = Gamxyz - ( - gyz * gxxx )*HALF
Gamxzz(i,j,k) = Gamxzz(i,j,k) - ( - gzz(i,j,k) * gxxx(i,j,k) )*HALF Gamyyz = Gamyyz - ( chiz /chin1 - gyz * gxxy )*HALF
Gamyzz(i,j,k) = Gamyzz(i,j,k) - ( - gzz(i,j,k) * gxxy(i,j,k) )*HALF Gamzyz = Gamzyz - ( chiy /chin1 - gyz * gxxz )*HALF
Gamzzz(i,j,k) = Gamzzz(i,j,k) - ( (chiz(i,j,k) + chiz(i,j,k))/chin_loc - gzz(i,j,k) * gxxz(i,j,k) )*HALF
Gamxxy(i,j,k) = Gamxxy(i,j,k) - ( chiy(i,j,k) /chin_loc - gxy(i,j,k) * gxxx(i,j,k) )*HALF fxx = fxx - Gamxxx*Lapx - Gamyxx*Lapy - Gamzxx*Lapz
Gamyxy(i,j,k) = Gamyxy(i,j,k) - ( chix(i,j,k) /chin_loc - gxy(i,j,k) * gxxy(i,j,k) )*HALF fyy = fyy - Gamxyy*Lapx - Gamyyy*Lapy - Gamzyy*Lapz
Gamzxy(i,j,k) = Gamzxy(i,j,k) - ( - gxy(i,j,k) * gxxz(i,j,k) )*HALF fzz = fzz - Gamxzz*Lapx - Gamyzz*Lapy - Gamzzz*Lapz
Gamxxz(i,j,k) = Gamxxz(i,j,k) - ( chiz(i,j,k) /chin_loc - gxz(i,j,k) * gxxx(i,j,k) )*HALF fxy = fxy - Gamxxy*Lapx - Gamyxy*Lapy - Gamzxy*Lapz
Gamyxz(i,j,k) = Gamyxz(i,j,k) - ( - gxz(i,j,k) * gxxy(i,j,k) )*HALF fxz = fxz - Gamxxz*Lapx - Gamyxz*Lapy - Gamzxz*Lapz
Gamzxz(i,j,k) = Gamzxz(i,j,k) - ( chix(i,j,k) /chin_loc - gxz(i,j,k) * gxxz(i,j,k) )*HALF fyz = fyz - Gamxyz*Lapx - Gamyyz*Lapy - Gamzyz*Lapz
Gamxyz(i,j,k) = Gamxyz(i,j,k) - ( - gyz(i,j,k) * gxxx(i,j,k) )*HALF
Gamyyz(i,j,k) = Gamyyz(i,j,k) - ( chiz(i,j,k) /chin_loc - gyz(i,j,k) * gxxy(i,j,k) )*HALF ! store D^i D_i Lap in trK_rhs upto chi
Gamzyz(i,j,k) = Gamzyz(i,j,k) - ( chiy(i,j,k) /chin_loc - gyz(i,j,k) * gxxz(i,j,k) )*HALF trK_rhs = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
TWO* ( gupxy * fxy + gupxz * fxz + gupyz * fyz )
fxx(i,j,k) = fxx(i,j,k) - Gamxxx(i,j,k)*Lapx(i,j,k) - Gamyxx(i,j,k)*Lapy(i,j,k) - Gamzxx(i,j,k)*Lapz(i,j,k) #if 1
fyy(i,j,k) = fyy(i,j,k) - Gamxyy(i,j,k)*Lapx(i,j,k) - Gamyyy(i,j,k)*Lapy(i,j,k) - Gamzyy(i,j,k)*Lapz(i,j,k) !! follow bam code
fzz(i,j,k) = fzz(i,j,k) - Gamxzz(i,j,k)*Lapx(i,j,k) - Gamyzz(i,j,k)*Lapy(i,j,k) - Gamzzz(i,j,k)*Lapz(i,j,k) S = chin1 * ( gupxx * Sxx + gupyy * Syy + gupzz * Szz + &
fxy(i,j,k) = fxy(i,j,k) - Gamxxy(i,j,k)*Lapx(i,j,k) - Gamyxy(i,j,k)*Lapy(i,j,k) - Gamzxy(i,j,k)*Lapz(i,j,k) TWO * ( gupxy * Sxy + gupxz * Sxz + gupyz * Syz ) )
fxz(i,j,k) = fxz(i,j,k) - Gamxxz(i,j,k)*Lapx(i,j,k) - Gamyxz(i,j,k)*Lapy(i,j,k) - Gamzxz(i,j,k)*Lapz(i,j,k) f = F2o3 * trK * trK -(&
fyz(i,j,k) = fyz(i,j,k) - Gamxyz(i,j,k)*Lapx(i,j,k) - Gamyyz(i,j,k)*Lapy(i,j,k) - Gamzyz(i,j,k)*Lapz(i,j,k) gupxx * ( &
gupxx * Axx * Axx + gupyy * Axy * Axy + gupzz * Axz * Axz + &
trK_rhs(i,j,k) = gupxx(i,j,k) * fxx(i,j,k) + gupyy(i,j,k) * fyy(i,j,k) + gupzz(i,j,k) * fzz(i,j,k) + & TWO * (gupxy * Axx * Axy + gupxz * Axx * Axz + gupyz * Axy * Axz) ) + &
TWO * (gupxy(i,j,k) * fxy(i,j,k) + gupxz(i,j,k) * fxz(i,j,k) + gupyz(i,j,k) * fyz(i,j,k)) gupyy * ( &
enddo gupxx * Axy * Axy + gupyy * Ayy * Ayy + gupzz * Ayz * Ayz + &
enddo TWO * (gupxy * Axy * Ayy + gupxz * Axy * Ayz + gupyz * Ayy * Ayz) ) + &
enddo gupzz * ( &
do k=1,ex(3) gupxx * Axz * Axz + gupyy * Ayz * Ayz + gupzz * Azz * Azz + &
do j=1,ex(2) TWO * (gupxy * Axz * Ayz + gupxz * Axz * Azz + gupyz * Ayz * Azz) ) + &
do i=1,ex(1) TWO * ( &
divb_loc = div_beta(i,j,k) gupxy * ( &
chin_loc = chin1(i,j,k) gupxx * Axx * Axy + gupyy * Axy * Ayy + gupzz * Axz * Ayz + &
gupxy * (Axx * Ayy + Axy * Axy) + &
S_loc = chin_loc * ( gupxx(i,j,k) * Sxx(i,j,k) + gupyy(i,j,k) * Syy(i,j,k) + gupzz(i,j,k) * Szz(i,j,k) + & gupxz * (Axx * Ayz + Axz * Axy) + &
TWO * (gupxy(i,j,k) * Sxy(i,j,k) + gupxz(i,j,k) * Sxz(i,j,k) + gupyz(i,j,k) * Syz(i,j,k)) ) gupyz * (Axy * Ayz + Axz * Ayy) ) + &
S(i,j,k) = S_loc gupxz * ( &
gupxx * Axx * Axz + gupyy * Axy * Ayz + gupzz * Axz * Azz + &
f_loc = F2o3 * trK(i,j,k) * trK(i,j,k) - ( & gupxy * (Axx * Ayz + Axy * Axz) + &
gupxx(i,j,k) * ( gupxx(i,j,k) * Axx(i,j,k) * Axx(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + & gupxz * (Axx * Azz + Axz * Axz) + &
gupzz(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + & gupyz * (Axy * Azz + Axz * Ayz) ) + &
TWO * (gupxy(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + gupxz(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + & gupyz * ( &
gupyz(i,j,k) * Axy(i,j,k) * Axz(i,j,k)) ) + & gupxx * Axy * Axz + gupyy * Ayy * Ayz + gupzz * Ayz * Azz + &
gupyy(i,j,k) * ( gupxx(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayy(i,j,k) + & gupxy * (Axy * Ayz + Ayy * Axz) + &
gupzz(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + & gupxz * (Axy * Azz + Ayz * Axz) + &
TWO * (gupxy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + gupxz(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + & gupyz * (Ayy * Azz + Ayz * Ayz) ) )) -1.6d1*PI*rho + EIGHT * PI * S
gupyz(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k)) ) + & f = - F1o3 *( gupxx * fxx + gupyy * fyy + gupzz * fzz + &
gupzz(i,j,k) * ( gupxx(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + & TWO* ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) + alpn1/chin1*f)
gupzz(i,j,k) * Azz(i,j,k) * Azz(i,j,k) + &
TWO * (gupxy(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + gupxz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + & fxx = alpn1 * (Rxx - EIGHT * PI * Sxx) - fxx
gupyz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k)) ) + & fxy = alpn1 * (Rxy - EIGHT * PI * Sxy) - fxy
TWO * ( gupxy(i,j,k) * ( gupxx(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + & fxz = alpn1 * (Rxz - EIGHT * PI * Sxz) - fxz
gupzz(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + & fyy = alpn1 * (Ryy - EIGHT * PI * Syy) - fyy
gupxy(i,j,k) * (Axx(i,j,k) * Ayy(i,j,k) + Axy(i,j,k) * Axy(i,j,k)) + & fyz = alpn1 * (Ryz - EIGHT * PI * Syz) - fyz
gupxz(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Axy(i,j,k)) + & fzz = alpn1 * (Rzz - EIGHT * PI * Szz) - fzz
gupyz(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Ayy(i,j,k)) ) + & #else
gupxz(i,j,k) * ( gupxx(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + & ! Add lapse and S_ij parts to Ricci tensor:
gupzz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + &
gupxy(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axy(i,j,k) * Axz(i,j,k)) + & fxx = alpn1 * (Rxx - EIGHT * PI * Sxx) - fxx
gupxz(i,j,k) * (Axx(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Axz(i,j,k)) + & fxy = alpn1 * (Rxy - EIGHT * PI * Sxy) - fxy
gupyz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Ayz(i,j,k)) ) + & fxz = alpn1 * (Rxz - EIGHT * PI * Sxz) - fxz
gupyz(i,j,k) * ( gupxx(i,j,k) * Axy(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k) + & fyy = alpn1 * (Ryy - EIGHT * PI * Syy) - fyy
gupzz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k) + & fyz = alpn1 * (Ryz - EIGHT * PI * Syz) - fyz
gupxy(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Ayy(i,j,k) * Axz(i,j,k)) + & fzz = alpn1 * (Rzz - EIGHT * PI * Szz) - fzz
gupxz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Axz(i,j,k)) + &
gupyz(i,j,k) * (Ayy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Ayz(i,j,k)) ) ) ) - & ! Compute trace-free part (note: chi^-1 and chi cancel!):
F16 * PI * rho(i,j,k) + EIGHT * PI * S_loc
f = F1o3 *( gupxx * fxx + gupyy * fyy + gupzz * fzz + &
f_loc = -F1o3 * ( gupxx(i,j,k) * fxx(i,j,k) + gupyy(i,j,k) * fyy(i,j,k) + gupzz(i,j,k) * fzz(i,j,k) + & TWO* ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) )
TWO * (gupxy(i,j,k) * fxy(i,j,k) + gupxz(i,j,k) * fxz(i,j,k) + gupyz(i,j,k) * fyz(i,j,k)) + & #endif
alpn1(i,j,k)/chin_loc * f_loc )
f(i,j,k) = f_loc Axx_rhs = fxx - gxx * f
Ayy_rhs = fyy - gyy * f
l_fxx = alpn1(i,j,k) * (Rxx(i,j,k) - EIGHT * PI * Sxx(i,j,k)) - fxx(i,j,k) Azz_rhs = fzz - gzz * f
l_fxy = alpn1(i,j,k) * (Rxy(i,j,k) - EIGHT * PI * Sxy(i,j,k)) - fxy(i,j,k) Axy_rhs = fxy - gxy * f
l_fxz = alpn1(i,j,k) * (Rxz(i,j,k) - EIGHT * PI * Sxz(i,j,k)) - fxz(i,j,k) Axz_rhs = fxz - gxz * f
l_fyy = alpn1(i,j,k) * (Ryy(i,j,k) - EIGHT * PI * Syy(i,j,k)) - fyy(i,j,k) Ayz_rhs = fyz - gyz * f
l_fyz = alpn1(i,j,k) * (Ryz(i,j,k) - EIGHT * PI * Syz(i,j,k)) - fyz(i,j,k)
l_fzz = alpn1(i,j,k) * (Rzz(i,j,k) - EIGHT * PI * Szz(i,j,k)) - fzz(i,j,k) ! Now: store A_il A^l_j into fij:
Axx_rhs(i,j,k) = l_fxx - gxx(i,j,k) * f_loc fxx = gupxx * Axx * Axx + gupyy * Axy * Axy + gupzz * Axz * Axz + &
Ayy_rhs(i,j,k) = l_fyy - gyy(i,j,k) * f_loc TWO * (gupxy * Axx * Axy + gupxz * Axx * Axz + gupyz * Axy * Axz)
Azz_rhs(i,j,k) = l_fzz - gzz(i,j,k) * f_loc fyy = gupxx * Axy * Axy + gupyy * Ayy * Ayy + gupzz * Ayz * Ayz + &
Axy_rhs(i,j,k) = l_fxy - gxy(i,j,k) * f_loc TWO * (gupxy * Axy * Ayy + gupxz * Axy * Ayz + gupyz * Ayy * Ayz)
Axz_rhs(i,j,k) = l_fxz - gxz(i,j,k) * f_loc fzz = gupxx * Axz * Axz + gupyy * Ayz * Ayz + gupzz * Azz * Azz + &
Ayz_rhs(i,j,k) = l_fyz - gyz(i,j,k) * f_loc TWO * (gupxy * Axz * Ayz + gupxz * Axz * Azz + gupyz * Ayz * Azz)
fxy = gupxx * Axx * Axy + gupyy * Axy * Ayy + gupzz * Axz * Ayz + &
fxx(i,j,k) = gupxx(i,j,k) * Axx(i,j,k) * Axx(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + & gupxy *(Axx * Ayy + Axy * Axy) + &
gupzz(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + TWO * (gupxy(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + & gupxz *(Axx * Ayz + Axz * Axy) + &
gupxz(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + gupyz(i,j,k) * Axy(i,j,k) * Axz(i,j,k)) gupyz *(Axy * Ayz + Axz * Ayy)
fyy(i,j,k) = gupxx(i,j,k) * Axy(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayy(i,j,k) + & fxz = gupxx * Axx * Axz + gupyy * Axy * Ayz + gupzz * Axz * Azz + &
gupzz(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + TWO * (gupxy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + & gupxy *(Axx * Ayz + Axy * Axz) + &
gupxz(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + gupyz(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k)) gupxz *(Axx * Azz + Axz * Axz) + &
fzz(i,j,k) = gupxx(i,j,k) * Axz(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayz(i,j,k) * Ayz(i,j,k) + & gupyz *(Axy * Azz + Axz * Ayz)
gupzz(i,j,k) * Azz(i,j,k) * Azz(i,j,k) + TWO * (gupxy(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + & fyz = gupxx * Axy * Axz + gupyy * Ayy * Ayz + gupzz * Ayz * Azz + &
gupxz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + gupyz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k)) gupxy *(Axy * Ayz + Ayy * Axz) + &
fxy(i,j,k) = gupxx(i,j,k) * Axx(i,j,k) * Axy(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayy(i,j,k) + & gupxz *(Axy * Azz + Ayz * Axz) + &
gupzz(i,j,k) * Axz(i,j,k) * Ayz(i,j,k) + gupxy(i,j,k) * (Axx(i,j,k) * Ayy(i,j,k) + Axy(i,j,k) * Axy(i,j,k)) + & gupyz *(Ayy * Azz + Ayz * Ayz)
gupxz(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Axy(i,j,k)) + &
gupyz(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Axz(i,j,k) * Ayy(i,j,k)) f = chin1
fxz(i,j,k) = gupxx(i,j,k) * Axx(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Axy(i,j,k) * Ayz(i,j,k) + & ! store D^i D_i Lap in trK_rhs
gupzz(i,j,k) * Axz(i,j,k) * Azz(i,j,k) + gupxy(i,j,k) * (Axx(i,j,k) * Ayz(i,j,k) + Axy(i,j,k) * Axz(i,j,k)) + & trK_rhs = f*trK_rhs
gupxz(i,j,k) * (Axx(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Axz(i,j,k)) + &
gupyz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Axz(i,j,k) * Ayz(i,j,k)) Axx_rhs = f * Axx_rhs+ alpn1 * (trK * Axx - TWO * fxx) + &
fyz(i,j,k) = gupxx(i,j,k) * Axy(i,j,k) * Axz(i,j,k) + gupyy(i,j,k) * Ayy(i,j,k) * Ayz(i,j,k) + & TWO * ( Axx * betaxx + Axy * betayx + Axz * betazx )- &
gupzz(i,j,k) * Ayz(i,j,k) * Azz(i,j,k) + gupxy(i,j,k) * (Axy(i,j,k) * Ayz(i,j,k) + Ayy(i,j,k) * Axz(i,j,k)) + & F2o3 * Axx * div_beta
gupxz(i,j,k) * (Axy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Axz(i,j,k)) + &
gupyz(i,j,k) * (Ayy(i,j,k) * Azz(i,j,k) + Ayz(i,j,k) * Ayz(i,j,k)) Ayy_rhs = f * Ayy_rhs+ alpn1 * (trK * Ayy - TWO * fyy) + &
TWO * ( Axy * betaxy + Ayy * betayy + Ayz * betazy )- &
trK_rhs(i,j,k) = chin_loc * trK_rhs(i,j,k) F2o3 * Ayy * div_beta
Axx_rhs(i,j,k) = chin_loc * Axx_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Axx(i,j,k) - TWO * fxx(i,j,k)) + & Azz_rhs = f * Azz_rhs+ alpn1 * (trK * Azz - TWO * fzz) + &
TWO * (Axx(i,j,k) * betaxx(i,j,k) + Axy(i,j,k) * betayx(i,j,k) + Axz(i,j,k) * betazx(i,j,k)) - & TWO * ( Axz * betaxz + Ayz * betayz + Azz * betazz )- &
F2o3 * Axx(i,j,k) * divb_loc F2o3 * Azz * div_beta
Ayy_rhs(i,j,k) = chin_loc * Ayy_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Ayy(i,j,k) - TWO * fyy(i,j,k)) + &
TWO * (Axy(i,j,k) * betaxy(i,j,k) + Ayy(i,j,k) * betayy(i,j,k) + Ayz(i,j,k) * betazy(i,j,k)) - & Axy_rhs = f * Axy_rhs+ alpn1 *( trK * Axy - TWO * fxy )+ &
F2o3 * Ayy(i,j,k) * divb_loc Axx * betaxy + Axz * betazy + &
Azz_rhs(i,j,k) = chin_loc * Azz_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Azz(i,j,k) - TWO * fzz(i,j,k)) + & Ayy * betayx + Ayz * betazx + &
TWO * (Axz(i,j,k) * betaxz(i,j,k) + Ayz(i,j,k) * betayz(i,j,k) + Azz(i,j,k) * betazz(i,j,k)) - & F1o3 * Axy * div_beta - Axy * betazz
F2o3 * Azz(i,j,k) * divb_loc
Axy_rhs(i,j,k) = chin_loc * Axy_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Axy(i,j,k) - TWO * fxy(i,j,k)) + & Ayz_rhs = f * Ayz_rhs+ alpn1 *( trK * Ayz - TWO * fyz )+ &
Axx(i,j,k) * betaxy(i,j,k) + Axz(i,j,k) * betazy(i,j,k) + Ayy(i,j,k) * betayx(i,j,k) + & Axy * betaxz + Ayy * betayz + &
Ayz(i,j,k) * betazx(i,j,k) + F1o3 * Axy(i,j,k) * divb_loc - Axy(i,j,k) * betazz(i,j,k) Axz * betaxy + Azz * betazy + &
Ayz_rhs(i,j,k) = chin_loc * Ayz_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Ayz(i,j,k) - TWO * fyz(i,j,k)) + & F1o3 * Ayz * div_beta - Ayz * betaxx
Axy(i,j,k) * betaxz(i,j,k) + Ayy(i,j,k) * betayz(i,j,k) + Axz(i,j,k) * betaxy(i,j,k) + &
Azz(i,j,k) * betazy(i,j,k) + F1o3 * Ayz(i,j,k) * divb_loc - Ayz(i,j,k) * betaxx(i,j,k) Axz_rhs = f * Axz_rhs+ alpn1 *( trK * Axz - TWO * fxz )+ &
Axz_rhs(i,j,k) = chin_loc * Axz_rhs(i,j,k) + alpn1(i,j,k) * (trK(i,j,k) * Axz(i,j,k) - TWO * fxz(i,j,k)) + & Axx * betaxz + Axy * betayz + &
Axx(i,j,k) * betaxz(i,j,k) + Axy(i,j,k) * betayz(i,j,k) + Ayz(i,j,k) * betayx(i,j,k) + & Ayz * betayx + Azz * betazx + &
Azz(i,j,k) * betazx(i,j,k) + F1o3 * Axz(i,j,k) * divb_loc - Axz(i,j,k) * betayy(i,j,k) F1o3 * Axz * div_beta - Axz * betayy !rhs for Aij
trK_rhs(i,j,k) = - trK_rhs(i,j,k) + alpn1(i,j,k) * ( F1o3 * trK(i,j,k) * trK(i,j,k) + & ! Compute trace of S_ij
gupxx(i,j,k) * fxx(i,j,k) + gupyy(i,j,k) * fyy(i,j,k) + gupzz(i,j,k) * fzz(i,j,k) + &
TWO * (gupxy(i,j,k) * fxy(i,j,k) + gupxz(i,j,k) * fxz(i,j,k) + gupyz(i,j,k) * fyz(i,j,k)) + & S = f * ( gupxx * Sxx + gupyy * Syy + gupzz * Szz + &
FOUR * PI * (rho(i,j,k) + S_loc) ) TWO * ( gupxy * Sxy + gupxz * Sxz + gupyz * Syz ) )
enddo
enddo trK_rhs = - trK_rhs + alpn1 *( F1o3 * trK * trK + &
enddo gupxx * fxx + gupyy * fyy + gupzz * fzz + &
TWO * ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) + &
FOUR * PI * ( rho + S )) !rhs for trK
!!!! gauge variable part !!!! gauge variable part
@@ -1000,15 +948,15 @@
!!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency) !!!!!!!!!advection term + Kreiss-Oliger dissipation (merged for cache efficiency)
! lopsided_kodis shares the symmetry_bd buffer between advection and ! lopsided_kodis shares the symmetry_bd buffer between advection and
! dissipation, eliminating redundant full-grid copies. For metric variables ! dissipation, eliminating redundant full-grid copies. For metric variables
! gxx/gyy/gzz (=dxx/dyy/dzz+1): stencil coefficients sum to zero, ! gxx/gyy/gzz (=dxx/dyy/dzz+1): kodis stencil coefficients sum to zero,
! so the constant offset has no effect on dissipation. ! so the constant offset has no effect on dissipation.
call lopsided_kodis(ex,X,Y,Z,dxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps) call lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps)
call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps) call lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps)
call lopsided_kodis(ex,X,Y,Z,dyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps) call lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps)
call lopsided_kodis(ex,X,Y,Z,dzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps) call lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps)
call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps) call lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps)

View File

@@ -39,6 +39,7 @@ int f_compute_rhs_bssn(int *ex, double &T,
// printf("nx=%d ny=%d nz=%d all=%d\n", nx, ny, nz, all); // printf("nx=%d ny=%d nz=%d all=%d\n", nx, ny, nz, all);
// temp variable // temp variable
double gxx[all],gyy[all],gzz[all];
double chix[all],chiy[all],chiz[all]; double chix[all],chiy[all],chiz[all];
double gxxx[all],gxyx[all],gxzx[all],gyyx[all],gyzx[all],gzzx[all]; double gxxx[all],gxyx[all],gxzx[all],gyyx[all],gyzx[all],gzzx[all];
double gxxy[all],gxyy[all],gxzy[all],gyyy[all],gyzy[all],gzzy[all]; double gxxy[all],gxyy[all],gxzy[all],gyyy[all],gyzy[all],gzzy[all];
@@ -50,9 +51,9 @@ int f_compute_rhs_bssn(int *ex, double &T,
double Gamxx[all],Gamxy[all],Gamxz[all]; double Gamxx[all],Gamxy[all],Gamxz[all];
double Gamyx[all],Gamyy[all],Gamyz[all]; double Gamyx[all],Gamyy[all],Gamyz[all];
double Gamzx[all],Gamzy[all],Gamzz[all]; double Gamzx[all],Gamzy[all],Gamzz[all];
double Kx[all], Ky[all], Kz[all], S[all]; double Kx[all], Ky[all], Kz[all], div_beta[all], S[all];
double f[all], fxx[all], fxy[all], fxz[all], fyy[all], fyz[all], fzz[all]; double f[all], fxx[all], fxy[all], fxz[all], fyy[all], fyz[all], fzz[all];
double alpn1[all], chin1[all]; double Gamxa[all], Gamya[all], Gamza[all], alpn1[all], chin1[all];
double gupxx[all], gupxy[all], gupxz[all]; double gupxx[all], gupxy[all], gupxz[all];
double gupyy[all], gupyz[all], gupzz[all]; double gupyy[all], gupyz[all], gupzz[all];
double SSS[3] = { 1.0, 1.0, 1.0}; double SSS[3] = { 1.0, 1.0, 1.0};
@@ -106,6 +107,9 @@ int f_compute_rhs_bssn(int *ex, double &T,
for(int i=0;i<all;i+=1){ for(int i=0;i<all;i+=1){
alpn1[i] = Lap[i] + 1.0; alpn1[i] = Lap[i] + 1.0;
chin1[i] = chi[i] + 1.0; chin1[i] = chi[i] + 1.0;
gxx[i] = dxx[i] + 1.0;
gyy[i] = dyy[i] + 1.0;
gzz[i] = dzz[i] + 1.0;
} }
// 9ms // // 9ms //
fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev); fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev);
@@ -123,196 +127,231 @@ int f_compute_rhs_bssn(int *ex, double &T,
// 3ms // // 3ms //
for(int i=0;i<all;i+=1){ for(int i=0;i<all;i+=1){
const double divb = betaxx[i] + betayy[i] + betazz[i]; div_beta[i] = betaxx[i] + betayy[i] + betazz[i];
chi_rhs[i] = F2o3 * chin1[i] * (alpn1[i] * trK[i] - divb); chi_rhs[i] = F2o3 * chin1[i] * (alpn1[i] * trK[i] - div_beta[i]);
gxx_rhs[i] = -TWO * alpn1[i] * Axx[i] - F2o3 * (dxx[i] + ONE) * divb + gxx_rhs[i] = -TWO * alpn1[i] * Axx[i] - F2o3 * gxx[i] * div_beta[i] +
TWO * ((dxx[i] + ONE) * betaxx[i] + gxy[i] * betayx[i] + gxz[i] * betazx[i]); TWO * (gxx[i] * betaxx[i] + gxy[i] * betayx[i] + gxz[i] * betazx[i]);
gyy_rhs[i] = -TWO * alpn1[i] * Ayy[i] - F2o3 * (dyy[i] + ONE) * divb + gyy_rhs[i] = -TWO * alpn1[i] * Ayy[i] - F2o3 * gyy[i] * div_beta[i] +
TWO * (gxy[i] * betaxy[i] + (dyy[i] + ONE) * betayy[i] + gyz[i] * betazy[i]); TWO * (gxy[i] * betaxy[i] + gyy[i] * betayy[i] + gyz[i] * betazy[i]);
gzz_rhs[i] = -TWO * alpn1[i] * Azz[i] - F2o3 * (dzz[i] + ONE) * divb + gzz_rhs[i] = -TWO * alpn1[i] * Azz[i] - F2o3 * gzz[i] * div_beta[i] +
TWO * (gxz[i] * betaxz[i] + gyz[i] * betayz[i] + (dzz[i] + ONE) * betazz[i]); TWO * (gxz[i] * betaxz[i] + gyz[i] * betayz[i] + gzz[i] * betazz[i]);
gxy_rhs[i] = -TWO * alpn1[i] * Axy[i] + F1o3 * gxy[i] * divb + gxy_rhs[i] = -TWO * alpn1[i] * Axy[i] + F1o3 * gxy[i] * div_beta[i] +
(dxx[i] + ONE) * betaxy[i] + gxz[i] * betazy[i] + (dyy[i] + ONE) * betayx[i] gxx[i] * betaxy[i] + gxz[i] * betazy[i] + gyy[i] * betayx[i]
+ gyz[i] * betazx[i] - gxy[i] * betazz[i]; + gyz[i] * betazx[i] - gxy[i] * betazz[i];
gyz_rhs[i] = -TWO * alpn1[i] * Ayz[i] + F1o3 * gyz[i] * divb + gyz_rhs[i] = -TWO * alpn1[i] * Ayz[i] + F1o3 * gyz[i] * div_beta[i] +
gxy[i] * betaxz[i] + (dyy[i] + ONE) * betayz[i] + gxz[i] * betaxy[i] gxy[i] * betaxz[i] + gyy[i] * betayz[i] + gxz[i] * betaxy[i]
+ (dzz[i] + ONE) * betazy[i] - gyz[i] * betaxx[i]; + gzz[i] * betazy[i] - gyz[i] * betaxx[i];
gxz_rhs[i] = -TWO * alpn1[i] * Axz[i] + F1o3 * gxz[i] * divb + gxz_rhs[i] = -TWO * alpn1[i] * Axz[i] + F1o3 * gxz[i] * div_beta[i] +
(dxx[i] + ONE) * betaxz[i] + gxy[i] * betayz[i] + gyz[i] * betayx[i] gxx[i] * betaxz[i] + gxy[i] * betayz[i] + gyz[i] * betayx[i]
+ (dzz[i] + ONE) * betazx[i] - gxz[i] * betayy[i]; + gzz[i] * betazx[i] - gxz[i] * betayy[i];
} }
// Fused: inverse metric + Gamma constraint + Christoffel (3 loops -> 1) // 1ms //
for(int i=0;i<all;i+=1){ for(int i=0;i<all;i+=1){
double det = (dxx[i] + ONE) * (dyy[i] + ONE) * (dzz[i] + ONE) + gxy[i] * gyz[i] * gxz[i] + gxz[i] * gxy[i] * gyz[i] - double det = gxx[i] * gyy[i] * gzz[i] + gxy[i] * gyz[i] * gxz[i] + gxz[i] * gxy[i] * gyz[i] -
gxz[i] * (dyy[i] + ONE) * gxz[i] - gxy[i] * gxy[i] * (dzz[i] + ONE) - (dxx[i] + ONE) * gyz[i] * gyz[i]; gxz[i] * gyy[i] * gxz[i] - gxy[i] * gxy[i] * gzz[i] - gxx[i] * gyz[i] * gyz[i];
double lg_xx = ((dyy[i] + ONE) * (dzz[i] + ONE) - gyz[i] * gyz[i]) / det; gupxx[i] = (gyy[i] * gzz[i] - gyz[i] * gyz[i]) / det;
double lg_xy = -(gxy[i] * (dzz[i] + ONE) - gyz[i] * gxz[i]) / det; gupxy[i] = -(gxy[i] * gzz[i] - gyz[i] * gxz[i]) / det;
double lg_xz = (gxy[i] * gyz[i] - (dyy[i] + ONE) * gxz[i]) / det; gupxz[i] = (gxy[i] * gyz[i] - gyy[i] * gxz[i]) / det;
double lg_yy = ((dxx[i] + ONE) * (dzz[i] + ONE) - gxz[i] * gxz[i]) / det; gupyy[i] = (gxx[i] * gzz[i] - gxz[i] * gxz[i]) / det;
double lg_yz = -((dxx[i] + ONE) * gyz[i] - gxy[i] * gxz[i]) / det; gupyz[i] = -(gxx[i] * gyz[i] - gxy[i] * gxz[i]) / det;
double lg_zz = ((dxx[i] + ONE) * (dyy[i] + ONE) - gxy[i] * gxy[i]) / det; gupzz[i] = (gxx[i] * gyy[i] - gxy[i] * gxy[i]) / det;
gupxx[i] = lg_xx; gupxy[i] = lg_xy; gupxz[i] = lg_xz; }
gupyy[i] = lg_yy; gupyz[i] = lg_yz; gupzz[i] = lg_zz; // 2.2ms //
if(co==0){
if(co==0){ for (int i=0;i<all;i+=1) {
Gmx_Res[i] = Gamx[i] - ( Gmx_Res[i] = Gamx[i] - (
lg_xx * (lg_xx*gxxx[i] + lg_xy*gxyx[i] + lg_xz*gxzx[i]) + gupxx[i] * (gupxx[i]*gxxx[i] + gupxy[i]*gxyx[i] + gupxz[i]*gxzx[i]) +
lg_xy * (lg_xx*gxyx[i] + lg_xy*gyyx[i] + lg_xz*gyzx[i]) + gupxy[i] * (gupxx[i]*gxyx[i] + gupxy[i]*gyyx[i] + gupxz[i]*gyzx[i]) +
lg_xz * (lg_xx*gxzx[i] + lg_xy*gyzx[i] + lg_xz*gzzx[i]) + gupxz[i] * (gupxx[i]*gxzx[i] + gupxy[i]*gyzx[i] + gupxz[i]*gzzx[i]) +
lg_xx * (lg_xy*gxxy[i] + lg_yy*gxyy[i] + lg_yz*gxzy[i]) +
lg_xy * (lg_xy*gxyy[i] + lg_yy*gyyy[i] + lg_yz*gyzy[i]) + gupxx[i] * (gupxy[i]*gxxy[i] + gupyy[i]*gxyy[i] + gupyz[i]*gxzy[i]) +
lg_xz * (lg_xy*gxzy[i] + lg_yy*gyzy[i] + lg_yz*gzzy[i]) + gupxy[i] * (gupxy[i]*gxyy[i] + gupyy[i]*gyyy[i] + gupyz[i]*gyzy[i]) +
lg_xx * (lg_xz*gxxz[i] + lg_yz*gxyz[i] + lg_zz*gxzz[i]) + gupxz[i] * (gupxy[i]*gxzy[i] + gupyy[i]*gyzy[i] + gupyz[i]*gzzy[i]) +
lg_xy * (lg_xz*gxyz[i] + lg_yz*gyyz[i] + lg_zz*gyzz[i]) +
lg_xz * (lg_xz*gxzz[i] + lg_yz*gyzz[i] + lg_zz*gzzz[i]) gupxx[i] * (gupxz[i]*gxxz[i] + gupyz[i]*gxyz[i] + gupzz[i]*gxzz[i]) +
gupxy[i] * (gupxz[i]*gxyz[i] + gupyz[i]*gyyz[i] + gupzz[i]*gyzz[i]) +
gupxz[i] * (gupxz[i]*gxzz[i] + gupyz[i]*gyzz[i] + gupzz[i]*gzzz[i])
); );
Gmy_Res[i] = Gamy[i] - ( Gmy_Res[i] = Gamy[i] - (
lg_xx * (lg_xy*gxxx[i] + lg_yy*gxyx[i] + lg_yz*gxzx[i]) + gupxx[i] * (gupxy[i]*gxxx[i] + gupyy[i]*gxyx[i] + gupyz[i]*gxzx[i]) +
lg_xy * (lg_xy*gxyx[i] + lg_yy*gyyx[i] + lg_yz*gyzx[i]) + gupxy[i] * (gupxy[i]*gxyx[i] + gupyy[i]*gyyx[i] + gupyz[i]*gyzx[i]) +
lg_xz * (lg_xy*gxzx[i] + lg_yy*gyzx[i] + lg_yz*gzzx[i]) + gupxz[i] * (gupxy[i]*gxzx[i] + gupyy[i]*gyzx[i] + gupyz[i]*gzzx[i]) +
lg_xy * (lg_xy*gxxy[i] + lg_yy*gxyy[i] + lg_yz*gxzy[i]) +
lg_yy * (lg_xy*gxyy[i] + lg_yy*gyyy[i] + lg_yz*gyzy[i]) + gupxy[i] * (gupxy[i]*gxxy[i] + gupyy[i]*gxyy[i] + gupyz[i]*gxzy[i]) +
lg_yz * (lg_xy*gxzy[i] + lg_yy*gyzy[i] + lg_yz*gzzy[i]) + gupyy[i] * (gupxy[i]*gxyy[i] + gupyy[i]*gyyy[i] + gupyz[i]*gyzy[i]) +
lg_xy * (lg_xz*gxxz[i] + lg_yz*gxyz[i] + lg_zz*gxzz[i]) + gupyz[i] * (gupxy[i]*gxzy[i] + gupyy[i]*gyzy[i] + gupyz[i]*gzzy[i]) +
lg_yy * (lg_xz*gxyz[i] + lg_yz*gyyz[i] + lg_zz*gyzz[i]) +
lg_yz * (lg_xz*gxzz[i] + lg_yz*gyzz[i] + lg_zz*gzzz[i]) gupxy[i] * (gupxz[i]*gxxz[i] + gupyz[i]*gxyz[i] + gupzz[i]*gxzz[i]) +
gupyy[i] * (gupxz[i]*gxyz[i] + gupyz[i]*gyyz[i] + gupzz[i]*gyzz[i]) +
gupyz[i] * (gupxz[i]*gxzz[i] + gupyz[i]*gyzz[i] + gupzz[i]*gzzz[i])
); );
Gmz_Res[i] = Gamz[i] - ( Gmz_Res[i] = Gamz[i] - (
lg_xx * (lg_xz*gxxx[i] + lg_yz*gxyx[i] + lg_zz*gxzx[i]) + gupxx[i] * (gupxz[i]*gxxx[i] + gupyz[i]*gxyx[i] + gupzz[i]*gxzx[i]) +
lg_xy * (lg_xz*gxyx[i] + lg_yz*gyyx[i] + lg_zz*gyzx[i]) + gupxy[i] * (gupxz[i]*gxyx[i] + gupyz[i]*gyyx[i] + gupzz[i]*gyzx[i]) +
lg_xz * (lg_xz*gxzx[i] + lg_yz*gyzx[i] + lg_zz*gzzx[i]) + gupxz[i] * (gupxz[i]*gxzx[i] + gupyz[i]*gyzx[i] + gupzz[i]*gzzx[i]) +
lg_xy * (lg_xz*gxxy[i] + lg_yz*gxyy[i] + lg_zz*gxzy[i]) +
lg_yy * (lg_xz*gxyy[i] + lg_yz*gyyy[i] + lg_zz*gyzy[i]) + gupxy[i] * (gupxz[i]*gxxy[i] + gupyz[i]*gxyy[i] + gupzz[i]*gxzy[i]) +
lg_yz * (lg_xz*gxzy[i] + lg_yz*gyzy[i] + lg_zz*gzzy[i]) + gupyy[i] * (gupxz[i]*gxyy[i] + gupyz[i]*gyyy[i] + gupzz[i]*gyzy[i]) +
lg_xz * (lg_xz*gxxz[i] + lg_yz*gxyz[i] + lg_zz*gxzz[i]) + gupyz[i] * (gupxz[i]*gxzy[i] + gupyz[i]*gyzy[i] + gupzz[i]*gzzy[i]) +
lg_yz * (lg_xz*gxyz[i] + lg_yz*gyyz[i] + lg_zz*gyzz[i]) +
lg_zz * (lg_xz*gxzz[i] + lg_yz*gyzz[i] + lg_zz*gzzz[i]) gupxz[i] * (gupxz[i]*gxxz[i] + gupyz[i]*gxyz[i] + gupzz[i]*gxzz[i]) +
gupyz[i] * (gupxz[i]*gxyz[i] + gupyz[i]*gyyz[i] + gupzz[i]*gyzz[i]) +
gupzz[i] * (gupxz[i]*gxzz[i] + gupyz[i]*gyzz[i] + gupzz[i]*gzzz[i])
); );
} }
Gamxxx[i] = HALF * ( lg_xx*gxxx[i]
+ lg_xy*(TWO*gxyx[i] - gxxy[i])
+ lg_xz*(TWO*gxzx[i] - gxxz[i]) );
Gamyxx[i] = HALF * ( lg_xy*gxxx[i]
+ lg_yy*(TWO*gxyx[i] - gxxy[i])
+ lg_yz*(TWO*gxzx[i] - gxxz[i]) );
Gamzxx[i] = HALF * ( lg_xz*gxxx[i]
+ lg_yz*(TWO*gxyx[i] - gxxy[i])
+ lg_zz*(TWO*gxzx[i] - gxxz[i]) );
Gamxyy[i] = HALF * ( lg_xx*(TWO*gxyy[i] - gyyx[i])
+ lg_xy*gyyy[i]
+ lg_xz*(TWO*gyzy[i] - gyyz[i]) );
Gamyyy[i] = HALF * ( lg_xy*(TWO*gxyy[i] - gyyx[i])
+ lg_yy*gyyy[i]
+ lg_yz*(TWO*gyzy[i] - gyyz[i]) );
Gamzyy[i] = HALF * ( lg_xz*(TWO*gxyy[i] - gyyx[i])
+ lg_yz*gyyy[i]
+ lg_zz*(TWO*gyzy[i] - gyyz[i]) );
Gamxzz[i] = HALF * ( lg_xx*(TWO*gxzz[i] - gzzx[i])
+ lg_xy*(TWO*gyzz[i] - gzzy[i])
+ lg_xz*gzzz[i] );
Gamyzz[i] = HALF * ( lg_xy*(TWO*gxzz[i] - gzzx[i])
+ lg_yy*(TWO*gyzz[i] - gzzy[i])
+ lg_yz*gzzz[i] );
Gamzzz[i] = HALF * ( lg_xz*(TWO*gxzz[i] - gzzx[i])
+ lg_yz*(TWO*gyzz[i] - gzzy[i])
+ lg_zz*gzzz[i] );
Gamxxy[i] = HALF * ( lg_xx*gxxy[i]
+ lg_xy*gyyx[i]
+ lg_xz*(gxzy[i] + gyzx[i] - gxyz[i]) );
Gamyxy[i] = HALF * ( lg_xy*gxxy[i]
+ lg_yy*gyyx[i]
+ lg_yz*(gxzy[i] + gyzx[i] - gxyz[i]) );
Gamzxy[i] = HALF * ( lg_xz*gxxy[i]
+ lg_yz*gyyx[i]
+ lg_zz*(gxzy[i] + gyzx[i] - gxyz[i]) );
Gamxxz[i] = HALF * ( lg_xx*gxxz[i]
+ lg_xy*(gxyz[i] + gyzx[i] - gxzy[i])
+ lg_xz*gzzx[i] );
Gamyxz[i] = HALF * ( lg_xy*gxxz[i]
+ lg_yy*(gxyz[i] + gyzx[i] - gxzy[i])
+ lg_yz*gzzx[i] );
Gamzxz[i] = HALF * ( lg_xz*gxxz[i]
+ lg_yz*(gxyz[i] + gyzx[i] - gxzy[i])
+ lg_zz*gzzx[i] );
Gamxyz[i] = HALF * ( lg_xx*(gxyz[i] + gxzy[i] - gyzx[i])
+ lg_xy*gyyz[i]
+ lg_xz*gzzy[i] );
Gamyyz[i] = HALF * ( lg_xy*(gxyz[i] + gxzy[i] - gyzx[i])
+ lg_yy*gyyz[i]
+ lg_yz*gzzy[i] );
Gamzyz[i] = HALF * ( lg_xz*(gxyz[i] + gxzy[i] - gyzx[i])
+ lg_yz*gyyz[i]
+ lg_zz*gzzy[i] );
} }
// Fused: A^{ij} raise-index + Gamma_rhs part 1 (2 loops -> 1) // 5ms //
for (int i=0;i<all;i+=1) { for (int i=0;i<all;i+=1) {
double axx = gupxx[i]*gupxx[i]*Axx[i]
Gamxxx[i] = HALF * ( gupxx[i]*gxxx[i]
+ gupxy[i]*(TWO*gxyx[i] - gxxy[i])
+ gupxz[i]*(TWO*gxzx[i] - gxxz[i]) );
Gamyxx[i] = HALF * ( gupxy[i]*gxxx[i]
+ gupyy[i]*(TWO*gxyx[i] - gxxy[i])
+ gupyz[i]*(TWO*gxzx[i] - gxxz[i]) );
Gamzxx[i] = HALF * ( gupxz[i]*gxxx[i]
+ gupyz[i]*(TWO*gxyx[i] - gxxy[i])
+ gupzz[i]*(TWO*gxzx[i] - gxxz[i]) );
Gamxyy[i] = HALF * ( gupxx[i]*(TWO*gxyy[i] - gyyx[i])
+ gupxy[i]*gyyy[i]
+ gupxz[i]*(TWO*gyzy[i] - gyyz[i]) );
Gamyyy[i] = HALF * ( gupxy[i]*(TWO*gxyy[i] - gyyx[i])
+ gupyy[i]*gyyy[i]
+ gupyz[i]*(TWO*gyzy[i] - gyyz[i]) );
Gamzyy[i] = HALF * ( gupxz[i]*(TWO*gxyy[i] - gyyx[i])
+ gupyz[i]*gyyy[i]
+ gupzz[i]*(TWO*gyzy[i] - gyyz[i]) );
Gamxzz[i] = HALF * ( gupxx[i]*(TWO*gxzz[i] - gzzx[i])
+ gupxy[i]*(TWO*gyzz[i] - gzzy[i])
+ gupxz[i]*gzzz[i] );
Gamyzz[i] = HALF * ( gupxy[i]*(TWO*gxzz[i] - gzzx[i])
+ gupyy[i]*(TWO*gyzz[i] - gzzy[i])
+ gupyz[i]*gzzz[i] );
Gamzzz[i] = HALF * ( gupxz[i]*(TWO*gxzz[i] - gzzx[i])
+ gupyz[i]*(TWO*gyzz[i] - gzzy[i])
+ gupzz[i]*gzzz[i] );
Gamxxy[i] = HALF * ( gupxx[i]*gxxy[i]
+ gupxy[i]*gyyx[i]
+ gupxz[i]*(gxzy[i] + gyzx[i] - gxyz[i]) );
Gamyxy[i] = HALF * ( gupxy[i]*gxxy[i]
+ gupyy[i]*gyyx[i]
+ gupyz[i]*(gxzy[i] + gyzx[i] - gxyz[i]) );
Gamzxy[i] = HALF * ( gupxz[i]*gxxy[i]
+ gupyz[i]*gyyx[i]
+ gupzz[i]*(gxzy[i] + gyzx[i] - gxyz[i]) );
Gamxxz[i] = HALF * ( gupxx[i]*gxxz[i]
+ gupxy[i]*(gxyz[i] + gyzx[i] - gxzy[i])
+ gupxz[i]*gzzx[i] );
Gamyxz[i] = HALF * ( gupxy[i]*gxxz[i]
+ gupyy[i]*(gxyz[i] + gyzx[i] - gxzy[i])
+ gupyz[i]*gzzx[i] );
Gamzxz[i] = HALF * ( gupxz[i]*gxxz[i]
+ gupyz[i]*(gxyz[i] + gyzx[i] - gxzy[i])
+ gupzz[i]*gzzx[i] );
Gamxyz[i] = HALF * ( gupxx[i]*(gxyz[i] + gxzy[i] - gyzx[i])
+ gupxy[i]*gyyz[i]
+ gupxz[i]*gzzy[i] );
Gamyyz[i] = HALF * ( gupxy[i]*(gxyz[i] + gxzy[i] - gyzx[i])
+ gupyy[i]*gyyz[i]
+ gupyz[i]*gzzy[i] );
Gamzyz[i] = HALF * ( gupxz[i]*(gxyz[i] + gxzy[i] - gyzx[i])
+ gupyz[i]*gyyz[i]
+ gupzz[i]*gzzy[i] );
}
// 1.8ms //
for (int i=0;i<all;i+=1) {
Rxx[i] = gupxx[i]*gupxx[i]*Axx[i]
+ gupxy[i]*gupxy[i]*Ayy[i] + gupxy[i]*gupxy[i]*Ayy[i]
+ gupxz[i]*gupxz[i]*Azz[i] + gupxz[i]*gupxz[i]*Azz[i]
+ TWO * ( gupxx[i]*gupxy[i]*Axy[i] + TWO * ( gupxx[i]*gupxy[i]*Axy[i]
+ gupxx[i]*gupxz[i]*Axz[i] + gupxx[i]*gupxz[i]*Axz[i]
+ gupxy[i]*gupxz[i]*Ayz[i] ); + gupxy[i]*gupxz[i]*Ayz[i] );
double ayy = gupxy[i]*gupxy[i]*Axx[i]
Ryy[i] = gupxy[i]*gupxy[i]*Axx[i]
+ gupyy[i]*gupyy[i]*Ayy[i] + gupyy[i]*gupyy[i]*Ayy[i]
+ gupyz[i]*gupyz[i]*Azz[i] + gupyz[i]*gupyz[i]*Azz[i]
+ TWO * ( gupxy[i]*gupyy[i]*Axy[i] + TWO * ( gupxy[i]*gupyy[i]*Axy[i]
+ gupxy[i]*gupyz[i]*Axz[i] + gupxy[i]*gupyz[i]*Axz[i]
+ gupyy[i]*gupyz[i]*Ayz[i] ); + gupyy[i]*gupyz[i]*Ayz[i] );
double azz = gupxz[i]*gupxz[i]*Axx[i]
Rzz[i] = gupxz[i]*gupxz[i]*Axx[i]
+ gupyz[i]*gupyz[i]*Ayy[i] + gupyz[i]*gupyz[i]*Ayy[i]
+ gupzz[i]*gupzz[i]*Azz[i] + gupzz[i]*gupzz[i]*Azz[i]
+ TWO * ( gupxz[i]*gupyz[i]*Axy[i] + TWO * ( gupxz[i]*gupyz[i]*Axy[i]
+ gupxz[i]*gupzz[i]*Axz[i] + gupxz[i]*gupzz[i]*Axz[i]
+ gupyz[i]*gupzz[i]*Ayz[i] ); + gupyz[i]*gupzz[i]*Ayz[i] );
double axy = gupxx[i]*gupxy[i]*Axx[i]
Rxy[i] = gupxx[i]*gupxy[i]*Axx[i]
+ gupxy[i]*gupyy[i]*Ayy[i] + gupxy[i]*gupyy[i]*Ayy[i]
+ gupxz[i]*gupyz[i]*Azz[i] + gupxz[i]*gupyz[i]*Azz[i]
+ ( gupxx[i]*gupyy[i] + gupxy[i]*gupxy[i] ) * Axy[i] + ( gupxx[i]*gupyy[i] + gupxy[i]*gupxy[i] ) * Axy[i]
+ ( gupxx[i]*gupyz[i] + gupxz[i]*gupxy[i] ) * Axz[i] + ( gupxx[i]*gupyz[i] + gupxz[i]*gupxy[i] ) * Axz[i]
+ ( gupxy[i]*gupyz[i] + gupxz[i]*gupyy[i] ) * Ayz[i]; + ( gupxy[i]*gupyz[i] + gupxz[i]*gupyy[i] ) * Ayz[i];
double axz = gupxx[i]*gupxz[i]*Axx[i]
Rxz[i] = gupxx[i]*gupxz[i]*Axx[i]
+ gupxy[i]*gupyz[i]*Ayy[i] + gupxy[i]*gupyz[i]*Ayy[i]
+ gupxz[i]*gupzz[i]*Azz[i] + gupxz[i]*gupzz[i]*Azz[i]
+ ( gupxx[i]*gupyz[i] + gupxy[i]*gupxz[i] ) * Axy[i] + ( gupxx[i]*gupyz[i] + gupxy[i]*gupxz[i] ) * Axy[i]
+ ( gupxx[i]*gupzz[i] + gupxz[i]*gupxz[i] ) * Axz[i] + ( gupxx[i]*gupzz[i] + gupxz[i]*gupxz[i] ) * Axz[i]
+ ( gupxy[i]*gupzz[i] + gupxz[i]*gupyz[i] ) * Ayz[i]; + ( gupxy[i]*gupzz[i] + gupxz[i]*gupyz[i] ) * Ayz[i];
double ayz = gupxy[i]*gupxz[i]*Axx[i]
Ryz[i] = gupxy[i]*gupxz[i]*Axx[i]
+ gupyy[i]*gupyz[i]*Ayy[i] + gupyy[i]*gupyz[i]*Ayy[i]
+ gupyz[i]*gupzz[i]*Azz[i] + gupyz[i]*gupzz[i]*Azz[i]
+ ( gupxy[i]*gupyz[i] + gupyy[i]*gupxz[i] ) * Axy[i] + ( gupxy[i]*gupyz[i] + gupyy[i]*gupxz[i] ) * Axy[i]
+ ( gupxy[i]*gupzz[i] + gupyz[i]*gupxz[i] ) * Axz[i] + ( gupxy[i]*gupzz[i] + gupyz[i]*gupxz[i] ) * Axz[i]
+ ( gupyy[i]*gupzz[i] + gupyz[i]*gupyz[i] ) * Ayz[i]; + ( gupyy[i]*gupzz[i] + gupyz[i]*gupyz[i] ) * Ayz[i];
Rxx[i] = axx; Ryy[i] = ayy; Rzz[i] = azz; }
Rxy[i] = axy; Rxz[i] = axz; Ryz[i] = ayz; // 4ms //
for(int i=0;i<all;i+=1){
Gamx_rhs[i] = - TWO * ( Lapx[i] * Rxx[i] + Lapy[i] * Rxy[i] + Lapz[i] * Rxz[i] ) +
TWO * alpn1[i] * (
-F3o2/chin1[i] * ( chix[i] * Rxx[i] + chiy[i] * Rxy[i] + chiz[i] * Rxz[i] ) -
gupxx[i] * ( F2o3 * Kx[i] + EIGHT * PI * Sx[i] ) -
gupxy[i] * ( F2o3 * Ky[i] + EIGHT * PI * Sy[i] ) -
gupxz[i] * ( F2o3 * Kz[i] + EIGHT * PI * Sz[i] ) +
Gamxxx[i] * Rxx[i] + Gamxyy[i] * Ryy[i] + Gamxzz[i] * Rzz[i] +
TWO * ( Gamxxy[i] * Rxy[i] + Gamxxz[i] * Rxz[i] + Gamxyz[i] * Ryz[i] ) );
Gamx_rhs[i] = - TWO * ( Lapx[i]*axx + Lapy[i]*axy + Lapz[i]*axz ) + Gamy_rhs[i] = -TWO * ( Lapx[i]*Rxy[i] + Lapy[i]*Ryy[i] + Lapz[i]*Ryz[i] )
TWO * alpn1[i] * (
-F3o2/chin1[i] * ( chix[i]*axx + chiy[i]*axy + chiz[i]*axz ) -
gupxx[i] * ( F2o3*Kx[i] + EIGHT*PI*Sx[i] ) -
gupxy[i] * ( F2o3*Ky[i] + EIGHT*PI*Sy[i] ) -
gupxz[i] * ( F2o3*Kz[i] + EIGHT*PI*Sz[i] ) +
Gamxxx[i]*axx + Gamxyy[i]*ayy + Gamxzz[i]*azz +
TWO * ( Gamxxy[i]*axy + Gamxxz[i]*axz + Gamxyz[i]*ayz ) );
Gamy_rhs[i] = -TWO * ( Lapx[i]*axy + Lapy[i]*ayy + Lapz[i]*ayz )
+ TWO * alpn1[i] * ( + TWO * alpn1[i] * (
-F3o2/chin1[i] * ( chix[i]*axy + chiy[i]*ayy + chiz[i]*ayz ) -F3o2/chin1[i] * ( chix[i]*Rxy[i] + chiy[i]*Ryy[i] + chiz[i]*Ryz[i] )
- gupxy[i] * ( F2o3*Kx[i] + EIGHT*PI*Sx[i] ) - gupxy[i] * ( F2o3*Kx[i] + EIGHT*PI*Sx[i] )
- gupyy[i] * ( F2o3*Ky[i] + EIGHT*PI*Sy[i] ) - gupyy[i] * ( F2o3*Ky[i] + EIGHT*PI*Sy[i] )
- gupyz[i] * ( F2o3*Kz[i] + EIGHT*PI*Sz[i] ) - gupyz[i] * ( F2o3*Kz[i] + EIGHT*PI*Sz[i] )
+ Gamyxx[i]*axx + Gamyyy[i]*ayy + Gamyzz[i]*azz + Gamyxx[i]*Rxx[i] + Gamyyy[i]*Ryy[i] + Gamyzz[i]*Rzz[i]
+ TWO * ( Gamyxy[i]*axy + Gamyxz[i]*axz + Gamyyz[i]*ayz ) + TWO * ( Gamyxy[i]*Rxy[i] + Gamyxz[i]*Rxz[i] + Gamyyz[i]*Ryz[i] )
); );
Gamz_rhs[i] = -TWO * ( Lapx[i]*axz + Lapy[i]*ayz + Lapz[i]*azz ) Gamz_rhs[i] = -TWO * ( Lapx[i]*Rxz[i] + Lapy[i]*Ryz[i] + Lapz[i]*Rzz[i] )
+ TWO * alpn1[i] * ( + TWO * alpn1[i] * (
-F3o2/chin1[i] * ( chix[i]*axz + chiy[i]*ayz + chiz[i]*azz ) -F3o2/chin1[i] * ( chix[i]*Rxz[i] + chiy[i]*Ryz[i] + chiz[i]*Rzz[i] )
- gupxz[i] * ( F2o3*Kx[i] + EIGHT*PI*Sx[i] ) - gupxz[i] * ( F2o3*Kx[i] + EIGHT*PI*Sx[i] )
- gupyz[i] * ( F2o3*Ky[i] + EIGHT*PI*Sy[i] ) - gupyz[i] * ( F2o3*Ky[i] + EIGHT*PI*Sy[i] )
- gupzz[i] * ( F2o3*Kz[i] + EIGHT*PI*Sz[i] ) - gupzz[i] * ( F2o3*Kz[i] + EIGHT*PI*Sz[i] )
+ Gamzxx[i]*axx + Gamzyy[i]*ayy + Gamzzz[i]*azz + Gamzxx[i]*Rxx[i] + Gamzyy[i]*Ryy[i] + Gamzzz[i]*Rzz[i]
+ TWO * ( Gamzxy[i]*axy + Gamzxz[i]*axz + Gamzyz[i]*ayz ) + TWO * ( Gamzxy[i]*Rxy[i] + Gamzxz[i]*Rxz[i] + Gamzyz[i]*Ryz[i] )
); );
} }
// 22.3ms // // 22.3ms //
@@ -326,63 +365,65 @@ int f_compute_rhs_bssn(int *ex, double &T,
fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev); fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev);
fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev); fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev);
// Fused: fxx/Gamxa + Gamma_rhs part 2 (2 loops -> 1) // 3.5ms //
for(int i=0;i<all;i+=1){ for(int i=0;i<all;i+=1){
const double divb = betaxx[i] + betayy[i] + betazz[i]; fxx[i] = gxxx[i] + gxyy[i] + gxzz[i];
double lfxx = gxxx[i] + gxyy[i] + gxzz[i]; fxy[i] = gxyx[i] + gyyy[i] + gyzz[i];
double lfxy = gxyx[i] + gyyy[i] + gyzz[i]; fxz[i] = gxzx[i] + gyzy[i] + gzzz[i];
double lfxz = gxzx[i] + gyzy[i] + gzzz[i]; Gamxa[i] = gupxx[i]*Gamxxx[i] + gupyy[i]*Gamxyy[i] + gupzz[i]*Gamxzz[i]
fxx[i] = lfxx; fxy[i] = lfxy; fxz[i] = lfxz;
double gxa = gupxx[i]*Gamxxx[i] + gupyy[i]*Gamxyy[i] + gupzz[i]*Gamxzz[i]
+ TWO * ( gupxy[i]*Gamxxy[i] + gupxz[i]*Gamxxz[i] + gupyz[i]*Gamxyz[i] ); + TWO * ( gupxy[i]*Gamxxy[i] + gupxz[i]*Gamxxz[i] + gupyz[i]*Gamxyz[i] );
double gya = gupxx[i]*Gamyxx[i] + gupyy[i]*Gamyyy[i] + gupzz[i]*Gamyzz[i]
Gamya[i] = gupxx[i]*Gamyxx[i] + gupyy[i]*Gamyyy[i] + gupzz[i]*Gamyzz[i]
+ TWO * ( gupxy[i]*Gamyxy[i] + gupxz[i]*Gamyxz[i] + gupyz[i]*Gamyyz[i] ); + TWO * ( gupxy[i]*Gamyxy[i] + gupxz[i]*Gamyxz[i] + gupyz[i]*Gamyyz[i] );
double gza = gupxx[i]*Gamzxx[i] + gupyy[i]*Gamzyy[i] + gupzz[i]*Gamzzz[i]
Gamza[i] = gupxx[i]*Gamzxx[i] + gupyy[i]*Gamzyy[i] + gupzz[i]*Gamzzz[i]
+ TWO * ( gupxy[i]*Gamzxy[i] + gupxz[i]*Gamzxz[i] + gupyz[i]*Gamzyz[i] ); + TWO * ( gupxy[i]*Gamzxy[i] + gupxz[i]*Gamzxz[i] + gupyz[i]*Gamzyz[i] );
}
// 3.9ms //
for(int i=0;i<all;i+=1){
Gamx_rhs[i] = Gamx_rhs[i] Gamx_rhs[i] = Gamx_rhs[i]
+ F2o3 * gxa * divb + F2o3 * Gamxa[i] * div_beta[i]
- gxa * betaxx[i] - gya * betaxy[i] - gza * betaxz[i] - Gamxa[i] * betaxx[i] - Gamya[i] * betaxy[i] - Gamza[i] * betaxz[i]
+ F1o3 * ( gupxx[i] * lfxx + gupxy[i] * lfxy + gupxz[i] * lfxz ) + F1o3 * ( gupxx[i] * fxx[i] + gupxy[i] * fxy[i] + gupxz[i] * fxz[i] )
+ gupxx[i] * gxxx[i] + gupyy[i] * gyyx[i] + gupzz[i] * gzzx[i] + gupxx[i] * gxxx[i] + gupyy[i] * gyyx[i] + gupzz[i] * gzzx[i]
+ TWO * ( gupxy[i] * gxyx[i] + gupxz[i] * gxzx[i] + gupyz[i] * gyzx[i] ); + TWO * ( gupxy[i] * gxyx[i] + gupxz[i] * gxzx[i] + gupyz[i] * gyzx[i] );
Gamy_rhs[i] = Gamy_rhs[i] Gamy_rhs[i] = Gamy_rhs[i]
+ F2o3 * gya * divb + F2o3 * Gamya[i] * div_beta[i]
- gxa * betayx[i] - gya * betayy[i] - gza * betayz[i] - Gamxa[i] * betayx[i] - Gamya[i] * betayy[i] - Gamza[i] * betayz[i]
+ F1o3 * ( gupxy[i] * lfxx + gupyy[i] * lfxy + gupyz[i] * lfxz ) + F1o3 * ( gupxy[i] * fxx[i] + gupyy[i] * fxy[i] + gupyz[i] * fxz[i] )
+ gupxx[i] * gxxy[i] + gupyy[i] * gyyy[i] + gupzz[i] * gzzy[i] + gupxx[i] * gxxy[i] + gupyy[i] * gyyy[i] + gupzz[i] * gzzy[i]
+ TWO * ( gupxy[i] * gxyy[i] + gupxz[i] * gxzy[i] + gupyz[i] * gyzy[i] ); + TWO * ( gupxy[i] * gxyy[i] + gupxz[i] * gxzy[i] + gupyz[i] * gyzy[i] );
Gamz_rhs[i] = Gamz_rhs[i] Gamz_rhs[i] = Gamz_rhs[i]
+ F2o3 * gza * divb + F2o3 * Gamza[i] * div_beta[i]
- gxa * betazx[i] - gya * betazy[i] - gza * betazz[i] - Gamxa[i] * betazx[i] - Gamya[i] * betazy[i] - Gamza[i] * betazz[i]
+ F1o3 * ( gupxz[i] * lfxx + gupyz[i] * lfxy + gupzz[i] * lfxz ) + F1o3 * ( gupxz[i] * fxx[i] + gupyz[i] * fxy[i] + gupzz[i] * fxz[i] )
+ gupxx[i] * gxxz[i] + gupyy[i] * gyyz[i] + gupzz[i] * gzzz[i] + gupxx[i] * gxxz[i] + gupyy[i] * gyyz[i] + gupzz[i] * gzzz[i]
+ TWO * ( gupxy[i] * gxyz[i] + gupxz[i] * gxzz[i] + gupyz[i] * gyzz[i] ); + TWO * ( gupxy[i] * gxyz[i] + gupxz[i] * gxzz[i] + gupyz[i] * gyzz[i] );
} }
// 4.4ms // // 4.4ms //
for (int i=0;i<all;i+=1) { for (int i=0;i<all;i+=1) {
gxxx[i] = (dxx[i] + ONE)*Gamxxx[i] + gxy[i]*Gamyxx[i] + gxz[i]*Gamzxx[i]; gxxx[i] = gxx[i]*Gamxxx[i] + gxy[i]*Gamyxx[i] + gxz[i]*Gamzxx[i];
gxyx[i] = (dxx[i] + ONE)*Gamxxy[i] + gxy[i]*Gamyxy[i] + gxz[i]*Gamzxy[i]; gxyx[i] = gxx[i]*Gamxxy[i] + gxy[i]*Gamyxy[i] + gxz[i]*Gamzxy[i];
gxzx[i] = (dxx[i] + ONE)*Gamxxz[i] + gxy[i]*Gamyxz[i] + gxz[i]*Gamzxz[i]; gxzx[i] = gxx[i]*Gamxxz[i] + gxy[i]*Gamyxz[i] + gxz[i]*Gamzxz[i];
gyyx[i] = (dxx[i] + ONE)*Gamxyy[i] + gxy[i]*Gamyyy[i] + gxz[i]*Gamzyy[i]; gyyx[i] = gxx[i]*Gamxyy[i] + gxy[i]*Gamyyy[i] + gxz[i]*Gamzyy[i];
gyzx[i] = (dxx[i] + ONE)*Gamxyz[i] + gxy[i]*Gamyyz[i] + gxz[i]*Gamzyz[i]; gyzx[i] = gxx[i]*Gamxyz[i] + gxy[i]*Gamyyz[i] + gxz[i]*Gamzyz[i];
gzzx[i] = (dxx[i] + ONE)*Gamxzz[i] + gxy[i]*Gamyzz[i] + gxz[i]*Gamzzz[i]; gzzx[i] = gxx[i]*Gamxzz[i] + gxy[i]*Gamyzz[i] + gxz[i]*Gamzzz[i];
gxxy[i] = gxy[i]*Gamxxx[i] + (dyy[i] + ONE)*Gamyxx[i] + gyz[i]*Gamzxx[i]; gxxy[i] = gxy[i]*Gamxxx[i] + gyy[i]*Gamyxx[i] + gyz[i]*Gamzxx[i];
gxyy[i] = gxy[i]*Gamxxy[i] + (dyy[i] + ONE)*Gamyxy[i] + gyz[i]*Gamzxy[i]; gxyy[i] = gxy[i]*Gamxxy[i] + gyy[i]*Gamyxy[i] + gyz[i]*Gamzxy[i];
gxzy[i] = gxy[i]*Gamxxz[i] + (dyy[i] + ONE)*Gamyxz[i] + gyz[i]*Gamzxz[i]; gxzy[i] = gxy[i]*Gamxxz[i] + gyy[i]*Gamyxz[i] + gyz[i]*Gamzxz[i];
gyyy[i] = gxy[i]*Gamxyy[i] + (dyy[i] + ONE)*Gamyyy[i] + gyz[i]*Gamzyy[i]; gyyy[i] = gxy[i]*Gamxyy[i] + gyy[i]*Gamyyy[i] + gyz[i]*Gamzyy[i];
gyzy[i] = gxy[i]*Gamxyz[i] + (dyy[i] + ONE)*Gamyyz[i] + gyz[i]*Gamzyz[i]; gyzy[i] = gxy[i]*Gamxyz[i] + gyy[i]*Gamyyz[i] + gyz[i]*Gamzyz[i];
gzzy[i] = gxy[i]*Gamxzz[i] + (dyy[i] + ONE)*Gamyzz[i] + gyz[i]*Gamzzz[i]; gzzy[i] = gxy[i]*Gamxzz[i] + gyy[i]*Gamyzz[i] + gyz[i]*Gamzzz[i];
gxxz[i] = gxz[i]*Gamxxx[i] + gyz[i]*Gamyxx[i] + (dzz[i] + ONE)*Gamzxx[i]; gxxz[i] = gxz[i]*Gamxxx[i] + gyz[i]*Gamyxx[i] + gzz[i]*Gamzxx[i];
gxyz[i] = gxz[i]*Gamxxy[i] + gyz[i]*Gamyxy[i] + (dzz[i] + ONE)*Gamzxy[i]; gxyz[i] = gxz[i]*Gamxxy[i] + gyz[i]*Gamyxy[i] + gzz[i]*Gamzxy[i];
gxzz[i] = gxz[i]*Gamxxz[i] + gyz[i]*Gamyxz[i] + (dzz[i] + ONE)*Gamzxz[i]; gxzz[i] = gxz[i]*Gamxxz[i] + gyz[i]*Gamyxz[i] + gzz[i]*Gamzxz[i];
gyyz[i] = gxz[i]*Gamxyy[i] + gyz[i]*Gamyyy[i] + (dzz[i] + ONE)*Gamzyy[i]; gyyz[i] = gxz[i]*Gamxyy[i] + gyz[i]*Gamyyy[i] + gzz[i]*Gamzyy[i];
gyzz[i] = gxz[i]*Gamxyz[i] + gyz[i]*Gamyyz[i] + (dzz[i] + ONE)*Gamzyz[i]; gyzz[i] = gxz[i]*Gamxyz[i] + gyz[i]*Gamyyz[i] + gzz[i]*Gamzyz[i];
gzzz[i] = gxz[i]*Gamxzz[i] + gyz[i]*Gamyzz[i] + (dzz[i] + ONE)*Gamzzz[i]; gzzz[i] = gxz[i]*Gamxzz[i] + gyz[i]*Gamyzz[i] + gzz[i]*Gamzzz[i];
} }
// 22.2ms // // 22.2ms //
fdderivs(ex,dxx,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev); fdderivs(ex,dxx,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev);
@@ -430,17 +471,10 @@ int f_compute_rhs_bssn(int *ex, double &T,
// 14ms // // 14ms //
/* 假设 all = ex1*ex2*ex3所有量都是 length=all 的 double 数组(已按同一扁平化规则排布) */ /* 假设 all = ex1*ex2*ex3所有量都是 length=all 的 double 数组(已按同一扁平化规则排布) */
for (int i = 0; i < all; i += 1) { for (int i = 0; i < all; i += 1) {
const double gxa = gupxx[i]*Gamxxx[i] + gupyy[i]*Gamxyy[i] + gupzz[i]*Gamxzz[i]
+ TWO * ( gupxy[i]*Gamxxy[i] + gupxz[i]*Gamxxz[i] + gupyz[i]*Gamxyz[i] );
const double gya = gupxx[i]*Gamyxx[i] + gupyy[i]*Gamyyy[i] + gupzz[i]*Gamyzz[i]
+ TWO * ( gupxy[i]*Gamyxy[i] + gupxz[i]*Gamyxz[i] + gupyz[i]*Gamyyz[i] );
const double gza = gupxx[i]*Gamzxx[i] + gupyy[i]*Gamzyy[i] + gupzz[i]*Gamzzz[i]
+ TWO * ( gupxy[i]*Gamzxy[i] + gupxz[i]*Gamzxz[i] + gupyz[i]*Gamzyz[i] );
Rxx[i] = Rxx[i] =
-HALF * Rxx[i] -HALF * Rxx[i]
+ (dxx[i] + ONE) * Gamxx[i] + gxy[i] * Gamyx[i] + gxz[i] * Gamzx[i] + gxx[i] * Gamxx[i] + gxy[i] * Gamyx[i] + gxz[i] * Gamzx[i]
+ gxa * gxxx[i] + gya * gxyx[i] + gza * gxzx[i] + Gamxa[i] * gxxx[i] + Gamya[i] * gxyx[i] + Gamza[i] * gxzx[i]
+ gupxx[i] * ( + gupxx[i] * (
TWO * (Gamxxx[i] * gxxx[i] + Gamyxx[i] * gxyx[i] + Gamzxx[i] * gxzx[i]) + TWO * (Gamxxx[i] * gxxx[i] + Gamyxx[i] * gxyx[i] + Gamzxx[i] * gxzx[i]) +
(Gamxxx[i] * gxxx[i] + Gamyxx[i] * gxxy[i] + Gamzxx[i] * gxxz[i]) (Gamxxx[i] * gxxx[i] + Gamyxx[i] * gxxy[i] + Gamzxx[i] * gxxz[i])
@@ -474,8 +508,8 @@ int f_compute_rhs_bssn(int *ex, double &T,
Ryy[i] = Ryy[i] =
-HALF * Ryy[i] -HALF * Ryy[i]
+ gxy[i] * Gamxy[i] + (dyy[i] + ONE) * Gamyy[i] + gyz[i] * Gamzy[i] + gxy[i] * Gamxy[i] + gyy[i] * Gamyy[i] + gyz[i] * Gamzy[i]
+ gxa * gxyy[i] + gya * gyyy[i] + gza * gyzy[i] + Gamxa[i] * gxyy[i] + Gamya[i] * gyyy[i] + Gamza[i] * gyzy[i]
+ gupxx[i] * ( + gupxx[i] * (
TWO * (Gamxxy[i] * gxxy[i] + Gamyxy[i] * gxyy[i] + Gamzxy[i] * gxzy[i]) + TWO * (Gamxxy[i] * gxxy[i] + Gamyxy[i] * gxyy[i] + Gamzxy[i] * gxzy[i]) +
(Gamxxy[i] * gxyx[i] + Gamyxy[i] * gxyy[i] + Gamzxy[i] * gxyz[i]) (Gamxxy[i] * gxyx[i] + Gamyxy[i] * gxyy[i] + Gamzxy[i] * gxyz[i])
@@ -509,8 +543,8 @@ int f_compute_rhs_bssn(int *ex, double &T,
Rzz[i] = Rzz[i] =
-HALF * Rzz[i] -HALF * Rzz[i]
+ gxz[i] * Gamxz[i] + gyz[i] * Gamyz[i] + (dzz[i] + ONE) * Gamzz[i] + gxz[i] * Gamxz[i] + gyz[i] * Gamyz[i] + gzz[i] * Gamzz[i]
+ gxa * gxzz[i] + gya * gyzz[i] + gza * gzzz[i] + Gamxa[i] * gxzz[i] + Gamya[i] * gyzz[i] + Gamza[i] * gzzz[i]
+ gupxx[i] * ( + gupxx[i] * (
TWO * (Gamxxz[i] * gxxz[i] + Gamyxz[i] * gxyz[i] + Gamzxz[i] * gxzz[i]) + TWO * (Gamxxz[i] * gxxz[i] + Gamyxz[i] * gxyz[i] + Gamzxz[i] * gxzz[i]) +
(Gamxxz[i] * gxzx[i] + Gamyxz[i] * gxzy[i] + Gamzxz[i] * gxzz[i]) (Gamxxz[i] * gxzx[i] + Gamyxz[i] * gxzy[i] + Gamzxz[i] * gxzz[i])
@@ -545,10 +579,10 @@ int f_compute_rhs_bssn(int *ex, double &T,
Rxy[i] = Rxy[i] =
HALF * ( HALF * (
-Rxy[i] -Rxy[i]
+ (dxx[i] + ONE) * Gamxy[i] + gxy[i] * Gamyy[i] + gxz[i] * Gamzy[i] + gxx[i] * Gamxy[i] + gxy[i] * Gamyy[i] + gxz[i] * Gamzy[i]
+ gxy[i] * Gamxx[i] + (dyy[i] + ONE) * Gamyx[i] + gyz[i] * Gamzx[i] + gxy[i] * Gamxx[i] + gyy[i] * Gamyx[i] + gyz[i] * Gamzx[i]
+ gxa * gxyx[i] + gya * gyyx[i] + gza * gyzx[i] + Gamxa[i] * gxyx[i] + Gamya[i] * gyyx[i] + Gamza[i] * gyzx[i]
+ gxa * gxxy[i] + gya * gxyy[i] + gza * gxzy[i] + Gamxa[i] * gxxy[i] + Gamya[i] * gxyy[i] + Gamza[i] * gxzy[i]
) )
+ gupxx[i] * ( + gupxx[i] * (
Gamxxx[i] * gxxy[i] + Gamyxx[i] * gxyy[i] + Gamzxx[i] * gxzy[i] Gamxxx[i] * gxxy[i] + Gamyxx[i] * gxyy[i] + Gamzxx[i] * gxzy[i]
@@ -593,10 +627,10 @@ int f_compute_rhs_bssn(int *ex, double &T,
Rxz[i] = Rxz[i] =
HALF * ( HALF * (
-Rxz[i] -Rxz[i]
+ (dxx[i] + ONE) * Gamxz[i] + gxy[i] * Gamyz[i] + gxz[i] * Gamzz[i] + gxx[i] * Gamxz[i] + gxy[i] * Gamyz[i] + gxz[i] * Gamzz[i]
+ gxz[i] * Gamxx[i] + gyz[i] * Gamyx[i] + (dzz[i] + ONE) * Gamzx[i] + gxz[i] * Gamxx[i] + gyz[i] * Gamyx[i] + gzz[i] * Gamzx[i]
+ gxa * gxzx[i] + gya * gyzx[i] + gza * gzzx[i] + Gamxa[i] * gxzx[i] + Gamya[i] * gyzx[i] + Gamza[i] * gzzx[i]
+ gxa * gxxz[i] + gya * gxyz[i] + gza * gxzz[i] + Gamxa[i] * gxxz[i] + Gamya[i] * gxyz[i] + Gamza[i] * gxzz[i]
) )
+ gupxx[i] * ( + gupxx[i] * (
Gamxxx[i] * gxxz[i] + Gamyxx[i] * gxyz[i] + Gamzxx[i] * gxzz[i] Gamxxx[i] * gxxz[i] + Gamyxx[i] * gxyz[i] + Gamzxx[i] * gxzz[i]
@@ -641,10 +675,10 @@ int f_compute_rhs_bssn(int *ex, double &T,
Ryz[i] = Ryz[i] =
HALF * ( HALF * (
-Ryz[i] -Ryz[i]
+ gxy[i] * Gamxz[i] + (dyy[i] + ONE) * Gamyz[i] + gyz[i] * Gamzz[i] + gxy[i] * Gamxz[i] + gyy[i] * Gamyz[i] + gyz[i] * Gamzz[i]
+ gxz[i] * Gamxy[i] + gyz[i] * Gamyy[i] + (dzz[i] + ONE) * Gamzy[i] + gxz[i] * Gamxy[i] + gyz[i] * Gamyy[i] + gzz[i] * Gamzy[i]
+ gxa * gxzy[i] + gya * gyzy[i] + gza * gzzy[i] + Gamxa[i] * gxzy[i] + Gamya[i] * gyzy[i] + Gamza[i] * gzzy[i]
+ gxa * gxyz[i] + gya * gyyz[i] + gza * gyzz[i] + Gamxa[i] * gxyz[i] + Gamya[i] * gyyz[i] + Gamza[i] * gyzz[i]
) )
+ gupxx[i] * ( + gupxx[i] * (
Gamxxy[i] * gxxz[i] + Gamyxy[i] * gxyz[i] + Gamzxy[i] * gxzz[i] Gamxxy[i] * gxxz[i] + Gamyxy[i] * gxyz[i] + Gamzxy[i] * gxzz[i]
@@ -705,9 +739,9 @@ int f_compute_rhs_bssn(int *ex, double &T,
+ TWO * gupxy[i] * (fxy[i] - (F3o2 / chin1[i]) * chix[i] * chiy[i]) + TWO * gupxy[i] * (fxy[i] - (F3o2 / chin1[i]) * chix[i] * chiy[i])
+ TWO * gupxz[i] * (fxz[i] - (F3o2 / chin1[i]) * chix[i] * chiz[i]) + TWO * gupxz[i] * (fxz[i] - (F3o2 / chin1[i]) * chix[i] * chiz[i])
+ TWO * gupyz[i] * (fyz[i] - (F3o2 / chin1[i]) * chiy[i] * chiz[i]); + TWO * gupyz[i] * (fyz[i] - (F3o2 / chin1[i]) * chiy[i] * chiz[i]);
Rxx[i] = Rxx[i] + ( fxx[i] - (chix[i] * chix[i]) / (chin1[i] * TWO) + (dxx[i] + ONE) * f[i] ) / (chin1[i] * TWO); Rxx[i] = Rxx[i] + ( fxx[i] - (chix[i] * chix[i]) / (chin1[i] * TWO) + gxx[i] * f[i] ) / (chin1[i] * TWO);
Ryy[i] = Ryy[i] + ( fyy[i] - (chiy[i] * chiy[i]) / (chin1[i] * TWO) + (dyy[i] + ONE) * f[i] ) / (chin1[i] * TWO); Ryy[i] = Ryy[i] + ( fyy[i] - (chiy[i] * chiy[i]) / (chin1[i] * TWO) + gyy[i] * f[i] ) / (chin1[i] * TWO);
Rzz[i] = Rzz[i] + ( fzz[i] - (chiz[i] * chiz[i]) / (chin1[i] * TWO) + (dzz[i] + ONE) * f[i] ) / (chin1[i] * TWO); Rzz[i] = Rzz[i] + ( fzz[i] - (chiz[i] * chiz[i]) / (chin1[i] * TWO) + gzz[i] * f[i] ) / (chin1[i] * TWO);
Rxy[i] = Rxy[i] + ( fxy[i] - (chix[i] * chiy[i]) / (chin1[i] * TWO) + gxy[i] * f[i] ) / (chin1[i] * TWO); Rxy[i] = Rxy[i] + ( fxy[i] - (chix[i] * chiy[i]) / (chin1[i] * TWO) + gxy[i] * f[i] ) / (chin1[i] * TWO);
Rxz[i] = Rxz[i] + ( fxz[i] - (chix[i] * chiz[i]) / (chin1[i] * TWO) + gxz[i] * f[i] ) / (chin1[i] * TWO); Rxz[i] = Rxz[i] + ( fxz[i] - (chix[i] * chiz[i]) / (chin1[i] * TWO) + gxz[i] * f[i] ) / (chin1[i] * TWO);
@@ -726,17 +760,17 @@ int f_compute_rhs_bssn(int *ex, double &T,
gxxz[i] = (gupxz[i] * chix[i] + gupyz[i] * chiy[i] + gupzz[i] * chiz[i]) / chin1[i]; gxxz[i] = (gupxz[i] * chix[i] + gupyz[i] * chiy[i] + gupzz[i] * chiz[i]) / chin1[i];
/* Christoffel 修正项 */ /* Christoffel 修正项 */
Gamxxx[i] = Gamxxx[i] - ( ((chix[i] + chix[i]) / chin1[i]) - (dxx[i] + ONE) * gxxx[i] ) * HALF; Gamxxx[i] = Gamxxx[i] - ( ((chix[i] + chix[i]) / chin1[i]) - gxx[i] * gxxx[i] ) * HALF;
Gamyxx[i] = Gamyxx[i] - ( 0.0 - (dxx[i] + ONE) * gxxy[i] ) * HALF; /* 原式只有 -gxx*gxxy */ Gamyxx[i] = Gamyxx[i] - ( 0.0 - gxx[i] * gxxy[i] ) * HALF; /* 原式只有 -gxx*gxxy */
Gamzxx[i] = Gamzxx[i] - ( 0.0 - (dxx[i] + ONE) * gxxz[i] ) * HALF; Gamzxx[i] = Gamzxx[i] - ( 0.0 - gxx[i] * gxxz[i] ) * HALF;
Gamxyy[i] = Gamxyy[i] - ( 0.0 - (dyy[i] + ONE) * gxxx[i] ) * HALF; Gamxyy[i] = Gamxyy[i] - ( 0.0 - gyy[i] * gxxx[i] ) * HALF;
Gamyyy[i] = Gamyyy[i] - ( ((chiy[i] + chiy[i]) / chin1[i]) - (dyy[i] + ONE) * gxxy[i] ) * HALF; Gamyyy[i] = Gamyyy[i] - ( ((chiy[i] + chiy[i]) / chin1[i]) - gyy[i] * gxxy[i] ) * HALF;
Gamzyy[i] = Gamzyy[i] - ( 0.0 - (dyy[i] + ONE) * gxxz[i] ) * HALF; Gamzyy[i] = Gamzyy[i] - ( 0.0 - gyy[i] * gxxz[i] ) * HALF;
Gamxzz[i] = Gamxzz[i] - ( 0.0 - (dzz[i] + ONE) * gxxx[i] ) * HALF; Gamxzz[i] = Gamxzz[i] - ( 0.0 - gzz[i] * gxxx[i] ) * HALF;
Gamyzz[i] = Gamyzz[i] - ( 0.0 - (dzz[i] + ONE) * gxxy[i] ) * HALF; Gamyzz[i] = Gamyzz[i] - ( 0.0 - gzz[i] * gxxy[i] ) * HALF;
Gamzzz[i] = Gamzzz[i] - ( ((chiz[i] + chiz[i]) / chin1[i]) - (dzz[i] + ONE) * gxxz[i] ) * HALF; Gamzzz[i] = Gamzzz[i] - ( ((chiz[i] + chiz[i]) / chin1[i]) - gzz[i] * gxxz[i] ) * HALF;
Gamxxy[i] = Gamxxy[i] - ( ( chiy[i] / chin1[i]) - gxy[i] * gxxx[i] ) * HALF; Gamxxy[i] = Gamxxy[i] - ( ( chiy[i] / chin1[i]) - gxy[i] * gxxx[i] ) * HALF;
Gamyxy[i] = Gamyxy[i] - ( ( chix[i] / chin1[i]) - gxy[i] * gxxy[i] ) * HALF; Gamyxy[i] = Gamyxy[i] - ( ( chix[i] / chin1[i]) - gxy[i] * gxxy[i] ) * HALF;
@@ -758,13 +792,14 @@ int f_compute_rhs_bssn(int *ex, double &T,
fxy[i] = fxy[i] - Gamxxy[i] * Lapx[i] - Gamyxy[i] * Lapy[i] - Gamzxy[i] * Lapz[i]; fxy[i] = fxy[i] - Gamxxy[i] * Lapx[i] - Gamyxy[i] * Lapy[i] - Gamzxy[i] * Lapz[i];
fxz[i] = fxz[i] - Gamxxz[i] * Lapx[i] - Gamyxz[i] * Lapy[i] - Gamzxz[i] * Lapz[i]; fxz[i] = fxz[i] - Gamxxz[i] * Lapx[i] - Gamyxz[i] * Lapy[i] - Gamzxz[i] * Lapz[i];
fyz[i] = fyz[i] - Gamxyz[i] * Lapx[i] - Gamyyz[i] * Lapy[i] - Gamzyz[i] * Lapz[i]; fyz[i] = fyz[i] - Gamxyz[i] * Lapx[i] - Gamyyz[i] * Lapy[i] - Gamzyz[i] * Lapz[i];
}
// 1ms //
for (int i=0;i<all;i+=1) {
trK_rhs[i] = gupxx[i] * fxx[i] + gupyy[i] * fyy[i] + gupzz[i] * fzz[i] trK_rhs[i] = gupxx[i] * fxx[i] + gupyy[i] * fyy[i] + gupzz[i] * fzz[i]
+ TWO * ( gupxy[i] * fxy[i] + gupxz[i] * fxz[i] + gupyz[i] * fyz[i] ); + TWO * ( gupxy[i] * fxy[i] + gupxz[i] * fxz[i] + gupyz[i] * fyz[i] );
} }
// 2.5ms // // 2.5ms //
for (int i=0;i<all;i+=1) { for (int i=0;i<all;i+=1) {
const double divb = betaxx[i] + betayy[i] + betazz[i];
S[i] = chin1[i] * ( S[i] = chin1[i] * (
gupxx[i] * Sxx[i] + gupyy[i] * Syy[i] + gupzz[i] * Szz[i] gupxx[i] * Sxx[i] + gupyy[i] * Syy[i] + gupzz[i] * Szz[i]
@@ -815,20 +850,23 @@ int f_compute_rhs_bssn(int *ex, double &T,
+ (alpn1[i] / chin1[i]) * f[i] + (alpn1[i] / chin1[i]) * f[i]
); );
double l_fxx = alpn1[i] * (Rxx[i] - EIGHT * PI * Sxx[i]) - fxx[i]; fxx[i] = alpn1[i] * (Rxx[i] - EIGHT * PI * Sxx[i]) - fxx[i];
double l_fxy = alpn1[i] * (Rxy[i] - EIGHT * PI * Sxy[i]) - fxy[i]; fxy[i] = alpn1[i] * (Rxy[i] - EIGHT * PI * Sxy[i]) - fxy[i];
double l_fxz = alpn1[i] * (Rxz[i] - EIGHT * PI * Sxz[i]) - fxz[i]; fxz[i] = alpn1[i] * (Rxz[i] - EIGHT * PI * Sxz[i]) - fxz[i];
double l_fyy = alpn1[i] * (Ryy[i] - EIGHT * PI * Syy[i]) - fyy[i]; fyy[i] = alpn1[i] * (Ryy[i] - EIGHT * PI * Syy[i]) - fyy[i];
double l_fyz = alpn1[i] * (Ryz[i] - EIGHT * PI * Syz[i]) - fyz[i]; fyz[i] = alpn1[i] * (Ryz[i] - EIGHT * PI * Syz[i]) - fyz[i];
double l_fzz = alpn1[i] * (Rzz[i] - EIGHT * PI * Szz[i]) - fzz[i]; fzz[i] = alpn1[i] * (Rzz[i] - EIGHT * PI * Szz[i]) - fzz[i];
}
// 8ms //
for (int i=0;i<all;i+=1) {
/* Aij_rhs = fij - gij * f */ /* Aij_rhs = fij - gij * f */
Axx_rhs[i] = l_fxx - (dxx[i] + ONE) * f[i]; Axx_rhs[i] = fxx[i] - gxx[i] * f[i];
Ayy_rhs[i] = l_fyy - (dyy[i] + ONE) * f[i]; Ayy_rhs[i] = fyy[i] - gyy[i] * f[i];
Azz_rhs[i] = l_fzz - (dzz[i] + ONE) * f[i]; Azz_rhs[i] = fzz[i] - gzz[i] * f[i];
Axy_rhs[i] = l_fxy - gxy[i] * f[i]; Axy_rhs[i] = fxy[i] - gxy[i] * f[i];
Axz_rhs[i] = l_fxz - gxz[i] * f[i]; Axz_rhs[i] = fxz[i] - gxz[i] * f[i];
Ayz_rhs[i] = l_fyz - gyz[i] * f[i]; Ayz_rhs[i] = fyz[i] - gyz[i] * f[i];
/* Now: store A_il A^l_j into fij: */ /* Now: store A_il A^l_j into fij: */
fxx[i] = fxx[i] =
@@ -890,19 +928,19 @@ int f_compute_rhs_bssn(int *ex, double &T,
f[i] * Axx_rhs[i] f[i] * Axx_rhs[i]
+ alpn1[i] * ( trK[i] * Axx[i] - TWO * fxx[i] ) + alpn1[i] * ( trK[i] * Axx[i] - TWO * fxx[i] )
+ TWO * ( Axx[i] * betaxx[i] + Axy[i] * betayx[i] + Axz[i] * betazx[i] ) + TWO * ( Axx[i] * betaxx[i] + Axy[i] * betayx[i] + Axz[i] * betazx[i] )
- F2o3 * Axx[i] * divb; - F2o3 * Axx[i] * div_beta[i];
Ayy_rhs[i] = Ayy_rhs[i] =
f[i] * Ayy_rhs[i] f[i] * Ayy_rhs[i]
+ alpn1[i] * ( trK[i] * Ayy[i] - TWO * fyy[i] ) + alpn1[i] * ( trK[i] * Ayy[i] - TWO * fyy[i] )
+ TWO * ( Axy[i] * betaxy[i] + Ayy[i] * betayy[i] + Ayz[i] * betazy[i] ) + TWO * ( Axy[i] * betaxy[i] + Ayy[i] * betayy[i] + Ayz[i] * betazy[i] )
- F2o3 * Ayy[i] * divb; - F2o3 * Ayy[i] * div_beta[i];
Azz_rhs[i] = Azz_rhs[i] =
f[i] * Azz_rhs[i] f[i] * Azz_rhs[i]
+ alpn1[i] * ( trK[i] * Azz[i] - TWO * fzz[i] ) + alpn1[i] * ( trK[i] * Azz[i] - TWO * fzz[i] )
+ TWO * ( Axz[i] * betaxz[i] + Ayz[i] * betayz[i] + Azz[i] * betazz[i] ) + TWO * ( Axz[i] * betaxz[i] + Ayz[i] * betayz[i] + Azz[i] * betazz[i] )
- F2o3 * Azz[i] * divb; - F2o3 * Azz[i] * div_beta[i];
Axy_rhs[i] = Axy_rhs[i] =
f[i] * Axy_rhs[i] f[i] * Axy_rhs[i]
@@ -911,7 +949,7 @@ int f_compute_rhs_bssn(int *ex, double &T,
+ Axz[i] * betazy[i] + Axz[i] * betazy[i]
+ Ayy[i] * betayx[i] + Ayy[i] * betayx[i]
+ Ayz[i] * betazx[i] + Ayz[i] * betazx[i]
+ F1o3 * Axy[i] * divb + F1o3 * Axy[i] * div_beta[i]
- Axy[i] * betazz[i]; - Axy[i] * betazz[i];
Ayz_rhs[i] = Ayz_rhs[i] =
@@ -921,7 +959,7 @@ int f_compute_rhs_bssn(int *ex, double &T,
+ Ayy[i] * betayz[i] + Ayy[i] * betayz[i]
+ Axz[i] * betaxy[i] + Axz[i] * betaxy[i]
+ Azz[i] * betazy[i] + Azz[i] * betazy[i]
+ F1o3 * Ayz[i] * divb + F1o3 * Ayz[i] * div_beta[i]
- Ayz[i] * betaxx[i]; - Ayz[i] * betaxx[i];
Axz_rhs[i] = Axz_rhs[i] =
@@ -931,7 +969,7 @@ int f_compute_rhs_bssn(int *ex, double &T,
+ Axy[i] * betayz[i] + Axy[i] * betayz[i]
+ Ayz[i] * betayx[i] + Ayz[i] * betayx[i]
+ Azz[i] * betazx[i] + Azz[i] * betazx[i]
+ F1o3 * Axz[i] * divb + F1o3 * Axz[i] * div_beta[i]
- Axz[i] * betayy[i]; - Axz[i] * betayy[i];
/* Compute trace of S_ij */ /* Compute trace of S_ij */
@@ -1062,31 +1100,58 @@ int f_compute_rhs_bssn(int *ex, double &T,
dtSfz_rhs[i] = Gamz_rhs[i] - reta[i] * dtSfz[i]; dtSfz_rhs[i] = Gamz_rhs[i] - reta[i] * dtSfz[i];
#endif #endif
} }
// advection + KO dissipation with shared symmetry buffer // 26ms //
lopsided_kodis(ex,X,Y,Z,dxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA,eps); lopsided(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA);
lopsided_kodis(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,eps); lopsided(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS);
lopsided_kodis(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,eps); lopsided(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA);
lopsided_kodis(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS,eps); lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS);
lopsided_kodis(ex,X,Y,Z,dyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS,eps); lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS);
lopsided_kodis(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,eps); lopsided(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA);
lopsided_kodis(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA,eps); lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA);
lopsided_kodis(ex,X,Y,Z,dzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS,eps); lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS);
lopsided_kodis(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS,eps); lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS);
lopsided_kodis(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,eps); lopsided(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS);
lopsided_kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA,eps); lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA);
lopsided_kodis(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA,eps); lopsided(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA);
lopsided_kodis(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA,eps); lopsided(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA);
lopsided_kodis(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS,eps); lopsided(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS);
lopsided_kodis(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS,eps); lopsided(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS);
lopsided_kodis(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS,eps); lopsided(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS);
// 20ms //
if(eps>0){
kodis(ex,X,Y,Z,chi,chi_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,trK,trK_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,dxx,gxx_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,gxy,gxy_rhs,AAS,Symmetry,eps);
kodis(ex,X,Y,Z,gxz,gxz_rhs,ASA,Symmetry,eps);
kodis(ex,X,Y,Z,dyy,gyy_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,gyz,gyz_rhs,SAA,Symmetry,eps);
kodis(ex,X,Y,Z,dzz,gzz_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,Axx,Axx_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,SSA,Symmetry,eps);
kodis(ex,X,Y,Z,Axy,Axy_rhs,AAS,Symmetry,eps);
kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,SAS,Symmetry,eps);
kodis(ex,X,Y,Z,Axz,Axz_rhs,ASA,Symmetry,eps);
kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,ASS,Symmetry,eps);
kodis(ex,X,Y,Z,Ayy,Ayy_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,betaz,betaz_rhs,SSA,Symmetry,eps);
kodis(ex,X,Y,Z,Ayz,Ayz_rhs,SAA,Symmetry,eps);
kodis(ex,X,Y,Z,betay,betay_rhs,SAS,Symmetry,eps);
kodis(ex,X,Y,Z,Azz,Azz_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,betax,betax_rhs,ASS,Symmetry,eps);
kodis(ex,X,Y,Z,Gamx,Gamx_rhs,ASS,Symmetry,eps);
kodis(ex,X,Y,Z,Lap,Lap_rhs,SSS,Symmetry,eps);
kodis(ex,X,Y,Z,Gamy,Gamy_rhs,SAS,Symmetry,eps);
kodis(ex,X,Y,Z,Gamz,Gamz_rhs,SSA,Symmetry,eps);
}
// 2ms // // 2ms //
if(co==0){ if(co==0){
for (int i=0;i<all;i+=1) { for (int i=0;i<all;i+=1) {
@@ -1139,59 +1204,59 @@ int f_compute_rhs_bssn(int *ex, double &T,
fderivs(ex,Ayy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0); fderivs(ex,Ayy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0);
fderivs(ex,Ayz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,0); fderivs(ex,Ayz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,0);
fderivs(ex,Azz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0); fderivs(ex,Azz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0);
// 7ms // }
for (int i=0;i<all;i+=1) { // 7ms //
gxxx[i] = gxxx[i] - ( Gamxxx[i] * Axx[i] + Gamyxx[i] * Axy[i] + Gamzxx[i] * Axz[i] for (int i=0;i<all;i+=1) {
+ Gamxxx[i] * Axx[i] + Gamyxx[i] * Axy[i] + Gamzxx[i] * Axz[i]) - chix[i]*Axx[i]/chin1[i]; gxxx[i] = gxxx[i] - ( Gamxxx[i] * Axx[i] + Gamyxx[i] * Axy[i] + Gamzxx[i] * Axz[i]
gxyx[i] = gxyx[i] - ( Gamxxy[i] * Axx[i] + Gamyxy[i] * Axy[i] + Gamzxy[i] * Axz[i] + Gamxxx[i] * Axx[i] + Gamyxx[i] * Axy[i] + Gamzxx[i] * Axz[i]) - chix[i]*Axx[i]/chin1[i];
+ Gamxxx[i] * Axy[i] + Gamyxx[i] * Ayy[i] + Gamzxx[i] * Ayz[i]) - chix[i]*Axy[i]/chin1[i]; gxyx[i] = gxyx[i] - ( Gamxxy[i] * Axx[i] + Gamyxy[i] * Axy[i] + Gamzxy[i] * Axz[i]
gxzx[i] = gxzx[i] - ( Gamxxz[i] * Axx[i] + Gamyxz[i] * Axy[i] + Gamzxz[i] * Axz[i] + Gamxxx[i] * Axy[i] + Gamyxx[i] * Ayy[i] + Gamzxx[i] * Ayz[i]) - chix[i]*Axy[i]/chin1[i];
+ Gamxxx[i] * Axz[i] + Gamyxx[i] * Ayz[i] + Gamzxx[i] * Azz[i]) - chix[i]*Axz[i]/chin1[i]; gxzx[i] = gxzx[i] - ( Gamxxz[i] * Axx[i] + Gamyxz[i] * Axy[i] + Gamzxz[i] * Axz[i]
gyyx[i] = gyyx[i] - ( Gamxxy[i] * Axy[i] + Gamyxy[i] * Ayy[i] + Gamzxy[i] * Ayz[i] + Gamxxx[i] * Axz[i] + Gamyxx[i] * Ayz[i] + Gamzxx[i] * Azz[i]) - chix[i]*Axz[i]/chin1[i];
+ Gamxxy[i] * Axy[i] + Gamyxy[i] * Ayy[i] + Gamzxy[i] * Ayz[i]) - chix[i]*Ayy[i]/chin1[i]; gyyx[i] = gyyx[i] - ( Gamxxy[i] * Axy[i] + Gamyxy[i] * Ayy[i] + Gamzxy[i] * Ayz[i]
gyzx[i] = gyzx[i] - ( Gamxxz[i] * Axy[i] + Gamyxz[i] * Ayy[i] + Gamzxz[i] * Ayz[i] + Gamxxy[i] * Axy[i] + Gamyxy[i] * Ayy[i] + Gamzxy[i] * Ayz[i]) - chix[i]*Ayy[i]/chin1[i];
+ Gamxxy[i] * Axz[i] + Gamyxy[i] * Ayz[i] + Gamzxy[i] * Azz[i]) - chix[i]*Ayz[i]/chin1[i]; gyzx[i] = gyzx[i] - ( Gamxxz[i] * Axy[i] + Gamyxz[i] * Ayy[i] + Gamzxz[i] * Ayz[i]
gzzx[i] = gzzx[i] - ( Gamxxz[i] * Axz[i] + Gamyxz[i] * Ayz[i] + Gamzxz[i] * Azz[i] + Gamxxy[i] * Axz[i] + Gamyxy[i] * Ayz[i] + Gamzxy[i] * Azz[i]) - chix[i]*Ayz[i]/chin1[i];
+ Gamxxz[i] * Axz[i] + Gamyxz[i] * Ayz[i] + Gamzxz[i] * Azz[i]) - chix[i]*Azz[i]/chin1[i]; gzzx[i] = gzzx[i] - ( Gamxxz[i] * Axz[i] + Gamyxz[i] * Ayz[i] + Gamzxz[i] * Azz[i]
gxxy[i] = gxxy[i] - ( Gamxxy[i] * Axx[i] + Gamyxy[i] * Axy[i] + Gamzxy[i] * Axz[i] + Gamxxz[i] * Axz[i] + Gamyxz[i] * Ayz[i] + Gamzxz[i] * Azz[i]) - chix[i]*Azz[i]/chin1[i];
+ Gamxxy[i] * Axx[i] + Gamyxy[i] * Axy[i] + Gamzxy[i] * Axz[i]) - chiy[i]*Axx[i]/chin1[i]; gxxy[i] = gxxy[i] - ( Gamxxy[i] * Axx[i] + Gamyxy[i] * Axy[i] + Gamzxy[i] * Axz[i]
gxyy[i] = gxyy[i] - ( Gamxyy[i] * Axx[i] + Gamyyy[i] * Axy[i] + Gamzyy[i] * Axz[i] + Gamxxy[i] * Axx[i] + Gamyxy[i] * Axy[i] + Gamzxy[i] * Axz[i]) - chiy[i]*Axx[i]/chin1[i];
+ Gamxxy[i] * Axy[i] + Gamyxy[i] * Ayy[i] + Gamzxy[i] * Ayz[i]) - chiy[i]*Axy[i]/chin1[i]; gxyy[i] = gxyy[i] - ( Gamxyy[i] * Axx[i] + Gamyyy[i] * Axy[i] + Gamzyy[i] * Axz[i]
gxzy[i] = gxzy[i] - ( Gamxyz[i] * Axx[i] + Gamyyz[i] * Axy[i] + Gamzyz[i] * Axz[i] + Gamxxy[i] * Axy[i] + Gamyxy[i] * Ayy[i] + Gamzxy[i] * Ayz[i]) - chiy[i]*Axy[i]/chin1[i];
+ Gamxxy[i] * Axz[i] + Gamyxy[i] * Ayz[i] + Gamzxy[i] * Azz[i]) - chiy[i]*Axz[i]/chin1[i]; gxzy[i] = gxzy[i] - ( Gamxyz[i] * Axx[i] + Gamyyz[i] * Axy[i] + Gamzyz[i] * Axz[i]
gyyy[i] = gyyy[i] - ( Gamxyy[i] * Axy[i] + Gamyyy[i] * Ayy[i] + Gamzyy[i] * Ayz[i] + Gamxxy[i] * Axz[i] + Gamyxy[i] * Ayz[i] + Gamzxy[i] * Azz[i]) - chiy[i]*Axz[i]/chin1[i];
+ Gamxyy[i] * Axy[i] + Gamyyy[i] * Ayy[i] + Gamzyy[i] * Ayz[i]) - chiy[i]*Ayy[i]/chin1[i]; gyyy[i] = gyyy[i] - ( Gamxyy[i] * Axy[i] + Gamyyy[i] * Ayy[i] + Gamzyy[i] * Ayz[i]
gyzy[i] = gyzy[i] - ( Gamxyz[i] * Axy[i] + Gamyyz[i] * Ayy[i] + Gamzyz[i] * Ayz[i] + Gamxyy[i] * Axy[i] + Gamyyy[i] * Ayy[i] + Gamzyy[i] * Ayz[i]) - chiy[i]*Ayy[i]/chin1[i];
+ Gamxyy[i] * Axz[i] + Gamyyy[i] * Ayz[i] + Gamzyy[i] * Azz[i]) - chiy[i]*Ayz[i]/chin1[i]; gyzy[i] = gyzy[i] - ( Gamxyz[i] * Axy[i] + Gamyyz[i] * Ayy[i] + Gamzyz[i] * Ayz[i]
gzzy[i] = gzzy[i] - ( Gamxyz[i] * Axz[i] + Gamyyz[i] * Ayz[i] + Gamzyz[i] * Azz[i] + Gamxyy[i] * Axz[i] + Gamyyy[i] * Ayz[i] + Gamzyy[i] * Azz[i]) - chiy[i]*Ayz[i]/chin1[i];
+ Gamxyz[i] * Axz[i] + Gamyyz[i] * Ayz[i] + Gamzyz[i] * Azz[i]) - chiy[i]*Azz[i]/chin1[i]; gzzy[i] = gzzy[i] - ( Gamxyz[i] * Axz[i] + Gamyyz[i] * Ayz[i] + Gamzyz[i] * Azz[i]
gxxz[i] = gxxz[i] - ( Gamxxz[i] * Axx[i] + Gamyxz[i] * Axy[i] + Gamzxz[i] * Axz[i] + Gamxyz[i] * Axz[i] + Gamyyz[i] * Ayz[i] + Gamzyz[i] * Azz[i]) - chiy[i]*Azz[i]/chin1[i];
+ Gamxxz[i] * Axx[i] + Gamyxz[i] * Axy[i] + Gamzxz[i] * Axz[i]) - chiz[i]*Axx[i]/chin1[i]; gxxz[i] = gxxz[i] - ( Gamxxz[i] * Axx[i] + Gamyxz[i] * Axy[i] + Gamzxz[i] * Axz[i]
gxyz[i] = gxyz[i] - ( Gamxyz[i] * Axx[i] + Gamyyz[i] * Axy[i] + Gamzyz[i] * Axz[i] + Gamxxz[i] * Axx[i] + Gamyxz[i] * Axy[i] + Gamzxz[i] * Axz[i]) - chiz[i]*Axx[i]/chin1[i];
+ Gamxxz[i] * Axy[i] + Gamyxz[i] * Ayy[i] + Gamzxz[i] * Ayz[i]) - chiz[i]*Axy[i]/chin1[i]; gxyz[i] = gxyz[i] - ( Gamxyz[i] * Axx[i] + Gamyyz[i] * Axy[i] + Gamzyz[i] * Axz[i]
gxzz[i] = gxzz[i] - ( Gamxzz[i] * Axx[i] + Gamyzz[i] * Axy[i] + Gamzzz[i] * Axz[i] + Gamxxz[i] * Axy[i] + Gamyxz[i] * Ayy[i] + Gamzxz[i] * Ayz[i]) - chiz[i]*Axy[i]/chin1[i];
+ Gamxxz[i] * Axz[i] + Gamyxz[i] * Ayz[i] + Gamzxz[i] * Azz[i]) - chiz[i]*Axz[i]/chin1[i]; gxzz[i] = gxzz[i] - ( Gamxzz[i] * Axx[i] + Gamyzz[i] * Axy[i] + Gamzzz[i] * Axz[i]
gyyz[i] = gyyz[i] - ( Gamxyz[i] * Axy[i] + Gamyyz[i] * Ayy[i] + Gamzyz[i] * Ayz[i] + Gamxxz[i] * Axz[i] + Gamyxz[i] * Ayz[i] + Gamzxz[i] * Azz[i]) - chiz[i]*Axz[i]/chin1[i];
+ Gamxyz[i] * Axy[i] + Gamyyz[i] * Ayy[i] + Gamzyz[i] * Ayz[i]) - chiz[i]*Ayy[i]/chin1[i]; gyyz[i] = gyyz[i] - ( Gamxyz[i] * Axy[i] + Gamyyz[i] * Ayy[i] + Gamzyz[i] * Ayz[i]
gyzz[i] = gyzz[i] - ( Gamxzz[i] * Axy[i] + Gamyzz[i] * Ayy[i] + Gamzzz[i] * Ayz[i] + Gamxyz[i] * Axy[i] + Gamyyz[i] * Ayy[i] + Gamzyz[i] * Ayz[i]) - chiz[i]*Ayy[i]/chin1[i];
+ Gamxyz[i] * Axz[i] + Gamyyz[i] * Ayz[i] + Gamzyz[i] * Azz[i]) - chiz[i]*Ayz[i]/chin1[i]; gyzz[i] = gyzz[i] - ( Gamxzz[i] * Axy[i] + Gamyzz[i] * Ayy[i] + Gamzzz[i] * Ayz[i]
gzzz[i] = gzzz[i] - ( Gamxzz[i] * Axz[i] + Gamyzz[i] * Ayz[i] + Gamzzz[i] * Azz[i] + Gamxyz[i] * Axz[i] + Gamyyz[i] * Ayz[i] + Gamzyz[i] * Azz[i]) - chiz[i]*Ayz[i]/chin1[i];
+ Gamxzz[i] * Axz[i] + Gamyzz[i] * Ayz[i] + Gamzzz[i] * Azz[i]) - chiz[i]*Azz[i]/chin1[i]; gzzz[i] = gzzz[i] - ( Gamxzz[i] * Axz[i] + Gamyzz[i] * Ayz[i] + Gamzzz[i] * Azz[i]
+ Gamxzz[i] * Axz[i] + Gamyzz[i] * Ayz[i] + Gamzzz[i] * Azz[i]) - chiz[i]*Azz[i]/chin1[i];
movx_Res[i] = gupxx[i]*gxxx[i] + gupyy[i]*gxyy[i] + gupzz[i]*gxzz[i] movx_Res[i] = gupxx[i]*gxxx[i] + gupyy[i]*gxyy[i] + gupzz[i]*gxzz[i]
+ gupxy[i]*gxyx[i] + gupxz[i]*gxzx[i] + gupyz[i]*gxzy[i] + gupxy[i]*gxyx[i] + gupxz[i]*gxzx[i] + gupyz[i]*gxzy[i]
+ gupxy[i]*gxxy[i] + gupxz[i]*gxxz[i] + gupyz[i]*gxyz[i]; + gupxy[i]*gxxy[i] + gupxz[i]*gxxz[i] + gupyz[i]*gxyz[i];
movy_Res[i] = gupxx[i]*gxyx[i] + gupyy[i]*gyyy[i] + gupzz[i]*gyzz[i] movy_Res[i] = gupxx[i]*gxyx[i] + gupyy[i]*gyyy[i] + gupzz[i]*gyzz[i]
+ gupxy[i]*gyyx[i] + gupxz[i]*gyzx[i] + gupyz[i]*gyzy[i] + gupxy[i]*gyyx[i] + gupxz[i]*gyzx[i] + gupyz[i]*gyzy[i]
+ gupxy[i]*gxyy[i] + gupxz[i]*gxyz[i] + gupyz[i]*gyyz[i]; + gupxy[i]*gxyy[i] + gupxz[i]*gxyz[i] + gupyz[i]*gyyz[i];
movz_Res[i] = gupxx[i]*gxzx[i] + gupyy[i]*gyzy[i] + gupzz[i]*gzzz[i] movz_Res[i] = gupxx[i]*gxzx[i] + gupyy[i]*gyzy[i] + gupzz[i]*gzzz[i]
+ gupxy[i]*gyzx[i] + gupxz[i]*gzzx[i] + gupyz[i]*gzzy[i] + gupxy[i]*gyzx[i] + gupxz[i]*gzzx[i] + gupyz[i]*gzzy[i]
+ gupxy[i]*gxzy[i] + gupxz[i]*gxzz[i] + gupyz[i]*gyzz[i]; + gupxy[i]*gxzy[i] + gupxz[i]*gxzz[i] + gupyz[i]*gyzz[i];
movx_Res[i] = movx_Res[i] - F2o3*Kx[i] - F8*PI*Sx[i]; movx_Res[i] = movx_Res[i] - F2o3*Kx[i] - F8*PI*Sx[i];
movy_Res[i] = movy_Res[i] - F2o3*Ky[i] - F8*PI*Sy[i]; movy_Res[i] = movy_Res[i] - F2o3*Ky[i] - F8*PI*Sy[i];
movz_Res[i] = movz_Res[i] - F2o3*Kz[i] - F8*PI*Sz[i]; movz_Res[i] = movz_Res[i] - F2o3*Kz[i] - F8*PI*Sz[i];
}
} }

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,36 @@
#ifndef BSSN_RHS_CUDA_H
#define BSSN_RHS_CUDA_H
#ifdef __cplusplus
extern "C" {
#endif
int f_compute_rhs_bssn(int *ex, double &T,
double *X, double *Y, double *Z,
double *chi, double *trK,
double *dxx, double *gxy, double *gxz, double *dyy, double *gyz, double *dzz,
double *Axx, double *Axy, double *Axz, double *Ayy, double *Ayz, double *Azz,
double *Gamx, double *Gamy, double *Gamz,
double *Lap, double *betax, double *betay, double *betaz,
double *dtSfx, double *dtSfy, double *dtSfz,
double *chi_rhs, double *trK_rhs,
double *gxx_rhs, double *gxy_rhs, double *gxz_rhs, double *gyy_rhs, double *gyz_rhs, double *gzz_rhs,
double *Axx_rhs, double *Axy_rhs, double *Axz_rhs, double *Ayy_rhs, double *Ayz_rhs, double *Azz_rhs,
double *Gamx_rhs, double *Gamy_rhs, double *Gamz_rhs,
double *Lap_rhs, double *betax_rhs, double *betay_rhs, double *betaz_rhs,
double *dtSfx_rhs, double *dtSfy_rhs, double *dtSfz_rhs,
double *rho, double *Sx, double *Sy, double *Sz,
double *Sxx, double *Sxy, double *Sxz, double *Syy, double *Syz, double *Szz,
double *Gamxxx, double *Gamxxy, double *Gamxxz, double *Gamxyy, double *Gamxyz, double *Gamxzz,
double *Gamyxx, double *Gamyxy, double *Gamyxz, double *Gamyyy, double *Gamyyz, double *Gamyzz,
double *Gamzxx, double *Gamzxy, double *Gamzxz, double *Gamzyy, double *Gamzyz, double *Gamzzz,
double *Rxx, double *Rxy, double *Rxz, double *Ryy, double *Ryz, double *Rzz,
double *ham_Res, double *movx_Res, double *movy_Res, double *movz_Res,
double *Gmx_Res, double *Gmy_Res, double *Gmz_Res,
int &Symmetry, int &Lev, double &eps, int &co);
#ifdef __cplusplus
}
#endif
#endif

View File

@@ -33,7 +33,7 @@
real*8 :: dX,dY,dZ real*8 :: dX,dY,dZ
real*8,dimension(0:ex(1),0:ex(2),0:ex(3)) :: fh real*8,dimension(0:ex(1),0:ex(2),0:ex(3)) :: fh
real*8, dimension(3) :: SoA real*8, dimension(3) :: SoA
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: d2dx,d2dy,d2dz real*8 :: d2dx,d2dy,d2dz
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2 integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F60=6.d1 real*8, parameter :: ZEO=0.d0,ONE=1.d0, F60=6.d1
@@ -137,7 +137,7 @@
real*8 :: dX real*8 :: dX
real*8,dimension(0:ex(1),0:ex(2),0:ex(3)) :: fh real*8,dimension(0:ex(1),0:ex(2),0:ex(3)) :: fh
real*8, dimension(3) :: SoA real*8, dimension(3) :: SoA
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: d2dx real*8 :: d2dx
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2 integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F60=6.d1 real*8, parameter :: ZEO=0.d0,ONE=1.d0, F60=6.d1
@@ -1512,9 +1512,8 @@
real*8 :: dX,dY,dZ real*8 :: dX,dY,dZ
real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh
real*8, dimension(3) :: SoA real*8, dimension(3) :: SoA
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
integer :: i_core_min,i_core_max,j_core_min,j_core_max,k_core_min,k_core_max real*8 :: Sdxdx,Sdydy,Sdzdz,Fdxdx,Fdydy,Fdzdz
real*8 :: Sdxdx,Sdydy,Sdzdz,Fdxdx,Fdydy,Fdzdz
real*8 :: Sdxdy,Sdxdz,Sdydz,Fdxdy,Fdxdz,Fdydz real*8 :: Sdxdy,Sdxdz,Sdydz,Fdxdy,Fdxdz,Fdydz
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2 integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8, parameter :: ZEO=0.d0, ONE=1.d0, TWO=2.d0, F1o4=2.5d-1, F9=9.d0, F45=4.5d1 real*8, parameter :: ZEO=0.d0, ONE=1.d0, TWO=2.d0, F1o4=2.5d-1, F9=9.d0, F45=4.5d1
@@ -1561,55 +1560,17 @@
fxx = ZEO fxx = ZEO
fyy = ZEO fyy = ZEO
fzz = ZEO fzz = ZEO
fxy = ZEO fxy = ZEO
fxz = ZEO fxz = ZEO
fyz = ZEO fyz = ZEO
i_core_min = max(1, imin+2) do k=1,ex(3)
i_core_max = min(ex(1), imax-2) do j=1,ex(2)
j_core_min = max(1, jmin+2) do i=1,ex(1)
j_core_max = min(ex(2), jmax-2) !~~~~~~ fxx
k_core_min = max(1, kmin+2) if(i+2 <= imax .and. i-2 >= imin)then
k_core_max = min(ex(3), kmax-2) !
if(i_core_min <= i_core_max .and. j_core_min <= j_core_max .and. k_core_min <= k_core_max)then
do k=k_core_min,k_core_max
do j=j_core_min,j_core_max
do i=i_core_min,i_core_max
! interior points always use 4th-order stencils without branch checks
fxx(i,j,k) = Fdxdx*(-fh(i-2,j,k)+F16*fh(i-1,j,k)-F30*fh(i,j,k) &
-fh(i+2,j,k)+F16*fh(i+1,j,k) )
fyy(i,j,k) = Fdydy*(-fh(i,j-2,k)+F16*fh(i,j-1,k)-F30*fh(i,j,k) &
-fh(i,j+2,k)+F16*fh(i,j+1,k) )
fzz(i,j,k) = Fdzdz*(-fh(i,j,k-2)+F16*fh(i,j,k-1)-F30*fh(i,j,k) &
-fh(i,j,k+2)+F16*fh(i,j,k+1) )
fxy(i,j,k) = Fdxdy*( (fh(i-2,j-2,k)-F8*fh(i-1,j-2,k)+F8*fh(i+1,j-2,k)-fh(i+2,j-2,k)) &
-F8 *(fh(i-2,j-1,k)-F8*fh(i-1,j-1,k)+F8*fh(i+1,j-1,k)-fh(i+2,j-1,k)) &
+F8 *(fh(i-2,j+1,k)-F8*fh(i-1,j+1,k)+F8*fh(i+1,j+1,k)-fh(i+2,j+1,k)) &
- (fh(i-2,j+2,k)-F8*fh(i-1,j+2,k)+F8*fh(i+1,j+2,k)-fh(i+2,j+2,k)))
fxz(i,j,k) = Fdxdz*( (fh(i-2,j,k-2)-F8*fh(i-1,j,k-2)+F8*fh(i+1,j,k-2)-fh(i+2,j,k-2)) &
-F8 *(fh(i-2,j,k-1)-F8*fh(i-1,j,k-1)+F8*fh(i+1,j,k-1)-fh(i+2,j,k-1)) &
+F8 *(fh(i-2,j,k+1)-F8*fh(i-1,j,k+1)+F8*fh(i+1,j,k+1)-fh(i+2,j,k+1)) &
- (fh(i-2,j,k+2)-F8*fh(i-1,j,k+2)+F8*fh(i+1,j,k+2)-fh(i+2,j,k+2)))
fyz(i,j,k) = Fdydz*( (fh(i,j-2,k-2)-F8*fh(i,j-1,k-2)+F8*fh(i,j+1,k-2)-fh(i,j+2,k-2)) &
-F8 *(fh(i,j-2,k-1)-F8*fh(i,j-1,k-1)+F8*fh(i,j+1,k-1)-fh(i,j+2,k-1)) &
+F8 *(fh(i,j-2,k+1)-F8*fh(i,j-1,k+1)+F8*fh(i,j+1,k+1)-fh(i,j+2,k+1)) &
- (fh(i,j-2,k+2)-F8*fh(i,j-1,k+2)+F8*fh(i,j+1,k+2)-fh(i,j+2,k+2)))
enddo
enddo
enddo
endif
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i>=i_core_min .and. i<=i_core_max .and. &
j>=j_core_min .and. j<=j_core_max .and. &
k>=k_core_min .and. k<=k_core_max) cycle
!~~~~~~ fxx
if(i+2 <= imax .and. i-2 >= imin)then
!
! - f(i-2) + 16 f(i-1) - 30 f(i) + 16 f(i+1) - f(i+2) ! - f(i-2) + 16 f(i-1) - 30 f(i) + 16 f(i+1) - f(i+2)
! fxx(i) = ---------------------------------------------------------- ! fxx(i) = ----------------------------------------------------------
! 12 dx^2 ! 12 dx^2

View File

@@ -81,63 +81,26 @@ void fderivs(const int ex[3],
} }
/* /*
* 两段式: * Fortran loops:
* 1) 先在二阶可用区域计算二阶模板 * do k=1,ex3-1
* 2) 再在高阶可用区域覆盖为四阶模板 * do j=1,ex2-1
* do i=1,ex1-1
* *
* 与原 if/elseif 逻辑等价,但减少逐点分支判断。 * C: k0=0..ex3-2, j0=0..ex2-2, i0=0..ex1-2
*/ */
const int i2_lo = (iminF > 0) ? iminF : 0; for (int k0 = 0; k0 <= ex3 - 2; ++k0) {
const int j2_lo = (jminF > 0) ? jminF : 0; const int kF = k0 + 1;
const int k2_lo = (kminF > 0) ? kminF : 0; for (int j0 = 0; j0 <= ex2 - 2; ++j0) {
const int i2_hi = ex1 - 2; const int jF = j0 + 1;
const int j2_hi = ex2 - 2; for (int i0 = 0; i0 <= ex1 - 2; ++i0) {
const int k2_hi = ex3 - 2; const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
const int i4_lo = (iminF + 1 > 0) ? (iminF + 1) : 0;
const int j4_lo = (jminF + 1 > 0) ? (jminF + 1) : 0;
const int k4_lo = (kminF + 1 > 0) ? (kminF + 1) : 0;
const int i4_hi = ex1 - 3;
const int j4_hi = ex2 - 3;
const int k4_hi = ex3 - 3;
if (i2_lo <= i2_hi && j2_lo <= j2_hi && k2_lo <= k2_hi) {
for (int k0 = k2_lo; k0 <= k2_hi; ++k0) {
const int kF = k0 + 1;
for (int j0 = j2_lo; j0 <= j2_hi; ++j0) {
const int jF = j0 + 1;
for (int i0 = i2_lo; i0 <= i2_hi; ++i0) {
const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
fx[p] = d2dx * (
-fh[idx_fh_F_ord2(iF - 1, jF, kF, ex)] +
fh[idx_fh_F_ord2(iF + 1, jF, kF, ex)]
);
fy[p] = d2dy * (
-fh[idx_fh_F_ord2(iF, jF - 1, kF, ex)] +
fh[idx_fh_F_ord2(iF, jF + 1, kF, ex)]
);
fz[p] = d2dz * (
-fh[idx_fh_F_ord2(iF, jF, kF - 1, ex)] +
fh[idx_fh_F_ord2(iF, jF, kF + 1, ex)]
);
}
}
}
}
if (i4_lo <= i4_hi && j4_lo <= j4_hi && k4_lo <= k4_hi) {
for (int k0 = k4_lo; k0 <= k4_hi; ++k0) {
const int kF = k0 + 1;
for (int j0 = j4_lo; j0 <= j4_hi; ++j0) {
const int jF = j0 + 1;
for (int i0 = i4_lo; i0 <= i4_hi; ++i0) {
const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
// if(i+2 <= imax .and. i-2 >= imin ... ) (全是 Fortran 索引)
if ((iF + 2) <= imaxF && (iF - 2) >= iminF &&
(jF + 2) <= jmaxF && (jF - 2) >= jminF &&
(kF + 2) <= kmaxF && (kF - 2) >= kminF)
{
fx[p] = d12dx * ( fx[p] = d12dx * (
fh[idx_fh_F_ord2(iF - 2, jF, kF, ex)] - fh[idx_fh_F_ord2(iF - 2, jF, kF, ex)] -
EIT * fh[idx_fh_F_ord2(iF - 1, jF, kF, ex)] + EIT * fh[idx_fh_F_ord2(iF - 1, jF, kF, ex)] +
@@ -159,9 +122,29 @@ void fderivs(const int ex[3],
fh[idx_fh_F_ord2(iF, jF, kF + 2, ex)] fh[idx_fh_F_ord2(iF, jF, kF + 2, ex)]
); );
} }
// elseif(i+1 <= imax .and. i-1 >= imin ...)
else if ((iF + 1) <= imaxF && (iF - 1) >= iminF &&
(jF + 1) <= jmaxF && (jF - 1) >= jminF &&
(kF + 1) <= kmaxF && (kF - 1) >= kminF)
{
fx[p] = d2dx * (
-fh[idx_fh_F_ord2(iF - 1, jF, kF, ex)] +
fh[idx_fh_F_ord2(iF + 1, jF, kF, ex)]
);
fy[p] = d2dy * (
-fh[idx_fh_F_ord2(iF, jF - 1, kF, ex)] +
fh[idx_fh_F_ord2(iF, jF + 1, kF, ex)]
);
fz[p] = d2dz * (
-fh[idx_fh_F_ord2(iF, jF, kF - 1, ex)] +
fh[idx_fh_F_ord2(iF, jF, kF + 1, ex)]
);
}
} }
} }
} }
// free(fh); // free(fh);
} }

View File

@@ -1327,6 +1327,35 @@ end subroutine d2dump
return return
end subroutine polint end subroutine polint
subroutine polint0(xa, ya, y, ordn)
! Lagrange interpolation at x=0, O(n) direct formula
implicit none
integer, intent(in) :: ordn
real*8, dimension(ordn), intent(in) :: xa, ya
real*8, intent(out) :: y
integer :: j, k
real*8 :: wj
y = 0.d0
do j = 1, ordn
wj = 1.d0
do k = 1, ordn
if (k .ne. j) then
wj = wj * xa(k) / (xa(k) - xa(j))
endif
enddo
y = y + wj * ya(j)
enddo
return
end subroutine polint0
!------------------------------------------------------------------------------
!
! interpolation in 2 dimensions, follow yx order
!
!------------------------------------------------------------------------------
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
! Compute Lagrange interpolation basis weights for one target point. ! Compute Lagrange interpolation basis weights for one target point.
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------

View File

@@ -1,5 +1,3 @@
/* 本头文件由自订profile框架自动生成并非人工硬编码针对Case优化 */
/* 更新负载均衡问题已经通过优化插值函数解决此profile静态均衡方案已弃用本头文件现在未参与编译 */
/* Auto-generated from interp_lb_profile.bin — do not edit */ /* Auto-generated from interp_lb_profile.bin — do not edit */
#ifndef INTERP_LB_PROFILE_DATA_H #ifndef INTERP_LB_PROFILE_DATA_H
#define INTERP_LB_PROFILE_DATA_H #define INTERP_LB_PROFILE_DATA_H

View File

@@ -63,28 +63,19 @@ void kodis(const int ex[3],
* C: k0=0..ex3-1, j0=0..ex2-1, i0=0..ex1-1 * C: k0=0..ex3-1, j0=0..ex2-1, i0=0..ex1-1
* 并定义 Fortran index: iF=i0+1, ... * 并定义 Fortran index: iF=i0+1, ...
*/ */
// 收紧循环范围:只遍历满足 iF±3/jF±3/kF±3 条件的内部点 for (int k0 = 0; k0 < ex3; ++k0) {
// iF-3 >= iminF => iF >= iminF+3 => i0 >= iminF+2 (因为 iF=i0+1)
// iF+3 <= imaxF => iF <= imaxF-3 => i0 <= imaxF-4
const int i0_lo = (iminF + 2 > 0) ? iminF + 2 : 0;
const int j0_lo = (jminF + 2 > 0) ? jminF + 2 : 0;
const int k0_lo = (kminF + 2 > 0) ? kminF + 2 : 0;
const int i0_hi = imaxF - 4; // inclusive
const int j0_hi = jmaxF - 4;
const int k0_hi = kmaxF - 4;
if (i0_lo > i0_hi || j0_lo > j0_hi || k0_lo > k0_hi) {
free(fh);
return;
}
for (int k0 = k0_lo; k0 <= k0_hi; ++k0) {
const int kF = k0 + 1; const int kF = k0 + 1;
for (int j0 = j0_lo; j0 <= j0_hi; ++j0) { for (int j0 = 0; j0 < ex2; ++j0) {
const int jF = j0 + 1; const int jF = j0 + 1;
for (int i0 = i0_lo; i0 <= i0_hi; ++i0) { for (int i0 = 0; i0 < ex1; ++i0) {
const int iF = i0 + 1; const int iF = i0 + 1;
// Fortran if 条件:
// i-3 >= imin .and. i+3 <= imax 等(都是 Fortran 索引)
if ((iF - 3) >= iminF && (iF + 3) <= imaxF &&
(jF - 3) >= jminF && (jF + 3) <= jmaxF &&
(kF - 3) >= kminF && (kF + 3) <= kmaxF)
{
const size_t p = idx_ex(i0, j0, k0, ex); const size_t p = idx_ex(i0, j0, k0, ex);
// 三个方向各一份同型的 7 点组合(实际上是对称的 6th-order dissipation/filter 核) // 三个方向各一份同型的 7 点组合(实际上是对称的 6th-order dissipation/filter 核)
@@ -109,6 +100,7 @@ void kodis(const int ex[3],
// Fortran: // Fortran:
// f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof*(Dx_term + Dy_term + Dz_term) // f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof*(Dx_term + Dy_term + Dz_term)
f_rhs[p] += (eps / cof) * (Dx_term + Dy_term + Dz_term); f_rhs[p] += (eps / cof) * (Dx_term + Dy_term + Dz_term);
}
} }
} }
} }

View File

@@ -1,248 +0,0 @@
#include "tool.h"
/*
* Combined advection (lopsided) + KO dissipation (kodis).
* Uses one shared symmetry_bd buffer per call.
*/
void lopsided_kodis(const int ex[3],
const double *X, const double *Y, const double *Z,
const double *f, double *f_rhs,
const double *Sfx, const double *Sfy, const double *Sfz,
int Symmetry, const double SoA[3], double eps)
{
const double ZEO = 0.0, ONE = 1.0, F3 = 3.0;
const double F6 = 6.0, F18 = 18.0;
const double F12 = 12.0, F10 = 10.0, EIT = 8.0;
const double SIX = 6.0, FIT = 15.0, TWT = 20.0;
const double cof = 64.0; // 2^6
const int NO_SYMM = 0, EQ_SYMM = 1;
const int ex1 = ex[0], ex2 = ex[1], ex3 = ex[2];
const double dX = X[1] - X[0];
const double dY = Y[1] - Y[0];
const double dZ = Z[1] - Z[0];
const double d12dx = ONE / F12 / dX;
const double d12dy = ONE / F12 / dY;
const double d12dz = ONE / F12 / dZ;
const int imaxF = ex1;
const int jmaxF = ex2;
const int kmaxF = ex3;
int iminF = 1, jminF = 1, kminF = 1;
if (Symmetry > NO_SYMM && fabs(Z[0]) < dZ) kminF = -2;
if (Symmetry > EQ_SYMM && fabs(X[0]) < dX) iminF = -2;
if (Symmetry > EQ_SYMM && fabs(Y[0]) < dY) jminF = -2;
// fh for Fortran-style domain (-2:ex1,-2:ex2,-2:ex3)
const size_t nx = (size_t)ex1 + 3;
const size_t ny = (size_t)ex2 + 3;
const size_t nz = (size_t)ex3 + 3;
const size_t fh_size = nx * ny * nz;
double *fh = (double*)malloc(fh_size * sizeof(double));
if (!fh) return;
symmetry_bd(3, ex, f, fh, SoA);
// Advection (same stencil logic as lopsided_c.C)
for (int k0 = 0; k0 <= ex3 - 2; ++k0) {
const int kF = k0 + 1;
for (int j0 = 0; j0 <= ex2 - 2; ++j0) {
const int jF = j0 + 1;
for (int i0 = 0; i0 <= ex1 - 2; ++i0) {
const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
const double sfx = Sfx[p];
if (sfx > ZEO) {
if (i0 <= ex1 - 4) {
f_rhs[p] += sfx * d12dx *
(-F3 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF + 2, jF, kF, ex)]
+ fh[idx_fh_F(iF + 3, jF, kF, ex)]);
} else if (i0 <= ex1 - 3) {
f_rhs[p] += sfx * d12dx *
( fh[idx_fh_F(iF - 2, jF, kF, ex)]
-EIT * fh[idx_fh_F(iF - 1, jF, kF, ex)]
+EIT * fh[idx_fh_F(iF + 1, jF, kF, ex)]
- fh[idx_fh_F(iF + 2, jF, kF, ex)]);
} else if (i0 <= ex1 - 2) {
f_rhs[p] -= sfx * d12dx *
(-F3 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF - 2, jF, kF, ex)]
+ fh[idx_fh_F(iF - 3, jF, kF, ex)]);
}
} else if (sfx < ZEO) {
if ((i0 - 2) >= iminF) {
f_rhs[p] -= sfx * d12dx *
(-F3 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF - 2, jF, kF, ex)]
+ fh[idx_fh_F(iF - 3, jF, kF, ex)]);
} else if ((i0 - 1) >= iminF) {
f_rhs[p] += sfx * d12dx *
( fh[idx_fh_F(iF - 2, jF, kF, ex)]
-EIT * fh[idx_fh_F(iF - 1, jF, kF, ex)]
+EIT * fh[idx_fh_F(iF + 1, jF, kF, ex)]
- fh[idx_fh_F(iF + 2, jF, kF, ex)]);
} else if (i0 >= iminF) {
f_rhs[p] += sfx * d12dx *
(-F3 * fh[idx_fh_F(iF - 1, jF, kF, ex)]
-F10 * fh[idx_fh_F(iF , jF, kF, ex)]
+F18 * fh[idx_fh_F(iF + 1, jF, kF, ex)]
-F6 * fh[idx_fh_F(iF + 2, jF, kF, ex)]
+ fh[idx_fh_F(iF + 3, jF, kF, ex)]);
}
}
const double sfy = Sfy[p];
if (sfy > ZEO) {
if (j0 <= ex2 - 4) {
f_rhs[p] += sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF + 2, kF, ex)]
+ fh[idx_fh_F(iF, jF + 3, kF, ex)]);
} else if (j0 <= ex2 - 3) {
f_rhs[p] += sfy * d12dy *
( fh[idx_fh_F(iF, jF - 2, kF, ex)]
-EIT * fh[idx_fh_F(iF, jF - 1, kF, ex)]
+EIT * fh[idx_fh_F(iF, jF + 1, kF, ex)]
- fh[idx_fh_F(iF, jF + 2, kF, ex)]);
} else if (j0 <= ex2 - 2) {
f_rhs[p] -= sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF - 2, kF, ex)]
+ fh[idx_fh_F(iF, jF - 3, kF, ex)]);
}
} else if (sfy < ZEO) {
if ((j0 - 2) >= jminF) {
f_rhs[p] -= sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF - 2, kF, ex)]
+ fh[idx_fh_F(iF, jF - 3, kF, ex)]);
} else if ((j0 - 1) >= jminF) {
f_rhs[p] += sfy * d12dy *
( fh[idx_fh_F(iF, jF - 2, kF, ex)]
-EIT * fh[idx_fh_F(iF, jF - 1, kF, ex)]
+EIT * fh[idx_fh_F(iF, jF + 1, kF, ex)]
- fh[idx_fh_F(iF, jF + 2, kF, ex)]);
} else if (j0 >= jminF) {
f_rhs[p] += sfy * d12dy *
(-F3 * fh[idx_fh_F(iF, jF - 1, kF, ex)]
-F10 * fh[idx_fh_F(iF, jF , kF, ex)]
+F18 * fh[idx_fh_F(iF, jF + 1, kF, ex)]
-F6 * fh[idx_fh_F(iF, jF + 2, kF, ex)]
+ fh[idx_fh_F(iF, jF + 3, kF, ex)]);
}
}
const double sfz = Sfz[p];
if (sfz > ZEO) {
if (k0 <= ex3 - 4) {
f_rhs[p] += sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF + 2, ex)]
+ fh[idx_fh_F(iF, jF, kF + 3, ex)]);
} else if (k0 <= ex3 - 3) {
f_rhs[p] += sfz * d12dz *
( fh[idx_fh_F(iF, jF, kF - 2, ex)]
-EIT * fh[idx_fh_F(iF, jF, kF - 1, ex)]
+EIT * fh[idx_fh_F(iF, jF, kF + 1, ex)]
- fh[idx_fh_F(iF, jF, kF + 2, ex)]);
} else if (k0 <= ex3 - 2) {
f_rhs[p] -= sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF - 2, ex)]
+ fh[idx_fh_F(iF, jF, kF - 3, ex)]);
}
} else if (sfz < ZEO) {
if ((k0 - 2) >= kminF) {
f_rhs[p] -= sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF - 2, ex)]
+ fh[idx_fh_F(iF, jF, kF - 3, ex)]);
} else if ((k0 - 1) >= kminF) {
f_rhs[p] += sfz * d12dz *
( fh[idx_fh_F(iF, jF, kF - 2, ex)]
-EIT * fh[idx_fh_F(iF, jF, kF - 1, ex)]
+EIT * fh[idx_fh_F(iF, jF, kF + 1, ex)]
- fh[idx_fh_F(iF, jF, kF + 2, ex)]);
} else if (k0 >= kminF) {
f_rhs[p] += sfz * d12dz *
(-F3 * fh[idx_fh_F(iF, jF, kF - 1, ex)]
-F10 * fh[idx_fh_F(iF, jF, kF , ex)]
+F18 * fh[idx_fh_F(iF, jF, kF + 1, ex)]
-F6 * fh[idx_fh_F(iF, jF, kF + 2, ex)]
+ fh[idx_fh_F(iF, jF, kF + 3, ex)]);
}
}
}
}
}
// KO dissipation (same domain restriction as kodiss_c.C)
if (eps > ZEO) {
const int i0_lo = (iminF + 2 > 0) ? iminF + 2 : 0;
const int j0_lo = (jminF + 2 > 0) ? jminF + 2 : 0;
const int k0_lo = (kminF + 2 > 0) ? kminF + 2 : 0;
const int i0_hi = imaxF - 4; // inclusive
const int j0_hi = jmaxF - 4;
const int k0_hi = kmaxF - 4;
if (!(i0_lo > i0_hi || j0_lo > j0_hi || k0_lo > k0_hi)) {
for (int k0 = k0_lo; k0 <= k0_hi; ++k0) {
const int kF = k0 + 1;
for (int j0 = j0_lo; j0 <= j0_hi; ++j0) {
const int jF = j0 + 1;
for (int i0 = i0_lo; i0 <= i0_hi; ++i0) {
const int iF = i0 + 1;
const size_t p = idx_ex(i0, j0, k0, ex);
const double Dx_term =
((fh[idx_fh_F(iF - 3, jF, kF, ex)] + fh[idx_fh_F(iF + 3, jF, kF, ex)]) -
SIX * (fh[idx_fh_F(iF - 2, jF, kF, ex)] + fh[idx_fh_F(iF + 2, jF, kF, ex)]) +
FIT * (fh[idx_fh_F(iF - 1, jF, kF, ex)] + fh[idx_fh_F(iF + 1, jF, kF, ex)]) -
TWT * fh[idx_fh_F(iF, jF, kF, ex)]) / dX;
const double Dy_term =
((fh[idx_fh_F(iF, jF - 3, kF, ex)] + fh[idx_fh_F(iF, jF + 3, kF, ex)]) -
SIX * (fh[idx_fh_F(iF, jF - 2, kF, ex)] + fh[idx_fh_F(iF, jF + 2, kF, ex)]) +
FIT * (fh[idx_fh_F(iF, jF - 1, kF, ex)] + fh[idx_fh_F(iF, jF + 1, kF, ex)]) -
TWT * fh[idx_fh_F(iF, jF, kF, ex)]) / dY;
const double Dz_term =
((fh[idx_fh_F(iF, jF, kF - 3, ex)] + fh[idx_fh_F(iF, jF, kF + 3, ex)]) -
SIX * (fh[idx_fh_F(iF, jF, kF - 2, ex)] + fh[idx_fh_F(iF, jF, kF + 2, ex)]) +
FIT * (fh[idx_fh_F(iF, jF, kF - 1, ex)] + fh[idx_fh_F(iF, jF, kF + 1, ex)]) -
TWT * fh[idx_fh_F(iF, jF, kF, ex)]) / dZ;
f_rhs[p] += (eps / cof) * (Dx_term + Dy_term + Dz_term);
}
}
}
}
}
free(fh);
}

View File

@@ -1,35 +1,33 @@
include makefile.inc include makefile.inc
## polint(ordn=6) kernel selector: ## polint(ordn=6) kernel selector:
## 1 (default): barycentric fast path ## 1 (default): barycentric fast path
## 0 : fallback to Neville path ## 0 : fallback to Neville path
POLINT6_USE_BARY ?= 1 POLINT6_USE_BARY ?= 1
POLINT6_FLAG = -DPOLINT6_USE_BARYCENTRIC=$(POLINT6_USE_BARY) POLINT6_FLAG = -DPOLINT6_USE_BARYCENTRIC=$(POLINT6_USE_BARY)
## ABE build flags selected by PGO_MODE (set in makefile.inc, default: opt) ## ABE build flags selected by PGO_MODE (set in makefile.inc, default: opt)
## make -> opt (PGO-guided, maximum performance) ## make -> opt (PGO-guided, maximum performance)
## make PGO_MODE=instrument -> instrument (Phase 1: collect fresh profile data) ## make PGO_MODE=instrument -> instrument (Phase 1: collect fresh profile data)
PROFDATA = /home/$(shell whoami)/AMSS-NCKU/pgo_profile/default.profdata PROFDATA = /home/$(shell whoami)/AMSS-NCKU/pgo_profile/default.profdata
ifeq ($(PGO_MODE),instrument) ifeq ($(PGO_MODE),instrument)
## Phase 1: instrumentation — omit -ipo/-fp-model fast=2 for faster build and numerical stability ## Phase 1: instrumentation — omit -ipo/-fp-model fast=2 for faster build and numerical stability
CXXAPPFLAGS = -O3 -xHost -fma -fprofile-instr-generate -ipo \ CXXAPPFLAGS = -O3 -xHost -fma -fprofile-instr-generate -ipo \
-Dfortran3 -Dnewc -I${MKLROOT}/include $(INTERP_LB_FLAGS) -Dfortran3 -Dnewc -I${MKLROOT}/include $(INTERP_LB_FLAGS)
f90appflags = -O3 -xHost -fma -fprofile-instr-generate -ipo \ f90appflags = -O3 -xHost -fma -fprofile-instr-generate -ipo \
-align array64byte -fpp -I${MKLROOT}/include $(POLINT6_FLAG) -align array64byte -fpp -I${MKLROOT}/include $(POLINT6_FLAG)
else else
## opt (default): maximum performance with PGO profile data -fprofile-instr-use=$(PROFDATA) \ ## opt (default): maximum performance with PGO profile data
## PGO has been turned off, now tested and found to be negative optimization CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
## INTERP_LB_FLAGS has been turned off too, now tested and found to be negative optimization -fprofile-instr-use=$(PROFDATA) \
-Dfortran3 -Dnewc -I${MKLROOT}/include $(INTERP_LB_FLAGS)
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \ -fprofile-instr-use=$(PROFDATA) \
-Dfortran3 -Dnewc -I${MKLROOT}/include $(INTERP_LB_FLAGS) -align array64byte -fpp -I${MKLROOT}/include $(POLINT6_FLAG)
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \ endif
-align array64byte -fpp -I${MKLROOT}/include $(POLINT6_FLAG)
endif
.SUFFIXES: .o .f90 .C .for .cu .SUFFIXES: .o .f90 .C .for .cu
@@ -45,6 +43,10 @@ endif
.cu.o: .cu.o:
$(Cu) $(CUDA_APP_FLAGS) -c $< -o $@ $(CUDA_LIB_PATH) $(Cu) $(CUDA_APP_FLAGS) -c $< -o $@ $(CUDA_LIB_PATH)
# CUDA rewrite of BSSN RHS (drop-in replacement for bssn_rhs_c + stencil helpers)
bssn_rhs_cuda.o: bssn_rhs_cuda.cu macrodef.h
$(Cu) $(CUDA_APP_FLAGS) -c $< -o $@ $(CUDA_LIB_PATH)
# C rewrite of BSSN RHS kernel and helpers # C rewrite of BSSN RHS kernel and helpers
bssn_rhs_c.o: bssn_rhs_c.C bssn_rhs_c.o: bssn_rhs_c.C
${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@ ${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@
@@ -58,14 +60,11 @@ fdderivs_c.o: fdderivs_c.C
kodiss_c.o: kodiss_c.C kodiss_c.o: kodiss_c.C
${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@ ${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@
lopsided_c.o: lopsided_c.C lopsided_c.o: lopsided_c.C
${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@ ${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@
lopsided_kodis_c.o: lopsided_kodis_c.C interp_lb_profile.o: interp_lb_profile.C interp_lb_profile.h
${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@ ${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@
#interp_lb_profile.o: interp_lb_profile.C interp_lb_profile.h
# ${CXX} $(CXXAPPFLAGS) -c $< $(filein) -o $@
## TwoPunctureABE uses fixed optimal flags with its own PGO profile, independent of CXXAPPFLAGS ## TwoPunctureABE uses fixed optimal flags with its own PGO profile, independent of CXXAPPFLAGS
TP_PROFDATA = /home/$(shell whoami)/AMSS-NCKU/pgo_profile/TwoPunctureABE.profdata TP_PROFDATA = /home/$(shell whoami)/AMSS-NCKU/pgo_profile/TwoPunctureABE.profdata
@@ -82,21 +81,25 @@ TwoPunctureABE.o: TwoPunctureABE.C
# Input files # Input files
## Kernel implementation switch (set USE_CXX_KERNELS=0 to fall back to Fortran) ## Kernel implementation switch (set USE_CXX_KERNELS=0 to fall back to Fortran)
ifeq ($(USE_CXX_KERNELS),0) ifeq ($(USE_CXX_KERNELS),0)
# Fortran mode: no C rewrite files; bssn_rhs.o is included via F90FILES below # Fortran mode: no C rewrite files; bssn_rhs.o is included via F90FILES below
CFILES = CFILES =
else else
# C++ mode (default): C rewrite of bssn_rhs and helper kernels # C++ mode (default): C rewrite of bssn_rhs and helper kernels
CFILES = bssn_rhs_c.o fderivs_c.o fdderivs_c.o kodiss_c.o lopsided_c.o lopsided_kodis_c.o CFILES = bssn_rhs_c.o fderivs_c.o fdderivs_c.o kodiss_c.o lopsided_c.o
endif endif
## RK4 kernel switch (independent from USE_CXX_KERNELS) # CUDA rewrite: bssn_rhs_cuda.o replaces all CFILES (stencils are built-in)
ifeq ($(USE_CXX_RK4),1) CFILES_CUDA = bssn_rhs_cuda.o
CFILES += rungekutta4_rout_c.o
RK4_F90_OBJ = ## RK4 kernel switch (independent from USE_CXX_KERNELS)
else ifeq ($(USE_CXX_RK4),1)
RK4_F90_OBJ = rungekutta4_rout.o CFILES += rungekutta4_rout_c.o
endif CFILES_CUDA += rungekutta4_rout_c.o
RK4_F90_OBJ =
else
RK4_F90_OBJ = rungekutta4_rout.o
endif
C++FILES = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o\ C++FILES = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o\
cgh.o bssn_class.o surface_integral.o ShellPatch.o\ cgh.o bssn_class.o surface_integral.o ShellPatch.o\
@@ -113,12 +116,12 @@ C++FILES_GPU = ABE.o Ansorg.o Block.o misc.o monitor.o Parallel.o MPatch.o var.o
NullShellPatch2_Evo.o \ NullShellPatch2_Evo.o \
bssn_gpu_class.o bssn_step_gpu.o bssn_macro.o writefile_f.o bssn_gpu_class.o bssn_step_gpu.o bssn_macro.o writefile_f.o
F90FILES_BASE = enforce_algebra.o fmisc.o initial_puncture.o prolongrestrict.o\ F90FILES_BASE = enforce_algebra.o fmisc.o initial_puncture.o prolongrestrict.o\
prolongrestrict_cell.o prolongrestrict_vertex.o\ prolongrestrict_cell.o prolongrestrict_vertex.o\
$(RK4_F90_OBJ) diff_new.o kodiss.o kodiss_sh.o\ $(RK4_F90_OBJ) diff_new.o kodiss.o kodiss_sh.o\
lopsidediff.o sommerfeld_rout.o getnp4.o diff_new_sh.o\ lopsidediff.o sommerfeld_rout.o getnp4.o diff_new_sh.o\
shellfunctions.o bssn_rhs_ss.o Set_Rho_ADM.o\ shellfunctions.o bssn_rhs_ss.o Set_Rho_ADM.o\
getnp4EScalar.o bssnEScalar_rhs.o bssn_constraint.o ricci_gamma.o\ getnp4EScalar.o bssnEScalar_rhs.o bssn_constraint.o ricci_gamma.o\
fadmquantites_bssn.o Z4c_rhs.o Z4c_rhs_ss.o point_diff_new_sh.o\ fadmquantites_bssn.o Z4c_rhs.o Z4c_rhs_ss.o point_diff_new_sh.o\
cpbc.o getnp4old.o NullEvol.o initial_null.o initial_maxwell.o\ cpbc.o getnp4old.o NullEvol.o initial_null.o initial_maxwell.o\
getnpem2.o empart.o NullNews.o fourdcurvature.o\ getnpem2.o empart.o NullNews.o fourdcurvature.o\
@@ -178,9 +181,12 @@ $(CUDAFILES): bssn_gpu.h gpu_mem.h gpu_rhsSS_mem.h
misc.o : zbesh.o misc.o : zbesh.o
# projects # projects
ABE: $(C++FILES) $(CFILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) ABE: $(C++FILES) $(CFILES_CUDA) $(F90FILES) $(F77FILES) $(AHFDOBJS)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES) $(CFILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(LDLIBS) $(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES) $(CFILES_CUDA) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(LDLIBS) -lcudart $(CUDA_LIB_PATH)
ABE_CUDA: $(C++FILES) $(CFILES_CUDA) $(F90FILES) $(F77FILES) $(AHFDOBJS)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES) $(CFILES_CUDA) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(LDLIBS) -lcudart $(CUDA_LIB_PATH)
ABEGPU: $(C++FILES_GPU) $(CFILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES) ABEGPU: $(C++FILES_GPU) $(CFILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES)
$(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES_GPU) $(CFILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES) $(LDLIBS) $(CLINKER) $(CXXAPPFLAGS) -o $@ $(C++FILES_GPU) $(CFILES) $(F90FILES) $(F77FILES) $(AHFDOBJS) $(CUDAFILES) $(LDLIBS)
@@ -188,4 +194,4 @@ TwoPunctureABE: $(TwoPunctureFILES)
$(CLINKER) $(TP_OPTFLAGS) -qopenmp -o $@ $(TwoPunctureFILES) $(LDLIBS) $(CLINKER) $(TP_OPTFLAGS) -qopenmp -o $@ $(TwoPunctureFILES) $(LDLIBS)
clean: clean:
rm *.o ABE ABEGPU TwoPunctureABE make.log -f rm *.o ABE ABE_CUDA ABEGPU TwoPunctureABE make.log -f

View File

@@ -62,4 +62,4 @@ CLINKER = mpiicpx
Cu = nvcc Cu = nvcc
CUDA_LIB_PATH = -L/usr/lib/cuda/lib64 -I/usr/include -I/usr/lib/cuda/include CUDA_LIB_PATH = -L/usr/lib/cuda/lib64 -I/usr/include -I/usr/lib/cuda/include
#CUDA_APP_FLAGS = -c -g -O3 --ptxas-options=-v -arch compute_13 -code compute_13,sm_13 -Dfortran3 -Dnewc #CUDA_APP_FLAGS = -c -g -O3 --ptxas-options=-v -arch compute_13 -code compute_13,sm_13 -Dfortran3 -Dnewc
CUDA_APP_FLAGS = -c -g -O3 --ptxas-options=-v -Dfortran3 -Dnewc CUDA_APP_FLAGS = -c -g -O3 --ptxas-options=-v -Dfortran3 -Dnewc -arch=sm_80

View File

@@ -217,7 +217,6 @@
real*8,dimension(2*ghost_width) :: X,Y,Z real*8,dimension(2*ghost_width) :: X,Y,Z
real*8, dimension(2*ghost_width,2*ghost_width) :: tmp2 real*8, dimension(2*ghost_width,2*ghost_width) :: tmp2
real*8, dimension(2*ghost_width) :: tmp1 real*8, dimension(2*ghost_width) :: tmp1
real*8 :: ddy
real*8,dimension(3) :: ccp real*8,dimension(3) :: ccp
#if (ghost_width == 2) #if (ghost_width == 2)
@@ -580,7 +579,7 @@
tmp1(ghost_width-cxI(1)+cxB(1) :ghost_width-cxI(1)+cxT(1) ) = funf(cxB(1):cxT(1),j,k) tmp1(ghost_width-cxI(1)+cxB(1) :ghost_width-cxI(1)+cxT(1) ) = funf(cxB(1):cxT(1),j,k)
endif endif
call polint(X,tmp1,0.d0,funf(i,j,k),ddy,2*ghost_width) call polint0(X,tmp1,funf(i,j,k),2*ghost_width)
! for y direction ! for y direction
elseif(sum(fg).eq.2.and.fg(2) .eq. 0.and. & elseif(sum(fg).eq.2.and.fg(2) .eq. 0.and. &
@@ -690,7 +689,7 @@
tmp1(ghost_width-cxI(2)+cxB(2) :ghost_width-cxI(2)+cxT(2) ) = funf(i,cxB(2):cxT(2),k) tmp1(ghost_width-cxI(2)+cxB(2) :ghost_width-cxI(2)+cxT(2) ) = funf(i,cxB(2):cxT(2),k)
endif endif
call polint(Y,tmp1,0.d0,funf(i,j,k),ddy,2*ghost_width) call polint0(Y,tmp1,funf(i,j,k),2*ghost_width)
! for z direction ! for z direction
elseif(sum(fg).eq.2.and.fg(3) .eq. 0.and. & elseif(sum(fg).eq.2.and.fg(3) .eq. 0.and. &
@@ -802,7 +801,7 @@
tmp1(ghost_width-cxI(3)+cxB(3) :ghost_width-cxI(3)+cxT(3) ) = funf(i,j,cxB(3):cxT(3)) tmp1(ghost_width-cxI(3)+cxB(3) :ghost_width-cxI(3)+cxT(3) ) = funf(i,j,cxB(3):cxT(3))
endif endif
call polint(Z,tmp1,0.d0,funf(i,j,k),ddy,2*ghost_width) call polint0(Z,tmp1,funf(i,j,k),2*ghost_width)
#else #else

View File

@@ -217,7 +217,6 @@
real*8,dimension(2*ghost_width) :: X,Y,Z real*8,dimension(2*ghost_width) :: X,Y,Z
real*8, dimension(2*ghost_width,2*ghost_width) :: tmp2 real*8, dimension(2*ghost_width,2*ghost_width) :: tmp2
real*8, dimension(2*ghost_width) :: tmp1 real*8, dimension(2*ghost_width) :: tmp1
real*8 :: ddy
#if (ghost_width == 2) #if (ghost_width == 2)
real*8, parameter :: C1=-1.d0/16,C2=9.d0/16 real*8, parameter :: C1=-1.d0/16,C2=9.d0/16
@@ -470,7 +469,7 @@
tmp1(cxB(1)+ghost_width-i+1:cxT(1)+ghost_width-i+1) = fh(cxB(1):cxT(1),j,k) tmp1(cxB(1)+ghost_width-i+1:cxT(1)+ghost_width-i+1) = fh(cxB(1):cxT(1),j,k)
call polint(X,tmp1,0.d0,funf(i,j,k),ddy,2*ghost_width) call polint0(X,tmp1,funf(i,j,k),2*ghost_width)
! for y direction ! for y direction
elseif (fg(2) .eq. 0)then elseif (fg(2) .eq. 0)then
@@ -529,7 +528,7 @@
tmp1(cxB(2)+ghost_width-j+1:cxT(2)+ghost_width-j+1) = fh(i,cxB(2):cxT(2),k) tmp1(cxB(2)+ghost_width-j+1:cxT(2)+ghost_width-j+1) = fh(i,cxB(2):cxT(2),k)
call polint(Y,tmp1,0.d0,funf(i,j,k),ddy,2*ghost_width) call polint0(Y,tmp1,funf(i,j,k),2*ghost_width)
! for z direction ! for z direction
else else
@@ -588,7 +587,7 @@
tmp1(cxB(3)+ghost_width-k+1:cxT(3)+ghost_width-k+1) = fh(i,j,cxB(3):cxT(3)) tmp1(cxB(3)+ghost_width-k+1:cxT(3)+ghost_width-k+1) = fh(i,j,cxB(3):cxT(3))
call polint(Z,tmp1,0.d0,funf(i,j,k),ddy,2*ghost_width) call polint0(Z,tmp1,funf(i,j,k),2*ghost_width)
endif endif

View File

@@ -2,7 +2,6 @@
#include <cstdio> #include <cstdio>
#include <cstdlib> #include <cstdlib>
#include <cstddef> #include <cstddef>
#include <complex>
#include <immintrin.h> #include <immintrin.h>
namespace { namespace {
@@ -118,62 +117,6 @@ inline void rk4_stage3(std::size_t n,
extern "C" { extern "C" {
void f_rungekutta4_scalar(double &dT, double &f0, double &f1, double &f_rhs, int &RK4) {
constexpr double F1o6 = 1.0 / 6.0;
constexpr double HLF = 0.5;
constexpr double TWO = 2.0;
switch (RK4) {
case 0:
f1 = f0 + HLF * dT * f_rhs;
break;
case 1:
f_rhs = f_rhs + TWO * f1;
f1 = f0 + HLF * dT * f1;
break;
case 2:
f_rhs = f_rhs + TWO * f1;
f1 = f0 + dT * f1;
break;
case 3:
f1 = f0 + F1o6 * dT * (f1 + f_rhs);
break;
default:
std::fprintf(stderr, "rungekutta4_scalar_c: invalid RK4 stage %d\n", RK4);
std::abort();
}
}
void rungekutta4_cplxscalar_(double &dT,
std::complex<double> &f0,
std::complex<double> &f1,
std::complex<double> &f_rhs,
int &RK4) {
constexpr double F1o6 = 1.0 / 6.0;
constexpr double HLF = 0.5;
constexpr double TWO = 2.0;
switch (RK4) {
case 0:
f1 = f0 + HLF * dT * f_rhs;
break;
case 1:
f_rhs = f_rhs + TWO * f1;
f1 = f0 + HLF * dT * f1;
break;
case 2:
f_rhs = f_rhs + TWO * f1;
f1 = f0 + dT * f1;
break;
case 3:
f1 = f0 + F1o6 * dT * (f1 + f_rhs);
break;
default:
std::fprintf(stderr, "rungekutta4_cplxscalar_c: invalid RK4 stage %d\n", RK4);
std::abort();
}
}
int f_rungekutta4_rout(int *ex, double &dT, int f_rungekutta4_rout(int *ex, double &dT,
double *f0, double *f1, double *f_rhs, double *f0, double *f1, double *f_rhs,
int &RK4) { int &RK4) {

View File

@@ -24,10 +24,4 @@ void lopsided(const int ex[3],
const double *X, const double *Y, const double *Z, const double *X, const double *Y, const double *Z,
const double *f, double *f_rhs, const double *f, double *f_rhs,
const double *Sfx, const double *Sfy, const double *Sfz, const double *Sfx, const double *Sfy, const double *Sfz,
int Symmetry, const double SoA[3]); int Symmetry, const double SoA[3]);
void lopsided_kodis(const int ex[3],
const double *X, const double *Y, const double *Z,
const double *f, double *f_rhs,
const double *Sfx, const double *Sfy, const double *Sfz,
int Symmetry, const double SoA[3], double eps);

View File

@@ -70,7 +70,7 @@ def makefile_ABE():
## Build command with CPU binding to nohz_full cores ## Build command with CPU binding to nohz_full cores
if (input_data.GPU_Calculation == "no"): if (input_data.GPU_Calculation == "no"):
makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} INTERP_LB_MODE=off ABE" makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} INTERP_LB_MODE=optimize ABE"
elif (input_data.GPU_Calculation == "yes"): elif (input_data.GPU_Calculation == "yes"):
makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} ABEGPU" makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} ABEGPU"
else: else:

View File

@@ -0,0 +1,97 @@
# AMSS-NCKU PGO Profile Analysis Report
## 1. Profiling Environment
| Item | Value |
|------|-------|
| Compiler | Intel oneAPI DPC++/C++ 2025.3.0 (icpx/ifx) |
| Instrumentation Flag | `-fprofile-instr-generate` |
| Optimization Level (instrumented) | `-O2 -xHost -fma` |
| MPI Processes | 1 (single process to avoid MPI+instrumentation deadlock) |
| Profile File | `default_9725750769337483397_0.profraw` (327 KB) |
| Merged Profile | `default.profdata` (394 KB) |
| llvm-profdata | `/home/intel/oneapi/compiler/2025.3/bin/compiler/llvm-profdata` |
## 2. Reduced Simulation Parameters (for profiling run)
| Parameter | Production Value | Profiling Value |
|-----------|-----------------|-----------------|
| MPI_processes | 64 | 1 |
| grid_level | 9 | 4 |
| static_grid_level | 5 | 3 |
| static_grid_number | 96 | 24 |
| moving_grid_number | 48 | 16 |
| largest_box_xyz_max | 320^3 | 160^3 |
| Final_Evolution_Time | 1000.0 | 10.0 |
| Evolution_Step_Number | 10,000,000 | 1,000 |
| Detector_Number | 12 | 2 |
## 3. Profile Summary
| Metric | Value |
|--------|-------|
| Total instrumented functions | 1,392 |
| Functions with non-zero counts | 117 (8.4%) |
| Functions with zero counts | 1,275 (91.6%) |
| Maximum function entry count | 386,459,248 |
| Maximum internal block count | 370,477,680 |
| Total block count | 4,198,023,118 |
## 4. Top 20 Hotspot Functions
| Rank | Total Count | Max Block Count | Function | Category |
|------|------------|-----------------|----------|----------|
| 1 | 1,241,601,732 | 370,477,680 | `polint_` | Interpolation |
| 2 | 755,994,435 | 230,156,640 | `prolong3_` | Grid prolongation |
| 3 | 667,964,095 | 3,697,792 | `compute_rhs_bssn_` | BSSN RHS evolution |
| 4 | 539,736,051 | 386,459,248 | `symmetry_bd_` | Symmetry boundary |
| 5 | 277,310,808 | 53,170,728 | `lopsided_` | Lopsided FD stencil |
| 6 | 155,534,488 | 94,535,040 | `decide3d_` | 3D grid decision |
| 7 | 119,267,712 | 19,266,048 | `rungekutta4_rout_` | RK4 time integrator |
| 8 | 91,574,616 | 48,824,160 | `kodis_` | Kreiss-Oliger dissipation |
| 9 | 67,555,389 | 43,243,680 | `fderivs_` | Finite differences |
| 10 | 55,296,000 | 42,246,144 | `misc::fact(int)` | Factorial utility |
| 11 | 43,191,071 | 27,663,328 | `fdderivs_` | 2nd-order FD derivatives |
| 12 | 36,233,965 | 22,429,440 | `restrict3_` | Grid restriction |
| 13 | 24,698,512 | 17,231,520 | `polin3_` | Polynomial interpolation |
| 14 | 22,962,942 | 20,968,768 | `copy_` | Data copy |
| 15 | 20,135,696 | 17,259,168 | `Ansorg::barycentric(...)` | Spectral interpolation |
| 16 | 14,650,224 | 7,224,768 | `Ansorg::barycentric_omega(...)` | Spectral weights |
| 17 | 13,242,296 | 2,871,920 | `global_interp_` | Global interpolation |
| 18 | 12,672,000 | 7,734,528 | `sommerfeld_rout_` | Sommerfeld boundary |
| 19 | 6,872,832 | 1,880,064 | `sommerfeld_routbam_` | Sommerfeld boundary (BAM) |
| 20 | 5,709,900 | 2,809,632 | `l2normhelper_` | L2 norm computation |
## 5. Hotspot Category Breakdown
Top 20 functions account for ~98% of total execution counts:
| Category | Functions | Combined Count | Share |
|----------|-----------|---------------|-------|
| Interpolation / Prolongation / Restriction | polint_, prolong3_, restrict3_, polin3_, global_interp_, Ansorg::* | ~2,093M | ~50% |
| BSSN RHS + FD stencils | compute_rhs_bssn_, lopsided_, fderivs_, fdderivs_ | ~1,056M | ~25% |
| Boundary conditions | symmetry_bd_, sommerfeld_rout_, sommerfeld_routbam_ | ~559M | ~13% |
| Time integration | rungekutta4_rout_ | ~119M | ~3% |
| Dissipation | kodis_ | ~92M | ~2% |
| Utilities | misc::fact, decide3d_, copy_, l2normhelper_ | ~256M | ~6% |
## 6. Conclusions
1. **Profile data is valid**: 1,392 functions instrumented, 117 exercised with ~4.2 billion total counts.
2. **Hotspot concentration is high**: Top 5 functions alone account for ~76% of all counts, which is ideal for PGO — the compiler has strong branch/layout optimization targets.
3. **Fortran numerical kernels dominate**: `polint_`, `prolong3_`, `compute_rhs_bssn_`, `symmetry_bd_`, `lopsided_` are all Fortran routines in the inner evolution loop. PGO will optimize their branch prediction and basic block layout.
4. **91.6% of functions have zero counts**: These are code paths for unused features (GPU, BSSN-EScalar, BSSN-EM, Z4C, etc.). PGO will deprioritize them, improving instruction cache utilization.
5. **Profile is representative**: Despite the reduced grid size, the code path coverage matches production — the same kernels (RHS, prolongation, restriction, boundary) are exercised. PGO branch probabilities from this profile will transfer well to full-scale runs.
## 7. PGO Phase 2 Usage
To apply the profile, use the following flags in `makefile.inc`:
```makefile
CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=/home/amss/AMSS-NCKU/pgo_profile/default.profdata \
-Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
-fprofile-instr-use=/home/amss/AMSS-NCKU/pgo_profile/default.profdata \
-align array64byte -fpp -I${MKLROOT}/include
```

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.