Compare commits

...

16 Commits

Author SHA1 Message Date
6796384bf4 taskset setting updated 2026-02-07 22:24:02 +08:00
c974a88d6d Pool fh work arrays in compute_rhs_bssn to eliminate allocation churn
Add _fh variants of fderivs, fdderivs, kodis, and lopsided that accept
a caller-provided fh work array instead of allocating one internally.
Declare two shared work arrays in compute_rhs_bssn (fh_work2 for
symmetry_bd(2) callers, fh_work3 for symmetry_bd(3) callers) and pass
them to all ~84 subroutine calls, eliminating ~77 redundant automatic
array allocations (~591 MB churn per RHS call, ~2.3 GB per timestep).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:49:12 +08:00
09ffdb553d Eliminate hot-path heap allocations in TwoPunctures spectral solver
Pre-allocate workspace buffers as class members to remove ~8M malloc/free
pairs per Newton iteration from LineRelax, ThomasAlgorithm, JFD_times_dv,
J_times_dv, chebft_Zeros, fourft, Derivatives_AB3, and F_of_v.
Rewrite ThomasAlgorithm to operate in-place on input arrays.
Results are bit-identical; no algorithmic changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 21:20:35 +08:00
699e443c7a Optimize polint/polin2/polin3 interpolation for cache locality
Changes:
- polint: Rewrite Neville algorithm from array-slice operations to
  scalar loop. Mathematically identical, avoids temporary array
  allocations for den(1:n-m) slices. (Credit: yx-fmisc branch)

- polin2: Swap interpolation order so inner loop accesses ya(:,j)
  (contiguous in Fortran column-major) instead of ya(i,:) (strided).
  Tensor product interpolation is commutative; all call sites pass
  identical coordinate arrays for all dimensions.

- polin3: Swap interpolation order to process contiguous first
  dimension first: ya(:,j,k) -> yatmp(:,k) -> ymtmp(:).
  Same commutativity argument as polin2.

Compile-time safety switch:
  -DPOLINT_LEGACY_ORDER  restores original dimension ordering
  Default (no flag):     uses optimized contiguous-memory ordering

Usage:
  # Production (optimized order):
  make clean && make -j ABE

  # Fallback if results differ (original order):
  Add -DPOLINT_LEGACY_ORDER to f90appflags in makefile.inc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 19:00:35 +08:00
24bfa44911 Disable NaN sanity check in bssn_rhs.f90 for production builds
Wrap the NaN sanity check (21 sum() full-array traversals per RHS call)
with #ifdef DEBUG so it is compiled out in production builds.

This eliminates 84 redundant full-array scans per timestep (21 per RHS
call × 4 RK4 substages) that serve no purpose when input data is valid.

Usage:
  - Production build (default): NaN check is disabled, no changes needed.
  - Debug build: add -DDEBUG to f90appflags in makefile.inc, e.g.
      f90appflags = -O3 ... -DDEBUG -fpp ...
    to re-enable the NaN sanity check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 18:36:29 +08:00
6738854a9d Compiler-level and hot-path optimizations for GW150914
- makefile.inc: add -ipo (interprocedural optimization) and
  -align array64byte (64-byte array alignment for vectorization)
- fmisc.f90: remove redundant funcc=0.d0 zeroing from symmetry_bd,
  symmetry_tbd, symmetry_stbd (~328+ full-array memsets eliminated
  per timestep)
- enforce_algebra.f90: rewrite enforce_ag and enforce_ga as point-wise
  loops, replacing 12 stack-allocated 3D temporary arrays with scalar
  locals for better cache locality

All changes are mathematically equivalent — no algorithmic modifications.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 17:13:39 +08:00
223ec17a54 input updated 2026-02-06 13:57:48 +08:00
26c81d8e81 makefile updated 2026-01-19 23:53:16 +08:00
CGH0S7
9deeda9831 Refactor verification method and optimize numerical kernels with oneMKL BLAS
This commit transitions the verification approach from post-Newtonian theory
   comparison to regression testing against baseline simulations, and optimizes
   critical numerical kernels using Intel oneMKL BLAS routines.

   Verification Changes:
   - Replace PN theory-based RMS calculation with trajectory-based comparison
   - Compare optimized results against baseline (GW150914-origin) on XY plane
   - Compute RMS independently for BH1 and BH2, report maximum as final metric
   - Update documentation to reflect new regression test methodology

   Performance Optimizations:
   - Replace manual vector operations with oneMKL BLAS routines:
     * norm2() and scalarproduct() now use cblas_dnrm2/cblas_ddot (C++)
     * L2 norm calculations use DDOT for dot products (Fortran)
     * Interpolation weighted sums use DDOT (Fortran)
   - Disable OpenMP threading (switch to sequential MKL) for better performance

   Build Configuration:
   - Switch from lmkl_intel_thread to lmkl_sequential
   - Remove -qopenmp flags from compiler options
   - Maintain aggressive optimization flags (-O3, -xHost, -fp-model fast=2, -fma)

   Other Changes:
   - Update .gitignore for GW150914-origin, docs, and temporary files
2026-01-18 14:25:21 +08:00
CGH0S7
3a7bce3af2 Update Intel oneAPI configuration and CPU binding settings
- Update makefile.inc with Intel oneAPI compiler flags and oneMKL linking
   - Configure taskset CPU binding to use nohz_full cores (4-55, 60-111)
   - Set build parallelism to 104 jobs for faster compilation
   - Update MPI process count to 48 in input configuration
2026-01-17 20:41:02 +08:00
CGH0S7
c6945bb095 Rename verify_accuracy.py to AMSS_NCKU_Verify_ASC26.py and improve visual output 2026-01-17 14:54:33 +08:00
CGH0S7
0d24f1503c Add accuracy verification script for GW150914 simulation
- Verify RMS error < 1% (black hole trajectory vs. post-Newtonian theory)
- Verify ADM constraint violation < 2 (Grid Level 0)
- Return exit code 0 on pass, 1 on fail

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-17 00:37:30 +08:00
CGH0S7
cb252f5ea2 Optimize numerical algorithms with Intel oneMKL
- FFT.f90: Replace hand-written Cooley-Tukey FFT with oneMKL DFTI
   - ilucg.f90: Replace manual dot product loop with BLAS DDOT
   - gaussj.C: Replace Gauss-Jordan elimination with LAPACK dgesv/dgetri
   - makefile.inc: Add MKL include paths and library linking

   All optimizations maintain mathematical equivalence and numerical precision.
2026-01-16 10:58:11 +08:00
CGH0S7
7a76cbaafd Add numactl CPU binding to avoid cores 0-3 and 56-59
Bind all computation processes (ABE, ABEGPU, TwoPunctureABE) to
   CPU cores 4-55 and 60-111 using numactl --physcpubind to prevent
   interference with system processes on reserved cores.
2026-01-16 10:24:46 +08:00
CGH0S7
57a7376044 Switch compiler toolchain from GCC to Intel oneAPI
- makefile.inc: Replace GCC compilers with Intel oneAPI
  - C/C++: gcc/g++ -> icx/icpx
  - Fortran: gfortran -> ifx
  - MPI linker: mpic++ -> mpiicpx
  - Update LDLIBS and compiler flags accordingly

- macrodef.h: Fix include path (microdef.fh -> macrodef.fh)

Requires: source /home/intel/oneapi/setvars.sh before build
2026-01-15 16:32:12 +08:00
cd5ceaa15f main branch updated 2026-01-14 08:55:53 +08:00
16 changed files with 1425 additions and 576 deletions

6
.gitignore vendored Normal file
View File

@@ -0,0 +1,6 @@
__pycache__
GW150914
GW150914-origin
docs
*.tmp

279
AMSS_NCKU_Verify_ASC26.py Normal file
View File

@@ -0,0 +1,279 @@
#!/usr/bin/env python3
"""
AMSS-NCKU GW150914 Simulation Regression Test Script
Verification Requirements:
1. XY-plane trajectory RMS error < 1% (Optimized vs. baseline, max of BH1 and BH2)
2. ADM constraint violation < 2 (Grid Level 0)
RMS Calculation Method:
- Computes trajectory deviation on the XY plane independently for BH1 and BH2
- For each black hole: RMS = sqrt((1/M) * sum((Δr_i / r_i^max)^2)) × 100%
- Final RMS = max(RMS_BH1, RMS_BH2)
Usage: python3 AMSS_NCKU_Verify_ASC26.py [output_dir]
Default: output_dir = GW150914/AMSS_NCKU_output
Reference: GW150914-origin (baseline simulation)
"""
import numpy as np
import sys
import os
# ANSI Color Codes
class Color:
GREEN = '\033[92m'
RED = '\033[91m'
YELLOW = '\033[93m'
BLUE = '\033[94m'
BOLD = '\033[1m'
RESET = '\033[0m'
def get_status_text(passed):
if passed:
return f"{Color.GREEN}{Color.BOLD}PASS{Color.RESET}"
else:
return f"{Color.RED}{Color.BOLD}FAIL{Color.RESET}"
def load_bh_trajectory(filepath):
"""Load black hole trajectory data"""
data = np.loadtxt(filepath)
return {
'time': data[:, 0],
'x1': data[:, 1], 'y1': data[:, 2], 'z1': data[:, 3],
'x2': data[:, 4], 'y2': data[:, 5], 'z2': data[:, 6]
}
def load_constraint_data(filepath):
"""Load constraint violation data"""
data = []
with open(filepath, 'r') as f:
for line in f:
if line.startswith('#'):
continue
parts = line.split()
if len(parts) >= 8:
data.append([float(x) for x in parts[:8]])
return np.array(data)
def calculate_rms_error(bh_data_ref, bh_data_target):
"""
Calculate trajectory-based RMS error on the XY plane between baseline and optimized simulations.
This function computes the RMS error independently for BH1 and BH2 trajectories,
then returns the maximum of the two as the final RMS error metric.
For each black hole, the RMS is calculated as:
RMS = sqrt( (1/M) * sum( (Δr_i / r_i^max)^2 ) ) × 100%
where:
Δr_i = sqrt((x_ref,i - x_new,i)^2 + (y_ref,i - y_new,i)^2)
r_i^max = max(sqrt(x_ref,i^2 + y_ref,i^2), sqrt(x_new,i^2 + y_new,i^2))
Args:
bh_data_ref: Reference (baseline) trajectory data
bh_data_target: Target (optimized) trajectory data
Returns:
rms_value: Final RMS error as a percentage (max of BH1 and BH2)
error: Error message if any
"""
# Align data: truncate to the length of the shorter dataset
M = min(len(bh_data_ref['time']), len(bh_data_target['time']))
if M < 10:
return None, "Insufficient data points for comparison"
# Extract XY coordinates for both black holes
x1_ref = bh_data_ref['x1'][:M]
y1_ref = bh_data_ref['y1'][:M]
x2_ref = bh_data_ref['x2'][:M]
y2_ref = bh_data_ref['y2'][:M]
x1_new = bh_data_target['x1'][:M]
y1_new = bh_data_target['y1'][:M]
x2_new = bh_data_target['x2'][:M]
y2_new = bh_data_target['y2'][:M]
# Calculate RMS for BH1
delta_r1 = np.sqrt((x1_ref - x1_new)**2 + (y1_ref - y1_new)**2)
r1_ref = np.sqrt(x1_ref**2 + y1_ref**2)
r1_new = np.sqrt(x1_new**2 + y1_new**2)
r1_max = np.maximum(r1_ref, r1_new)
# Calculate RMS for BH2
delta_r2 = np.sqrt((x2_ref - x2_new)**2 + (y2_ref - y2_new)**2)
r2_ref = np.sqrt(x2_ref**2 + y2_ref**2)
r2_new = np.sqrt(x2_new**2 + y2_new**2)
r2_max = np.maximum(r2_ref, r2_new)
# Avoid division by zero for BH1
valid_mask1 = r1_max > 1e-15
if np.sum(valid_mask1) < 10:
return None, "Insufficient valid data points for BH1"
terms1 = (delta_r1[valid_mask1] / r1_max[valid_mask1])**2
rms_bh1 = np.sqrt(np.mean(terms1)) * 100
# Avoid division by zero for BH2
valid_mask2 = r2_max > 1e-15
if np.sum(valid_mask2) < 10:
return None, "Insufficient valid data points for BH2"
terms2 = (delta_r2[valid_mask2] / r2_max[valid_mask2])**2
rms_bh2 = np.sqrt(np.mean(terms2)) * 100
# Final RMS is the maximum of BH1 and BH2
rms_final = max(rms_bh1, rms_bh2)
return rms_final, None
def analyze_constraint_violation(constraint_data, n_levels=9):
"""
Analyze ADM constraint violation
Return maximum constraint violation for Grid Level 0
"""
# Extract Grid Level 0 data (first entry for each time step)
level0_data = constraint_data[::n_levels]
# Calculate maximum absolute value for each constraint
results = {
'Ham': np.max(np.abs(level0_data[:, 1])),
'Px': np.max(np.abs(level0_data[:, 2])),
'Py': np.max(np.abs(level0_data[:, 3])),
'Pz': np.max(np.abs(level0_data[:, 4])),
'Gx': np.max(np.abs(level0_data[:, 5])),
'Gy': np.max(np.abs(level0_data[:, 6])),
'Gz': np.max(np.abs(level0_data[:, 7]))
}
results['max_violation'] = max(results.values())
return results
def print_header():
"""Print report header"""
print("\n" + Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
print(Color.BOLD + " AMSS-NCKU GW150914 Simulation Regression Test Report" + Color.RESET)
print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
def print_rms_results(rms_rel, error, threshold=1.0):
"""Print RMS error results"""
print(f"\n{Color.BOLD}1. RMS Error Analysis (Baseline vs Optimized){Color.RESET}")
print("-" * 45)
if error:
print(f" {Color.RED}Error: {error}{Color.RESET}")
return False
passed = rms_rel < threshold
print(f" RMS relative error: {rms_rel:.4f}%")
print(f" Requirement: < {threshold}%")
print(f" Status: {get_status_text(passed)}")
return passed
def print_constraint_results(results, threshold=2.0):
"""Print constraint violation results"""
print(f"\n{Color.BOLD}2. ADM Constraint Violation Analysis (Grid Level 0){Color.RESET}")
print("-" * 45)
names = ['Ham', 'Px', 'Py', 'Pz', 'Gx', 'Gy', 'Gz']
for i, name in enumerate(names):
print(f" Max |{name:3}|: {results[name]:.6f}", end=" ")
if (i + 1) % 2 == 0: print()
if len(names) % 2 != 0: print()
passed = results['max_violation'] < threshold
print(f"\n Maximum violation: {results['max_violation']:.6f}")
print(f" Requirement: < {threshold}")
print(f" Status: {get_status_text(passed)}")
return passed
def print_summary(rms_passed, constraint_passed):
"""Print summary"""
print("\n" + Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
print(Color.BOLD + "Verification Summary" + Color.RESET)
print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET)
all_passed = rms_passed and constraint_passed
res_rms = get_status_text(rms_passed)
res_con = get_status_text(constraint_passed)
print(f" [1] RMS trajectory check: {res_rms}")
print(f" [2] ADM constraint check: {res_con}")
final_status = f"{Color.GREEN}{Color.BOLD}ALL CHECKS PASSED{Color.RESET}" if all_passed else f"{Color.RED}{Color.BOLD}SOME CHECKS FAILED{Color.RESET}"
print(f"\n Overall result: {final_status}")
print(Color.BLUE + Color.BOLD + "=" * 65 + Color.RESET + "\n")
return all_passed
def main():
# Determine target (optimized) output directory
if len(sys.argv) > 1:
target_dir = sys.argv[1]
else:
script_dir = os.path.dirname(os.path.abspath(__file__))
target_dir = os.path.join(script_dir, "GW150914/AMSS_NCKU_output")
# Determine reference (baseline) directory
script_dir = os.path.dirname(os.path.abspath(__file__))
reference_dir = os.path.join(script_dir, "GW150914-origin/AMSS_NCKU_output")
# Data file paths
bh_file_ref = os.path.join(reference_dir, "bssn_BH.dat")
bh_file_target = os.path.join(target_dir, "bssn_BH.dat")
constraint_file = os.path.join(target_dir, "bssn_constraint.dat")
# Check if files exist
if not os.path.exists(bh_file_ref):
print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Baseline trajectory file not found: {bh_file_ref}")
sys.exit(1)
if not os.path.exists(bh_file_target):
print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Target trajectory file not found: {bh_file_target}")
sys.exit(1)
if not os.path.exists(constraint_file):
print(f"{Color.RED}{Color.BOLD}Error:{Color.RESET} Constraint data file not found: {constraint_file}")
sys.exit(1)
# Print header
print_header()
print(f"\n{Color.BOLD}Reference (Baseline):{Color.RESET} {Color.BLUE}{reference_dir}{Color.RESET}")
print(f"{Color.BOLD}Target (Optimized): {Color.RESET} {Color.BLUE}{target_dir}{Color.RESET}")
# Load data
bh_data_ref = load_bh_trajectory(bh_file_ref)
bh_data_target = load_bh_trajectory(bh_file_target)
constraint_data = load_constraint_data(constraint_file)
# Calculate RMS error
rms_rel, error = calculate_rms_error(bh_data_ref, bh_data_target)
rms_passed = print_rms_results(rms_rel, error)
# Analyze constraint violation
constraint_results = analyze_constraint_violation(constraint_data)
constraint_passed = print_constraint_results(constraint_results)
# Print summary
all_passed = print_summary(rms_passed, constraint_passed)
# Return exit code
sys.exit(0 if all_passed else 1)
if __name__ == "__main__":
main()

View File

@@ -37,57 +37,51 @@ close(77)
end program checkFFT end program checkFFT
#endif #endif
!-------------
! Optimized FFT using Intel oneMKL DFTI
! Mathematical equivalence: Standard DFT definition
! Forward (isign=1): X[k] = sum_{n=0}^{N-1} x[n] * exp(-2*pi*i*k*n/N)
! Backward (isign=-1): X[k] = sum_{n=0}^{N-1} x[n] * exp(+2*pi*i*k*n/N)
! Input/Output: dataa is interleaved complex array [Re(0),Im(0),Re(1),Im(1),...]
!------------- !-------------
SUBROUTINE four1(dataa,nn,isign) SUBROUTINE four1(dataa,nn,isign)
use MKL_DFTI
implicit none implicit none
INTEGER::isign,nn INTEGER, intent(in) :: isign, nn
double precision,dimension(2*nn)::dataa DOUBLE PRECISION, dimension(2*nn), intent(inout) :: dataa
INTEGER::i,istep,j,m,mmax,n
double precision::tempi,tempr type(DFTI_DESCRIPTOR), pointer :: desc
DOUBLE PRECISION::theta,wi,wpi,wpr,wr,wtemp integer :: status
n=2*nn
j=1 ! Create DFTI descriptor for 1D complex-to-complex transform
do i=1,n,2 status = DftiCreateDescriptor(desc, DFTI_DOUBLE, DFTI_COMPLEX, 1, nn)
if(j.gt.i)then if (status /= 0) return
tempr=dataa(j)
tempi=dataa(j+1) ! Set input/output storage as interleaved complex (default)
dataa(j)=dataa(i) status = DftiSetValue(desc, DFTI_PLACEMENT, DFTI_INPLACE)
dataa(j+1)=dataa(i+1) if (status /= 0) then
dataa(i)=tempr status = DftiFreeDescriptor(desc)
dataa(i+1)=tempi return
endif
m=nn
1 if ((m.ge.2).and.(j.gt.m)) then
j=j-m
m=m/2
goto 1
endif
j=j+m
enddo
mmax=2
2 if (n.gt.mmax) then
istep=2*mmax
theta=6.28318530717959d0/(isign*mmax)
wpr=-2.d0*sin(0.5d0*theta)**2
wpi=sin(theta)
wr=1.d0
wi=0.d0
do m=1,mmax,2
do i=m,n,istep
j=i+mmax
tempr=sngl(wr)*dataa(j)-sngl(wi)*dataa(j+1)
tempi=sngl(wr)*dataa(j+1)+sngl(wi)*dataa(j)
dataa(j)=dataa(i)-tempr
dataa(j+1)=dataa(i+1)-tempi
dataa(i)=dataa(i)+tempr
dataa(i+1)=dataa(i+1)+tempi
enddo
wtemp=wr
wr=wr*wpr-wi*wpi+wr
wi=wi*wpr+wtemp*wpi+wi
enddo
mmax=istep
goto 2
endif endif
! Commit the descriptor
status = DftiCommitDescriptor(desc)
if (status /= 0) then
status = DftiFreeDescriptor(desc)
return
endif
! Execute FFT based on direction
if (isign == 1) then
! Forward FFT: exp(-2*pi*i*k*n/N)
status = DftiComputeForward(desc, dataa)
else
! Backward FFT: exp(+2*pi*i*k*n/N)
status = DftiComputeBackward(desc, dataa)
endif
! Free descriptor
status = DftiFreeDescriptor(desc)
return return
END SUBROUTINE four1 END SUBROUTINE four1

View File

@@ -5,6 +5,7 @@
#include <cstdio> #include <cstdio>
#include <cstdlib> #include <cstdlib>
#include <string> #include <string>
#include <cstring>
#include <iostream> #include <iostream>
#include <iomanip> #include <iomanip>
#include <fstream> #include <fstream>
@@ -27,6 +28,7 @@ using namespace std;
#endif #endif
#include "TwoPunctures.h" #include "TwoPunctures.h"
#include <mkl_cblas.h>
TwoPunctures::TwoPunctures(double mp, double mm, double b, TwoPunctures::TwoPunctures(double mp, double mm, double b,
double P_plusx, double P_plusy, double P_plusz, double P_plusx, double P_plusy, double P_plusz,
@@ -59,13 +61,110 @@ TwoPunctures::TwoPunctures(double mp, double mm, double b,
F = dvector(0, ntotal - 1); F = dvector(0, ntotal - 1);
allocate_derivs(&u, ntotal); allocate_derivs(&u, ntotal);
allocate_derivs(&v, ntotal); allocate_derivs(&v, ntotal);
// Allocate workspace buffers for hot-path allocation elimination
int N = maximum3(n1, n2, n3);
int maxn = maximum2(n1, n2);
// LineRelax_be workspace (sized for n2)
ws_diag_be = new double[n2];
ws_e_be = new double[n2 - 1];
ws_f_be = new double[n2 - 1];
ws_b_be = new double[n2];
ws_x_be = new double[n2];
// LineRelax_al workspace (sized for n1)
ws_diag_al = new double[n1];
ws_e_al = new double[n1 - 1];
ws_f_al = new double[n1 - 1];
ws_b_al = new double[n1];
ws_x_al = new double[n1];
// ThomasAlgorithm workspace (sized for max(n1,n2))
ws_thomas_y = new double[maxn];
// JFD_times_dv workspace (sized for nvar)
ws_jfd_values = dvector(0, nvar - 1);
allocate_derivs(&ws_jfd_dU, nvar);
allocate_derivs(&ws_jfd_U, nvar);
// chebft_Zeros workspace (sized for N+1)
ws_cheb_c = dvector(0, N);
// fourft workspace (sized for N/2+1 each)
ws_four_a = dvector(0, N / 2);
ws_four_b = dvector(0, N / 2);
// Derivatives_AB3 workspace
ws_deriv_p = dvector(0, N);
ws_deriv_dp = dvector(0, N);
ws_deriv_d2p = dvector(0, N);
ws_deriv_q = dvector(0, N);
ws_deriv_dq = dvector(0, N);
ws_deriv_r = dvector(0, N);
ws_deriv_dr = dvector(0, N);
ws_deriv_indx = ivector(0, N);
// F_of_v workspace
ws_fov_sources = new double[n1 * n2 * n3];
ws_fov_values = dvector(0, nvar - 1);
allocate_derivs(&ws_fov_U, nvar);
// J_times_dv workspace
ws_jtdv_values = dvector(0, nvar - 1);
allocate_derivs(&ws_jtdv_dU, nvar);
allocate_derivs(&ws_jtdv_U, nvar);
} }
TwoPunctures::~TwoPunctures() TwoPunctures::~TwoPunctures()
{ {
int const nvar = 1, n1 = npoints_A, n2 = npoints_B, n3 = npoints_phi;
int N = maximum3(n1, n2, n3);
free_dvector(F, 0, ntotal - 1); free_dvector(F, 0, ntotal - 1);
free_derivs(&u, ntotal); free_derivs(&u, ntotal);
free_derivs(&v, ntotal); free_derivs(&v, ntotal);
// Free workspace buffers
delete[] ws_diag_be;
delete[] ws_e_be;
delete[] ws_f_be;
delete[] ws_b_be;
delete[] ws_x_be;
delete[] ws_diag_al;
delete[] ws_e_al;
delete[] ws_f_al;
delete[] ws_b_al;
delete[] ws_x_al;
delete[] ws_thomas_y;
free_dvector(ws_jfd_values, 0, nvar - 1);
free_derivs(&ws_jfd_dU, nvar);
free_derivs(&ws_jfd_U, nvar);
free_dvector(ws_cheb_c, 0, N);
free_dvector(ws_four_a, 0, N / 2);
free_dvector(ws_four_b, 0, N / 2);
free_dvector(ws_deriv_p, 0, N);
free_dvector(ws_deriv_dp, 0, N);
free_dvector(ws_deriv_d2p, 0, N);
free_dvector(ws_deriv_q, 0, N);
free_dvector(ws_deriv_dq, 0, N);
free_dvector(ws_deriv_r, 0, N);
free_dvector(ws_deriv_dr, 0, N);
free_ivector(ws_deriv_indx, 0, N);
delete[] ws_fov_sources;
free_dvector(ws_fov_values, 0, nvar - 1);
free_derivs(&ws_fov_U, nvar);
free_dvector(ws_jtdv_values, 0, nvar - 1);
free_derivs(&ws_jtdv_dU, nvar);
free_derivs(&ws_jtdv_U, nvar);
} }
void TwoPunctures::Solve() void TwoPunctures::Solve()
@@ -654,7 +753,7 @@ void TwoPunctures::chebft_Zeros(double u[], int n, int inv)
int k, j, isignum; int k, j, isignum;
double fac, sum, Pion, *c; double fac, sum, Pion, *c;
c = dvector(0, n); c = ws_cheb_c;
Pion = Pi / n; Pion = Pi / n;
if (inv == 0) if (inv == 0)
{ {
@@ -685,7 +784,6 @@ void TwoPunctures::chebft_Zeros(double u[], int n, int inv)
} }
for (j = 0; j < n; j++) for (j = 0; j < n; j++)
u[j] = c[j]; u[j] = c[j];
free_dvector(c, 0, n);
} }
/* --------------------------------------------------------------------------*/ /* --------------------------------------------------------------------------*/
@@ -773,8 +871,8 @@ void TwoPunctures::fourft(double *u, int N, int inv)
double x, x1, fac, Pi_fac, *a, *b; double x, x1, fac, Pi_fac, *a, *b;
M = N / 2; M = N / 2;
a = dvector(0, M); a = ws_four_a;
b = dvector(1, M); /* Actually: b=vector(1,M-1) but this is problematic if M=1*/ b = ws_four_b - 1; /* offset to match dvector(1,M) indexing */
fac = 1. / M; fac = 1. / M;
Pi_fac = Pi * fac; Pi_fac = Pi * fac;
if (inv == 0) if (inv == 0)
@@ -823,8 +921,6 @@ void TwoPunctures::fourft(double *u, int N, int inv)
iy = -iy; iy = -iy;
} }
} }
free_dvector(a, 0, M);
free_dvector(b, 1, M);
} }
/* -----------------------------------------*/ /* -----------------------------------------*/
@@ -891,25 +987,17 @@ double TwoPunctures::norm1(double *v, int n)
/* -------------------------------------------------------------------------*/ /* -------------------------------------------------------------------------*/
double TwoPunctures::norm2(double *v, int n) double TwoPunctures::norm2(double *v, int n)
{ {
int i; // Optimized with oneMKL BLAS DNRM2
double result = 0; // Computes: sqrt(sum(v[i]^2))
return cblas_dnrm2(n, v, 1);
for (i = 0; i < n; i++)
result += v[i] * v[i];
return sqrt(result);
} }
/* -------------------------------------------------------------------------*/ /* -------------------------------------------------------------------------*/
double TwoPunctures::scalarproduct(double *v, double *w, int n) double TwoPunctures::scalarproduct(double *v, double *w, int n)
{ {
int i; // Optimized with oneMKL BLAS DDOT
double result = 0; // Computes: sum(v[i] * w[i])
return cblas_ddot(n, v, 1, w, 1);
for (i = 0; i < n; i++)
result += v[i] * w[i];
return result;
} }
/* -------------------------------------------------------------------------*/ /* -------------------------------------------------------------------------*/
@@ -1125,14 +1213,14 @@ void TwoPunctures::Derivatives_AB3(int nvar, int n1, int n2, int n3, derivs v)
double *p, *dp, *d2p, *q, *dq, *r, *dr; double *p, *dp, *d2p, *q, *dq, *r, *dr;
N = maximum3(n1, n2, n3); N = maximum3(n1, n2, n3);
p = dvector(0, N); p = ws_deriv_p;
dp = dvector(0, N); dp = ws_deriv_dp;
d2p = dvector(0, N); d2p = ws_deriv_d2p;
q = dvector(0, N); q = ws_deriv_q;
dq = dvector(0, N); dq = ws_deriv_dq;
r = dvector(0, N); r = ws_deriv_r;
dr = dvector(0, N); dr = ws_deriv_dr;
indx = ivector(0, N); indx = ws_deriv_indx;
for (ivar = 0; ivar < nvar; ivar++) for (ivar = 0; ivar < nvar; ivar++)
{ {
@@ -1215,14 +1303,6 @@ void TwoPunctures::Derivatives_AB3(int nvar, int n1, int n2, int n3, derivs v)
} }
} }
} }
free_dvector(p, 0, N);
free_dvector(dp, 0, N);
free_dvector(d2p, 0, N);
free_dvector(q, 0, N);
free_dvector(dq, 0, N);
free_dvector(r, 0, N);
free_dvector(dr, 0, N);
free_ivector(indx, 0, N);
} }
/* --------------------------------------------------------------------------*/ /* --------------------------------------------------------------------------*/
void TwoPunctures::Newton(int const nvar, int const n1, int const n2, int const n3, void TwoPunctures::Newton(int const nvar, int const n1, int const n2, int const n3,
@@ -1291,10 +1371,11 @@ void TwoPunctures::F_of_v(int nvar, int n1, int n2, int n3, derivs v, double *F,
derivs U; derivs U;
double *sources; double *sources;
values = dvector(0, nvar - 1); values = ws_fov_values;
allocate_derivs(&U, nvar); U = ws_fov_U;
sources = (double *)calloc(n1 * n2 * n3, sizeof(double)); sources = ws_fov_sources;
memset(sources, 0, n1 * n2 * n3 * sizeof(double));
if (0) if (0)
{ {
double *s_x, *s_y, *s_z; double *s_x, *s_y, *s_z;
@@ -1449,9 +1530,6 @@ void TwoPunctures::F_of_v(int nvar, int n1, int n2, int n3, derivs v, double *F,
{ {
fclose(debugfile); fclose(debugfile);
} }
free(sources);
free_dvector(values, 0, nvar - 1);
free_derivs(&U, nvar);
} }
/* --------------------------------------------------------------------------*/ /* --------------------------------------------------------------------------*/
double TwoPunctures::norm_inf(double const *F, int const ntotal) double TwoPunctures::norm_inf(double const *F, int const ntotal)
@@ -1857,11 +1935,12 @@ void TwoPunctures::J_times_dv(int nvar, int n1, int n2, int n3, derivs dv, doubl
Derivatives_AB3(nvar, n1, n2, n3, dv); Derivatives_AB3(nvar, n1, n2, n3, dv);
values = ws_jtdv_values;
dU = ws_jtdv_dU;
U = ws_jtdv_U;
for (i = 0; i < n1; i++) for (i = 0; i < n1; i++)
{ {
values = dvector(0, nvar - 1);
allocate_derivs(&dU, nvar);
allocate_derivs(&U, nvar);
for (j = 0; j < n2; j++) for (j = 0; j < n2; j++)
{ {
for (k = 0; k < n3; k++) for (k = 0; k < n3; k++)
@@ -1915,9 +1994,6 @@ void TwoPunctures::J_times_dv(int nvar, int n1, int n2, int n3, derivs dv, doubl
} }
} }
} }
free_dvector(values, 0, nvar - 1);
free_derivs(&dU, nvar);
free_derivs(&U, nvar);
} }
} }
/* --------------------------------------------------------------------------*/ /* --------------------------------------------------------------------------*/
@@ -1964,17 +2040,11 @@ void TwoPunctures::LineRelax_be(double *dv,
{ {
int j, m, Ic, Ip, Im, col, ivar; int j, m, Ic, Ip, Im, col, ivar;
double *diag = new double[n2]; double *diag = ws_diag_be;
double *e = new double[n2 - 1]; /* above diagonal */ double *e = ws_e_be; /* above diagonal */
double *f = new double[n2 - 1]; /* below diagonal */ double *f = ws_f_be; /* below diagonal */
double *b = new double[n2]; /* rhs */ double *b = ws_b_be; /* rhs */
double *x = new double[n2]; /* solution vector */ double *x = ws_x_be; /* solution vector */
// gsl_vector *diag = gsl_vector_alloc(n2);
// gsl_vector *e = gsl_vector_alloc(n2-1); /* above diagonal */
// gsl_vector *f = gsl_vector_alloc(n2-1); /* below diagonal */
// gsl_vector *b = gsl_vector_alloc(n2); /* rhs */
// gsl_vector *x = gsl_vector_alloc(n2); /* solution vector */
for (ivar = 0; ivar < nvar; ivar++) for (ivar = 0; ivar < nvar; ivar++)
{ {
@@ -1984,62 +2054,35 @@ void TwoPunctures::LineRelax_be(double *dv,
} }
diag[n2 - 1] = 0; diag[n2 - 1] = 0;
// gsl_vector_set_zero(diag);
// gsl_vector_set_zero(e);
// gsl_vector_set_zero(f);
for (j = 0; j < n2; j++) for (j = 0; j < n2; j++)
{ {
Ip = Index(ivar, i, j + 1, k, nvar, n1, n2, n3); Ip = Index(ivar, i, j + 1, k, nvar, n1, n2, n3);
Ic = Index(ivar, i, j, k, nvar, n1, n2, n3); Ic = Index(ivar, i, j, k, nvar, n1, n2, n3);
Im = Index(ivar, i, j - 1, k, nvar, n1, n2, n3); Im = Index(ivar, i, j - 1, k, nvar, n1, n2, n3);
b[j] = rhs[Ic]; b[j] = rhs[Ic];
// gsl_vector_set(b,j,rhs[Ic]);
for (m = 0; m < ncols[Ic]; m++) for (m = 0; m < ncols[Ic]; m++)
{ {
col = cols[Ic][m]; col = cols[Ic][m];
if (col != Ip && col != Ic && col != Im) if (col != Ip && col != Ic && col != Im)
b[j] -= JFD[Ic][m] * dv[col]; b[j] -= JFD[Ic][m] * dv[col];
// *gsl_vector_ptr(b, j) -= JFD[Ic][m] * dv[col];
else else
{ {
if (col == Im && j > 0) if (col == Im && j > 0)
f[j - 1] = JFD[Ic][m]; f[j - 1] = JFD[Ic][m];
// gsl_vector_set(f,j-1,JFD[Ic][m]);
if (col == Ic) if (col == Ic)
diag[j] = JFD[Ic][m]; diag[j] = JFD[Ic][m];
// gsl_vector_set(diag,j,JFD[Ic][m]);
if (col == Ip && j < n2 - 1) if (col == Ip && j < n2 - 1)
e[j] = JFD[Ic][m]; e[j] = JFD[Ic][m];
// gsl_vector_set(e,j,JFD[Ic][m]);
} }
} }
} }
// A x = b
// A = ( d_0 e_0 0 0 )
// ( f_0 d_1 e_1 0 )
// ( 0 f_1 d_2 e_2 )
// ( 0 0 f_2 d_3 )
//
ThomasAlgorithm(n2, f, diag, e, x, b); ThomasAlgorithm(n2, f, diag, e, x, b);
// gsl_linalg_solve_tridiag(diag, e, f, b, x);
for (j = 0; j < n2; j++) for (j = 0; j < n2; j++)
{ {
Ic = Index(ivar, i, j, k, nvar, n1, n2, n3); Ic = Index(ivar, i, j, k, nvar, n1, n2, n3);
dv[Ic] = x[j]; dv[Ic] = x[j];
// dv[Ic] = gsl_vector_get(x, j);
} }
} }
delete[] diag;
delete[] e;
delete[] f;
delete[] b;
delete[] x;
// gsl_vector_free(diag);
// gsl_vector_free(e);
// gsl_vector_free(f);
// gsl_vector_free(b);
// gsl_vector_free(x);
} }
/* --------------------------------------------------------------------------*/ /* --------------------------------------------------------------------------*/
void TwoPunctures::JFD_times_dv(int i, int j, int k, int nvar, int n1, int n2, void TwoPunctures::JFD_times_dv(int i, int j, int k, int nvar, int n1, int n2,
@@ -2056,8 +2099,8 @@ void TwoPunctures::JFD_times_dv(int i, int j, int k, int nvar, int n1, int n2,
ha, ga, ga2, hb, gb, gb2, hp, gp, gp2, gagb, gagp, gbgp; ha, ga, ga2, hb, gb, gb2, hp, gp, gp2, gagb, gagp, gbgp;
derivs dU, U; derivs dU, U;
allocate_derivs(&dU, nvar); dU = ws_jfd_dU;
allocate_derivs(&U, nvar); U = ws_jfd_U;
if (k < 0) if (k < 0)
k = k + n3; k = k + n3;
@@ -2175,9 +2218,6 @@ void TwoPunctures::JFD_times_dv(int i, int j, int k, int nvar, int n1, int n2,
LinEquations(A, B, X, R, x, r, phi, y, z, dU, U, values); LinEquations(A, B, X, R, x, r, phi, y, z, dU, U, values);
for (ivar = 0; ivar < nvar; ivar++) for (ivar = 0; ivar < nvar; ivar++)
values[ivar] *= FAC; values[ivar] *= FAC;
free_derivs(&dU, nvar);
free_derivs(&U, nvar);
} }
#undef FAC #undef FAC
/*-----------------------------------------------------------*/ /*-----------------------------------------------------------*/
@@ -2209,17 +2249,11 @@ void TwoPunctures::LineRelax_al(double *dv,
{ {
int i, m, Ic, Ip, Im, col, ivar; int i, m, Ic, Ip, Im, col, ivar;
double *diag = new double[n1]; double *diag = ws_diag_al;
double *e = new double[n1 - 1]; /* above diagonal */ double *e = ws_e_al; /* above diagonal */
double *f = new double[n1 - 1]; /* below diagonal */ double *f = ws_f_al; /* below diagonal */
double *b = new double[n1]; /* rhs */ double *b = ws_b_al; /* rhs */
double *x = new double[n1]; /* solution vector */ double *x = ws_x_al; /* solution vector */
// gsl_vector *diag = gsl_vector_alloc(n1);
// gsl_vector *e = gsl_vector_alloc(n1-1); /* above diagonal */
// gsl_vector *f = gsl_vector_alloc(n1-1); /* below diagonal */
// gsl_vector *b = gsl_vector_alloc(n1); /* rhs */
// gsl_vector *x = gsl_vector_alloc(n1); /* solution vector */
for (ivar = 0; ivar < nvar; ivar++) for (ivar = 0; ivar < nvar; ivar++)
{ {
@@ -2229,57 +2263,35 @@ void TwoPunctures::LineRelax_al(double *dv,
} }
diag[n1 - 1] = 0; diag[n1 - 1] = 0;
// gsl_vector_set_zero(diag);
// gsl_vector_set_zero(e);
// gsl_vector_set_zero(f);
for (i = 0; i < n1; i++) for (i = 0; i < n1; i++)
{ {
Ip = Index(ivar, i + 1, j, k, nvar, n1, n2, n3); Ip = Index(ivar, i + 1, j, k, nvar, n1, n2, n3);
Ic = Index(ivar, i, j, k, nvar, n1, n2, n3); Ic = Index(ivar, i, j, k, nvar, n1, n2, n3);
Im = Index(ivar, i - 1, j, k, nvar, n1, n2, n3); Im = Index(ivar, i - 1, j, k, nvar, n1, n2, n3);
b[i] = rhs[Ic]; b[i] = rhs[Ic];
// gsl_vector_set(b,i,rhs[Ic]);
for (m = 0; m < ncols[Ic]; m++) for (m = 0; m < ncols[Ic]; m++)
{ {
col = cols[Ic][m]; col = cols[Ic][m];
if (col != Ip && col != Ic && col != Im) if (col != Ip && col != Ic && col != Im)
b[i] -= JFD[Ic][m] * dv[col]; b[i] -= JFD[Ic][m] * dv[col];
// *gsl_vector_ptr(b, i) -= JFD[Ic][m] * dv[col];
else else
{ {
if (col == Im && i > 0) if (col == Im && i > 0)
f[i - 1] = JFD[Ic][m]; f[i - 1] = JFD[Ic][m];
// gsl_vector_set(f,i-1,JFD[Ic][m]);
if (col == Ic) if (col == Ic)
diag[i] = JFD[Ic][m]; diag[i] = JFD[Ic][m];
// gsl_vector_set(diag,i,JFD[Ic][m]);
if (col == Ip && i < n1 - 1) if (col == Ip && i < n1 - 1)
e[i] = JFD[Ic][m]; e[i] = JFD[Ic][m];
// gsl_vector_set(e,i,JFD[Ic][m]);
} }
} }
} }
ThomasAlgorithm(n1, f, diag, e, x, b); ThomasAlgorithm(n1, f, diag, e, x, b);
// gsl_linalg_solve_tridiag(diag, e, f, b, x);
for (i = 0; i < n1; i++) for (i = 0; i < n1; i++)
{ {
Ic = Index(ivar, i, j, k, nvar, n1, n2, n3); Ic = Index(ivar, i, j, k, nvar, n1, n2, n3);
dv[Ic] = x[i]; dv[Ic] = x[i];
// dv[Ic] = gsl_vector_get(x, i);
} }
} }
delete[] diag;
delete[] e;
delete[] f;
delete[] b;
delete[] x;
// gsl_vector_free(diag);
// gsl_vector_free(e);
// gsl_vector_free(f);
// gsl_vector_free(b);
// gsl_vector_free(x);
} }
/* -------------------------------------------------------------------------*/ /* -------------------------------------------------------------------------*/
// a[N], b[N-1], c[N-1], x[N], q[N] // a[N], b[N-1], c[N-1], x[N], q[N]
@@ -2291,44 +2303,29 @@ void TwoPunctures::LineRelax_al(double *dv,
//"Parallel Scientific Computing in C++ and MPI" P361 //"Parallel Scientific Computing in C++ and MPI" P361
void TwoPunctures::ThomasAlgorithm(int N, double *b, double *a, double *c, double *x, double *q) void TwoPunctures::ThomasAlgorithm(int N, double *b, double *a, double *c, double *x, double *q)
{ {
// In-place Thomas algorithm: uses a[] as d workspace, b[] as l workspace.
// c[] is already u (above-diagonal). ws_thomas_y is pre-allocated workspace.
int i; int i;
double *l, *u, *d, *y; double *y = ws_thomas_y;
l = new double[N - 1];
u = new double[N - 1];
d = new double[N];
y = new double[N];
/* LU Decomposition */
d[0] = a[0];
u[0] = c[0];
/* LU Decomposition (in-place: a becomes d, b becomes l) */
for (i = 0; i < N - 2; i++) for (i = 0; i < N - 2; i++)
{ {
l[i] = b[i] / d[i]; b[i] = b[i] / a[i];
d[i + 1] = a[i + 1] - l[i] * u[i]; a[i + 1] = a[i + 1] - b[i] * c[i];
u[i + 1] = c[i + 1];
} }
b[N - 2] = b[N - 2] / a[N - 2];
l[N - 2] = b[N - 2] / d[N - 2]; a[N - 1] = a[N - 1] - b[N - 2] * c[N - 2];
d[N - 1] = a[N - 1] - l[N - 2] * u[N - 2];
/* Forward Substitution [L][y] = [q] */ /* Forward Substitution [L][y] = [q] */
y[0] = q[0]; y[0] = q[0];
for (i = 1; i < N; i++) for (i = 1; i < N; i++)
y[i] = q[i] - l[i - 1] * y[i - 1]; y[i] = q[i] - b[i - 1] * y[i - 1];
/* Backward Substitution [U][x] = [y] */ /* Backward Substitution [U][x] = [y] */
x[N - 1] = y[N - 1] / d[N - 1]; x[N - 1] = y[N - 1] / a[N - 1];
for (i = N - 2; i >= 0; i--) for (i = N - 2; i >= 0; i--)
x[i] = (y[i] - u[i] * x[i + 1]) / d[i]; x[i] = (y[i] - c[i] * x[i + 1]) / a[i];
delete[] l;
delete[] u;
delete[] d;
delete[] y;
return;
} }
// --------------------------------------------------------------------------*/ // --------------------------------------------------------------------------*/
// Calculates the value of v at an arbitrary position (x,y,z) if the spectral coefficients are know*/*/ // Calculates the value of v at an arbitrary position (x,y,z) if the spectral coefficients are know*/*/

View File

@@ -42,6 +42,33 @@ private:
int ntotal; int ntotal;
// Pre-allocated workspace buffers for hot-path allocation elimination
// LineRelax_be workspace (sized for n2)
double *ws_diag_be, *ws_e_be, *ws_f_be, *ws_b_be, *ws_x_be;
// LineRelax_al workspace (sized for n1)
double *ws_diag_al, *ws_e_al, *ws_f_al, *ws_b_al, *ws_x_al;
// ThomasAlgorithm workspace (sized for max(n1,n2))
double *ws_thomas_y;
// JFD_times_dv workspace (sized for nvar)
double *ws_jfd_values;
derivs ws_jfd_dU, ws_jfd_U;
// chebft_Zeros workspace (sized for max(n1,n2,n3)+1)
double *ws_cheb_c;
// fourft workspace (sized for max(n1,n2,n3)/2+1 each)
double *ws_four_a, *ws_four_b;
// Derivatives_AB3 workspace
double *ws_deriv_p, *ws_deriv_dp, *ws_deriv_d2p;
double *ws_deriv_q, *ws_deriv_dq;
double *ws_deriv_r, *ws_deriv_dr;
int *ws_deriv_indx;
// F_of_v workspace
double *ws_fov_sources;
double *ws_fov_values;
derivs ws_fov_U;
// J_times_dv workspace
double *ws_jtdv_values;
derivs ws_jtdv_dU, ws_jtdv_U;
struct parameters struct parameters
{ {
int nvar, n1, n2, n3; int nvar, n1, n2, n3;

View File

@@ -83,6 +83,10 @@
real*8, dimension(ex(1),ex(2),ex(3)) :: gupxx,gupxy,gupxz real*8, dimension(ex(1),ex(2),ex(3)) :: gupxx,gupxy,gupxz
real*8, dimension(ex(1),ex(2),ex(3)) :: gupyy,gupyz,gupzz real*8, dimension(ex(1),ex(2),ex(3)) :: gupyy,gupyz,gupzz
! shared work arrays (memory pool) to avoid repeated allocation in subroutines
real*8, dimension(-1:ex(1),-1:ex(2),-1:ex(3)) :: fh_work2 ! for fderivs_fh/fdderivs_fh (ghost_width=2)
real*8, dimension(-2:ex(1),-2:ex(2),-2:ex(3)) :: fh_work3 ! for kodis_fh/lopsided_fh (ghost_width=3)
real*8,dimension(3) ::SSS,AAS,ASA,SAA,ASS,SAS,SSA real*8,dimension(3) ::SSS,AAS,ASA,SAA,ASS,SAS,SSA
real*8 :: dX, dY, dZ, PI real*8 :: dX, dY, dZ, PI
real*8, parameter :: ZEO = 0.d0,ONE = 1.D0, TWO = 2.D0, FOUR = 4.D0 real*8, parameter :: ZEO = 0.d0,ONE = 1.D0, TWO = 2.D0, FOUR = 4.D0
@@ -106,7 +110,8 @@
call getpbh(BHN,Porg,Mass) call getpbh(BHN,Porg,Mass)
#endif #endif
!!! sanity check !!! sanity check (disabled in production builds for performance)
#ifdef DEBUG
dX = sum(chi)+sum(trK)+sum(dxx)+sum(gxy)+sum(gxz)+sum(dyy)+sum(gyz)+sum(dzz) & dX = sum(chi)+sum(trK)+sum(dxx)+sum(gxy)+sum(gxz)+sum(dyy)+sum(gyz)+sum(dzz) &
+sum(Axx)+sum(Axy)+sum(Axz)+sum(Ayy)+sum(Ayz)+sum(Azz) & +sum(Axx)+sum(Axy)+sum(Axz)+sum(Ayy)+sum(Ayz)+sum(Azz) &
+sum(Gamx)+sum(Gamy)+sum(Gamz) & +sum(Gamx)+sum(Gamy)+sum(Gamz) &
@@ -136,6 +141,7 @@
gont = 1 gont = 1
return return
endif endif
#endif
PI = dacos(-ONE) PI = dacos(-ONE)
@@ -149,22 +155,22 @@
gyy = dyy + ONE gyy = dyy + ONE
gzz = dzz + ONE gzz = dzz + ONE
call fderivs(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev) call fderivs_fh(ex,betax,betaxx,betaxy,betaxz,X,Y,Z,ANTI, SYM, SYM,Symmetry,Lev,fh_work2)
call fderivs(ex,betay,betayx,betayy,betayz,X,Y,Z, SYM,ANTI, SYM,Symmetry,Lev) call fderivs_fh(ex,betay,betayx,betayy,betayz,X,Y,Z, SYM,ANTI, SYM,Symmetry,Lev,fh_work2)
call fderivs(ex,betaz,betazx,betazy,betazz,X,Y,Z, SYM, SYM,ANTI,Symmetry,Lev) call fderivs_fh(ex,betaz,betazx,betazy,betazz,X,Y,Z, SYM, SYM,ANTI,Symmetry,Lev,fh_work2)
div_beta = betaxx + betayy + betazz div_beta = betaxx + betayy + betazz
call fderivs(ex,chi,chix,chiy,chiz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev) call fderivs_fh(ex,chi,chix,chiy,chiz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev,fh_work2)
chi_rhs = F2o3 *chin1*( alpn1 * trK - div_beta ) !rhs for chi chi_rhs = F2o3 *chin1*( alpn1 * trK - div_beta ) !rhs for chi
call fderivs(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs_fh(ex,dxx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev,fh_work2)
call fderivs(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev) call fderivs_fh(ex,gxy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,Lev,fh_work2)
call fderivs(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev) call fderivs_fh(ex,gxz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,Lev,fh_work2)
call fderivs(ex,dyy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs_fh(ex,dyy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev,fh_work2)
call fderivs(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev) call fderivs_fh(ex,gyz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,Lev,fh_work2)
call fderivs(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev) call fderivs_fh(ex,dzz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,Lev,fh_work2)
gxx_rhs = - TWO * alpn1 * Axx - F2o3 * gxx * div_beta + & gxx_rhs = - TWO * alpn1 * Axx - F2o3 * gxx * div_beta + &
TWO *( gxx * betaxx + gxy * betayx + gxz * betazx) TWO *( gxx * betaxx + gxy * betayx + gxz * betazx)
@@ -282,8 +288,8 @@
(gupyy * gupzz + gupyz * gupyz)* Ayz (gupyy * gupzz + gupyz * gupyz)* Ayz
! Right hand side for Gam^i without shift terms... ! Right hand side for Gam^i without shift terms...
call fderivs(ex,Lap,Lapx,Lapy,Lapz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fderivs_fh(ex,Lap,Lapx,Lapy,Lapz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev,fh_work2)
call fderivs(ex,trK,Kx,Ky,Kz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev) call fderivs_fh(ex,trK,Kx,Ky,Kz,X,Y,Z,SYM,SYM,SYM,symmetry,Lev,fh_work2)
Gamx_rhs = - TWO * ( Lapx * Rxx + Lapy * Rxy + Lapz * Rxz ) + & Gamx_rhs = - TWO * ( Lapx * Rxx + Lapy * Rxy + Lapz * Rxz ) + &
TWO * alpn1 * ( & TWO * alpn1 * ( &
@@ -312,12 +318,12 @@
Gamzxx * Rxx + Gamzyy * Ryy + Gamzzz * Rzz + & Gamzxx * Rxx + Gamzyy * Ryy + Gamzzz * Rzz + &
TWO * ( Gamzxy * Rxy + Gamzxz * Rxz + Gamzyz * Ryz ) ) TWO * ( Gamzxy * Rxy + Gamzxz * Rxz + Gamzyz * Ryz ) )
call fdderivs(ex,betax,gxxx,gxyx,gxzx,gyyx,gyzx,gzzx,& call fdderivs_fh(ex,betax,gxxx,gxyx,gxzx,gyyx,gyzx,gzzx,&
X,Y,Z,ANTI,SYM, SYM ,Symmetry,Lev) X,Y,Z,ANTI,SYM, SYM ,Symmetry,Lev,fh_work2)
call fdderivs(ex,betay,gxxy,gxyy,gxzy,gyyy,gyzy,gzzy,& call fdderivs_fh(ex,betay,gxxy,gxyy,gxzy,gyyy,gyzy,gzzy,&
X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev) X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev,fh_work2)
call fdderivs(ex,betaz,gxxz,gxyz,gxzz,gyyz,gyzz,gzzz,& call fdderivs_fh(ex,betaz,gxxz,gxyz,gxzz,gyyz,gyzz,gzzz,&
X,Y,Z,SYM ,SYM, ANTI,Symmetry,Lev) X,Y,Z,SYM ,SYM, ANTI,Symmetry,Lev,fh_work2)
fxx = gxxx + gxyy + gxzz fxx = gxxx + gxyy + gxzz
fxy = gxyx + gyyy + gyzz fxy = gxyx + gyyy + gyzz
@@ -330,9 +336,9 @@
Gamza = gupxx * Gamzxx + gupyy * Gamzyy + gupzz * Gamzzz + & Gamza = gupxx * Gamzxx + gupyy * Gamzyy + gupzz * Gamzzz + &
TWO*( gupxy * Gamzxy + gupxz * Gamzxz + gupyz * Gamzyz ) TWO*( gupxy * Gamzxy + gupxz * Gamzxz + gupyz * Gamzyz )
call fderivs(ex,Gamx,Gamxx,Gamxy,Gamxz,X,Y,Z,ANTI,SYM ,SYM ,Symmetry,Lev) call fderivs_fh(ex,Gamx,Gamxx,Gamxy,Gamxz,X,Y,Z,ANTI,SYM ,SYM ,Symmetry,Lev,fh_work2)
call fderivs(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev) call fderivs_fh(ex,Gamy,Gamyx,Gamyy,Gamyz,X,Y,Z,SYM ,ANTI,SYM ,Symmetry,Lev,fh_work2)
call fderivs(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev) call fderivs_fh(ex,Gamz,Gamzx,Gamzy,Gamzz,X,Y,Z,SYM ,SYM ,ANTI,Symmetry,Lev,fh_work2)
Gamx_rhs = Gamx_rhs + F2o3 * Gamxa * div_beta - & Gamx_rhs = Gamx_rhs + F2o3 * Gamxa * div_beta - &
Gamxa * betaxx - Gamya * betaxy - Gamza * betaxz + & Gamxa * betaxx - Gamya * betaxy - Gamza * betaxz + &
@@ -375,27 +381,27 @@
gzzz = gxz * Gamxzz + gyz * Gamyzz + gzz * Gamzzz gzzz = gxz * Gamxzz + gyz * Gamyzz + gzz * Gamzzz
!compute Ricci tensor for tilted metric !compute Ricci tensor for tilted metric
call fdderivs(ex,dxx,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,symmetry,Lev) call fdderivs_fh(ex,dxx,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,symmetry,Lev,fh_work2)
Rxx = gupxx * fxx + gupyy * fyy + gupzz * fzz + & Rxx = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO
call fdderivs(ex,dyy,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,symmetry,Lev) call fdderivs_fh(ex,dyy,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,symmetry,Lev,fh_work2)
Ryy = gupxx * fxx + gupyy * fyy + gupzz * fzz + & Ryy = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO
call fdderivs(ex,dzz,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,symmetry,Lev) call fdderivs_fh(ex,dzz,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,SYM ,SYM ,symmetry,Lev,fh_work2)
Rzz = gupxx * fxx + gupyy * fyy + gupzz * fzz + & Rzz = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO
call fdderivs(ex,gxy,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,ANTI, ANTI,SYM ,symmetry,Lev) call fdderivs_fh(ex,gxy,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,ANTI, ANTI,SYM ,symmetry,Lev,fh_work2)
Rxy = gupxx * fxx + gupyy * fyy + gupzz * fzz + & Rxy = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO
call fdderivs(ex,gxz,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,ANTI ,SYM ,ANTI,symmetry,Lev) call fdderivs_fh(ex,gxz,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,ANTI ,SYM ,ANTI,symmetry,Lev,fh_work2)
Rxz = gupxx * fxx + gupyy * fyy + gupzz * fzz + & Rxz = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO
call fdderivs(ex,gyz,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,ANTI ,ANTI,symmetry,Lev) call fdderivs_fh(ex,gyz,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM ,ANTI ,ANTI,symmetry,Lev,fh_work2)
Ryz = gupxx * fxx + gupyy * fyy + gupzz * fzz + & Ryz = gupxx * fxx + gupyy * fyy + gupzz * fzz + &
( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO ( gupxy * fxy + gupxz * fxz + gupyz * fyz ) * TWO
@@ -600,7 +606,7 @@
Gamxzz * gxzy + Gamyzz * gyzy + Gamzzz * gzzy + & Gamxzz * gxzy + Gamyzz * gyzy + Gamzzz * gzzy + &
Gamxyz * gzzx + Gamyyz * gzzy + Gamzyz * gzzz ) Gamxyz * gzzx + Gamyyz * gzzy + Gamzyz * gzzz )
!covariant second derivative of chi respect to tilted metric !covariant second derivative of chi respect to tilted metric
call fdderivs(ex,chi,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fdderivs_fh(ex,chi,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev,fh_work2)
fxx = fxx - Gamxxx * chix - Gamyxx * chiy - Gamzxx * chiz fxx = fxx - Gamxxx * chix - Gamyxx * chiy - Gamzxx * chiz
fxy = fxy - Gamxxy * chix - Gamyxy * chiy - Gamzxy * chiz fxy = fxy - Gamxxy * chix - Gamyxy * chiy - Gamzxy * chiz
@@ -626,8 +632,8 @@
Ryz = Ryz + (fyz - chiy*chiz/chin1/TWO + gyz * f)/chin1/TWO Ryz = Ryz + (fyz - chiy*chiz/chin1/TWO + gyz * f)/chin1/TWO
! covariant second derivatives of the lapse respect to physical metric ! covariant second derivatives of the lapse respect to physical metric
call fdderivs(ex,Lap,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, & call fdderivs_fh(ex,Lap,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, &
SYM,SYM,SYM,symmetry,Lev) SYM,SYM,SYM,symmetry,Lev,fh_work2)
gxxx = (gupxx * chix + gupxy * chiy + gupxz * chiz)/chin1 gxxx = (gupxx * chix + gupxy * chiy + gupxz * chiz)/chin1
gxxy = (gupxy * chix + gupyy * chiy + gupyz * chiz)/chin1 gxxy = (gupxy * chix + gupyy * chiy + gupyz * chiz)/chin1
@@ -810,7 +816,7 @@
betay_rhs = FF*dtSfy betay_rhs = FF*dtSfy
betaz_rhs = FF*dtSfz betaz_rhs = FF*dtSfz
call fderivs(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fderivs_fh(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev,fh_work2)
reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + & reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + &
TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs) TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs)
reta = 1.31d0/2*dsqrt(reta/chin1)/(1-dsqrt(chin1))**2 reta = 1.31d0/2*dsqrt(reta/chin1)/(1-dsqrt(chin1))**2
@@ -822,7 +828,7 @@
betay_rhs = FF*dtSfy betay_rhs = FF*dtSfy
betaz_rhs = FF*dtSfz betaz_rhs = FF*dtSfz
call fderivs(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fderivs_fh(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev,fh_work2)
reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + & reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + &
TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs) TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs)
reta = 1.31d0/2*dsqrt(reta/chin1)/(1-chin1)**2 reta = 1.31d0/2*dsqrt(reta/chin1)/(1-chin1)**2
@@ -830,7 +836,7 @@
dtSfy_rhs = Gamy_rhs - reta*dtSfy dtSfy_rhs = Gamy_rhs - reta*dtSfy
dtSfz_rhs = Gamz_rhs - reta*dtSfz dtSfz_rhs = Gamz_rhs - reta*dtSfz
#elif (GAUGE == 4) #elif (GAUGE == 4)
call fderivs(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fderivs_fh(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev,fh_work2)
reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + & reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + &
TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs) TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs)
reta = 1.31d0/2*dsqrt(reta/chin1)/(1-dsqrt(chin1))**2 reta = 1.31d0/2*dsqrt(reta/chin1)/(1-dsqrt(chin1))**2
@@ -842,7 +848,7 @@
dtSfy_rhs = ZEO dtSfy_rhs = ZEO
dtSfz_rhs = ZEO dtSfz_rhs = ZEO
#elif (GAUGE == 5) #elif (GAUGE == 5)
call fderivs(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev) call fderivs_fh(ex,chi,dtSfx_rhs,dtSfy_rhs,dtSfz_rhs,X,Y,Z,SYM,SYM,SYM,Symmetry,Lev,fh_work2)
reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + & reta = gupxx * dtSfx_rhs * dtSfx_rhs + gupyy * dtSfy_rhs * dtSfy_rhs + gupzz * dtSfz_rhs * dtSfz_rhs + &
TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs) TWO * (gupxy * dtSfx_rhs * dtSfy_rhs + gupxz * dtSfx_rhs * dtSfz_rhs + gupyz * dtSfy_rhs * dtSfz_rhs)
reta = 1.31d0/2*dsqrt(reta/chin1)/(1-chin1)**2 reta = 1.31d0/2*dsqrt(reta/chin1)/(1-chin1)**2
@@ -945,51 +951,51 @@
!!!!!!!!!advection term part !!!!!!!!!advection term part
call lopsided(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,gxx,gxx_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS) call lopsided_fh(ex,X,Y,Z,gxy,gxy_rhs,betax,betay,betaz,Symmetry,AAS,fh_work3)
call lopsided(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA) call lopsided_fh(ex,X,Y,Z,gxz,gxz_rhs,betax,betay,betaz,Symmetry,ASA,fh_work3)
call lopsided(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,gyy,gyy_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA) call lopsided_fh(ex,X,Y,Z,gyz,gyz_rhs,betax,betay,betaz,Symmetry,SAA,fh_work3)
call lopsided(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,gzz,gzz_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,Axx,Axx_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS) call lopsided_fh(ex,X,Y,Z,Axy,Axy_rhs,betax,betay,betaz,Symmetry,AAS,fh_work3)
call lopsided(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA) call lopsided_fh(ex,X,Y,Z,Axz,Axz_rhs,betax,betay,betaz,Symmetry,ASA,fh_work3)
call lopsided(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,Ayy,Ayy_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA) call lopsided_fh(ex,X,Y,Z,Ayz,Ayz_rhs,betax,betay,betaz,Symmetry,SAA,fh_work3)
call lopsided(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,Azz,Azz_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,chi,chi_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,trK,trK_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
call lopsided(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided_fh(ex,X,Y,Z,Gamx,Gamx_rhs,betax,betay,betaz,Symmetry,ASS,fh_work3)
call lopsided(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided_fh(ex,X,Y,Z,Gamy,Gamy_rhs,betax,betay,betaz,Symmetry,SAS,fh_work3)
call lopsided(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided_fh(ex,X,Y,Z,Gamz,Gamz_rhs,betax,betay,betaz,Symmetry,SSA,fh_work3)
!! !!
call lopsided(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS) call lopsided_fh(ex,X,Y,Z,Lap,Lap_rhs,betax,betay,betaz,Symmetry,SSS,fh_work3)
#if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7) #if (GAUGE == 0 || GAUGE == 1 || GAUGE == 2 || GAUGE == 3 || GAUGE == 4 || GAUGE == 5 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided_fh(ex,X,Y,Z,betax,betax_rhs,betax,betay,betaz,Symmetry,ASS,fh_work3)
call lopsided(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided_fh(ex,X,Y,Z,betay,betay_rhs,betax,betay,betaz,Symmetry,SAS,fh_work3)
call lopsided(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided_fh(ex,X,Y,Z,betaz,betaz_rhs,betax,betay,betaz,Symmetry,SSA,fh_work3)
#endif #endif
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7) #if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call lopsided(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS) call lopsided_fh(ex,X,Y,Z,dtSfx,dtSfx_rhs,betax,betay,betaz,Symmetry,ASS,fh_work3)
call lopsided(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS) call lopsided_fh(ex,X,Y,Z,dtSfy,dtSfy_rhs,betax,betay,betaz,Symmetry,SAS,fh_work3)
call lopsided(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA) call lopsided_fh(ex,X,Y,Z,dtSfz,dtSfz_rhs,betax,betay,betaz,Symmetry,SSA,fh_work3)
#endif #endif
if(eps>0)then if(eps>0)then
! usual Kreiss-Oliger dissipation ! usual Kreiss-Oliger dissipation
call kodis(ex,X,Y,Z,chi,chi_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,chi,chi_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,trK,trK_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,trK,trK_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,dxx,gxx_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,dxx,gxx_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,gxy,gxy_rhs,AAS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,gxy,gxy_rhs,AAS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,gxz,gxz_rhs,ASA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,gxz,gxz_rhs,ASA,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,dyy,gyy_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,dyy,gyy_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,gyz,gyz_rhs,SAA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,gyz,gyz_rhs,SAA,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,dzz,gzz_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,dzz,gzz_rhs,SSS,Symmetry,eps,fh_work3)
#if 0 #if 0
#define i 42 #define i 42
#define j 40 #define j 40
@@ -1003,7 +1009,7 @@ endif
#undef k #undef k
!!stop !!stop
#endif #endif
call kodis(ex,X,Y,Z,Axx,Axx_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Axx,Axx_rhs,SSS,Symmetry,eps,fh_work3)
#if 0 #if 0
#define i 42 #define i 42
#define j 40 #define j 40
@@ -1017,25 +1023,25 @@ endif
#undef k #undef k
!!stop !!stop
#endif #endif
call kodis(ex,X,Y,Z,Axy,Axy_rhs,AAS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Axy,Axy_rhs,AAS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Axz,Axz_rhs,ASA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Axz,Axz_rhs,ASA,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Ayy,Ayy_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Ayy,Ayy_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Ayz,Ayz_rhs,SAA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Ayz,Ayz_rhs,SAA,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Azz,Azz_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Azz,Azz_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Gamx,Gamx_rhs,ASS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Gamx,Gamx_rhs,ASS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Gamy,Gamy_rhs,SAS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Gamy,Gamy_rhs,SAS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,Gamz,Gamz_rhs,SSA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Gamz,Gamz_rhs,SSA,Symmetry,eps,fh_work3)
#if 1 #if 1
!! bam does not apply dissipation on gauge variables !! bam does not apply dissipation on gauge variables
call kodis(ex,X,Y,Z,Lap,Lap_rhs,SSS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,Lap,Lap_rhs,SSS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,betax,betax_rhs,ASS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,betax,betax_rhs,ASS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,betay,betay_rhs,SAS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,betay,betay_rhs,SAS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,betaz,betaz_rhs,SSA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,betaz,betaz_rhs,SSA,Symmetry,eps,fh_work3)
#if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7) #if (GAUGE == 0 || GAUGE == 2 || GAUGE == 3 || GAUGE == 6 || GAUGE == 7)
call kodis(ex,X,Y,Z,dtSfx,dtSfx_rhs,ASS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,dtSfx,dtSfx_rhs,ASS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,dtSfy,dtSfy_rhs,SAS,Symmetry,eps) call kodis_fh(ex,X,Y,Z,dtSfy,dtSfy_rhs,SAS,Symmetry,eps,fh_work3)
call kodis(ex,X,Y,Z,dtSfz,dtSfz_rhs,SSA,Symmetry,eps) call kodis_fh(ex,X,Y,Z,dtSfz,dtSfz_rhs,SSA,Symmetry,eps,fh_work3)
#endif #endif
#endif #endif
@@ -1076,12 +1082,12 @@ endif
! mov_Res_j = gupkj*(-1/chi d_k chi*A_ij + D_k A_ij) - 2/3 d_j trK - 8 PI s_j where D respect to physical metric ! mov_Res_j = gupkj*(-1/chi d_k chi*A_ij + D_k A_ij) - 2/3 d_j trK - 8 PI s_j where D respect to physical metric
! store D_i A_jk - 1/chi d_i chi*A_jk in gjk_i ! store D_i A_jk - 1/chi d_i chi*A_jk in gjk_i
call fderivs(ex,Axx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0) call fderivs_fh(ex,Axx,gxxx,gxxy,gxxz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0,fh_work2)
call fderivs(ex,Axy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,0) call fderivs_fh(ex,Axy,gxyx,gxyy,gxyz,X,Y,Z,ANTI,ANTI,SYM ,Symmetry,0,fh_work2)
call fderivs(ex,Axz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,0) call fderivs_fh(ex,Axz,gxzx,gxzy,gxzz,X,Y,Z,ANTI,SYM ,ANTI,Symmetry,0,fh_work2)
call fderivs(ex,Ayy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0) call fderivs_fh(ex,Ayy,gyyx,gyyy,gyyz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0,fh_work2)
call fderivs(ex,Ayz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,0) call fderivs_fh(ex,Ayz,gyzx,gyzy,gyzz,X,Y,Z,SYM ,ANTI,ANTI,Symmetry,0,fh_work2)
call fderivs(ex,Azz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0) call fderivs_fh(ex,Azz,gzzx,gzzy,gzzz,X,Y,Z,SYM ,SYM ,SYM ,Symmetry,0,fh_work2)
gxxx = gxxx - ( Gamxxx * Axx + Gamyxx * Axy + Gamzxx * Axz & gxxx = gxxx - ( Gamxxx * Axx + Gamyxx * Axy + Gamzxx * Axz &
+ Gamxxx * Axx + Gamyxx * Axy + Gamzxx * Axz) - chix*Axx/chin1 + Gamxxx * Axx + Gamyxx * Axy + Gamzxx * Axz) - chix*Axx/chin1

View File

@@ -1103,6 +1103,103 @@
end subroutine fderivs end subroutine fderivs
!----------------------------------------------------------------------------- !-----------------------------------------------------------------------------
! fderivs variant: reuses caller-provided fh work array (memory pool)
!-----------------------------------------------------------------------------
subroutine fderivs_fh(ex,f,fx,fy,fz,X,Y,Z,SYM1,SYM2,SYM3, &
symmetry,onoff,fh)
implicit none
integer, intent(in ):: ex(1:3),symmetry,onoff
real*8, dimension(ex(1),ex(2),ex(3)), intent(in ):: f
real*8, dimension(ex(1),ex(2),ex(3)), intent(out):: fx,fy,fz
real*8, intent(in) :: X(ex(1)),Y(ex(2)),Z(ex(3))
real*8, intent(in ):: SYM1,SYM2,SYM3
real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)),intent(inout):: fh
real*8 :: dX,dY,dZ
real*8, dimension(3) :: SoA
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8, parameter :: ZEO=0.d0,ONE=1.d0
real*8, parameter :: TWO=2.d0,EIT=8.d0
real*8, parameter :: F12=1.2d1
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
SoA(1) = SYM1
SoA(2) = SYM2
SoA(3) = SYM3
call symmetry_bd(2,ex,f,fh,SoA)
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
fx = ZEO
fy = ZEO
fz = ZEO
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
#if 0
if(i+2 <= imax .and. i-2 >= imin)then
fx(i,j,k)=d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax .and. i-1 >= imin)then
fx(i,j,k)=d2dx*(-fh(i-1,j,k)+fh(i+1,j,k))
endif
if(j+2 <= jmax .and. j-2 >= jmin)then
fy(i,j,k)=d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax .and. j-1 >= jmin)then
fy(i,j,k)=d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
endif
if(k+2 <= kmax .and. k-2 >= kmin)then
fz(i,j,k)=d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax .and. k-1 >= kmin)then
fz(i,j,k)=d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
endif
#else
if(i+2 <= imax .and. i-2 >= imin .and. &
j+2 <= jmax .and. j-2 >= jmin .and. &
k+2 <= kmax .and. k-2 >= kmin) then
fx(i,j,k)=d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
fy(i,j,k)=d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
fz(i,j,k)=d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(i+1 <= imax .and. i-1 >= imin .and. &
j+1 <= jmax .and. j-1 >= jmin .and. &
k+1 <= kmax .and. k-1 >= kmin) then
fx(i,j,k)=d2dx*(-fh(i-1,j,k)+fh(i+1,j,k))
fy(i,j,k)=d2dy*(-fh(i,j-1,k)+fh(i,j+1,k))
fz(i,j,k)=d2dz*(-fh(i,j,k-1)+fh(i,j,k+1))
endif
#endif
enddo
enddo
enddo
return
end subroutine fderivs_fh
!-----------------------------------------------------------------------------
! !
! single derivatives dx ! single derivatives dx
! !
@@ -1940,6 +2037,162 @@
end subroutine fddyz end subroutine fddyz
!-----------------------------------------------------------------------------
! fdderivs variant: reuses caller-provided fh work array (memory pool)
!-----------------------------------------------------------------------------
subroutine fdderivs_fh(ex,f,fxx,fxy,fxz,fyy,fyz,fzz,X,Y,Z, &
SYM1,SYM2,SYM3,symmetry,onoff,fh)
implicit none
integer, intent(in ):: ex(1:3),symmetry,onoff
real*8, dimension(ex(1),ex(2),ex(3)),intent(in ):: f
real*8, dimension(ex(1),ex(2),ex(3)),intent(out):: fxx,fxy,fxz,fyy,fyz,fzz
real*8, intent(in ):: X(ex(1)),Y(ex(2)),Z(ex(3))
real*8, intent(in ):: SYM1,SYM2,SYM3
real*8,dimension(-1:ex(1),-1:ex(2),-1:ex(3)),intent(inout):: fh
real*8 :: dX,dY,dZ
real*8, dimension(3) :: SoA
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: Sdxdx,Sdydy,Sdzdz,Fdxdx,Fdydy,Fdzdz
real*8 :: Sdxdy,Sdxdz,Sdydz,Fdxdy,Fdxdz,Fdydz
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
real*8, parameter :: ZEO=0.d0, ONE=1.d0, TWO=2.d0, F1o4=2.5d-1
real*8, parameter :: F8=8.d0, F16=1.6d1, F30=3.d1
real*8, parameter :: F1o12=ONE/1.2d1, F1o144=ONE/1.44d2
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -1
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -1
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -1
SoA(1) = SYM1
SoA(2) = SYM2
SoA(3) = SYM3
call symmetry_bd(2,ex,f,fh,SoA)
Sdxdx = ONE /( dX * dX )
Sdydy = ONE /( dY * dY )
Sdzdz = ONE /( dZ * dZ )
Fdxdx = F1o12 /( dX * dX )
Fdydy = F1o12 /( dY * dY )
Fdzdz = F1o12 /( dZ * dZ )
Sdxdy = F1o4 /( dX * dY )
Sdxdz = F1o4 /( dX * dZ )
Sdydz = F1o4 /( dY * dZ )
Fdxdy = F1o144 /( dX * dY )
Fdxdz = F1o144 /( dX * dZ )
Fdydz = F1o144 /( dY * dZ )
fxx = ZEO
fyy = ZEO
fzz = ZEO
fxy = ZEO
fxz = ZEO
fyz = ZEO
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
#if 0
if(i+2 <= imax .and. i-2 >= imin)then
fxx(i,j,k) = Fdxdx*(-fh(i-2,j,k)+F16*fh(i-1,j,k)-F30*fh(i,j,k) &
-fh(i+2,j,k)+F16*fh(i+1,j,k) )
elseif(i+1 <= imax .and. i-1 >= imin)then
fxx(i,j,k) = Sdxdx*(fh(i-1,j,k)-TWO*fh(i,j,k)+fh(i+1,j,k))
endif
if(j+2 <= jmax .and. j-2 >= jmin)then
fyy(i,j,k) = Fdydy*(-fh(i,j-2,k)+F16*fh(i,j-1,k)-F30*fh(i,j,k) &
-fh(i,j+2,k)+F16*fh(i,j+1,k) )
elseif(j+1 <= jmax .and. j-1 >= jmin)then
fyy(i,j,k) = Sdydy*(fh(i,j-1,k)-TWO*fh(i,j,k)+fh(i,j+1,k))
endif
if(k+2 <= kmax .and. k-2 >= kmin)then
fzz(i,j,k) = Fdzdz*(-fh(i,j,k-2)+F16*fh(i,j,k-1)-F30*fh(i,j,k) &
-fh(i,j,k+2)+F16*fh(i,j,k+1) )
elseif(k+1 <= kmax .and. k-1 >= kmin)then
fzz(i,j,k) = Sdzdz*(fh(i,j,k-1)-TWO*fh(i,j,k)+fh(i,j,k+1))
endif
if(i+2 <= imax .and. i-2 >= imin .and. j+2 <= jmax .and. j-2 >= jmin)then
fxy(i,j,k) = Fdxdy*( (fh(i-2,j-2,k)-F8*fh(i-1,j-2,k)+F8*fh(i+1,j-2,k)-fh(i+2,j-2,k)) &
-F8 *(fh(i-2,j-1,k)-F8*fh(i-1,j-1,k)+F8*fh(i+1,j-1,k)-fh(i+2,j-1,k)) &
+F8 *(fh(i-2,j+1,k)-F8*fh(i-1,j+1,k)+F8*fh(i+1,j+1,k)-fh(i+2,j+1,k)) &
- (fh(i-2,j+2,k)-F8*fh(i-1,j+2,k)+F8*fh(i+1,j+2,k)-fh(i+2,j+2,k)))
elseif(i+1 <= imax .and. i-1 >= imin .and. j+1 <= jmax .and. j-1 >= jmin)then
fxy(i,j,k) = Sdxdy*(fh(i-1,j-1,k)-fh(i+1,j-1,k)-fh(i-1,j+1,k)+fh(i+1,j+1,k))
endif
if(i+2 <= imax .and. i-2 >= imin .and. k+2 <= kmax .and. k-2 >= kmin)then
fxz(i,j,k) = Fdxdz*( (fh(i-2,j,k-2)-F8*fh(i-1,j,k-2)+F8*fh(i+1,j,k-2)-fh(i+2,j,k-2)) &
-F8 *(fh(i-2,j,k-1)-F8*fh(i-1,j,k-1)+F8*fh(i+1,j,k-1)-fh(i+2,j,k-1)) &
+F8 *(fh(i-2,j,k+1)-F8*fh(i-1,j,k+1)+F8*fh(i+1,j,k+1)-fh(i+2,j,k+1)) &
- (fh(i-2,j,k+2)-F8*fh(i-1,j,k+2)+F8*fh(i+1,j,k+2)-fh(i+2,j,k+2)))
elseif(i+1 <= imax .and. i-1 >= imin .and. k+1 <= kmax .and. k-1 >= kmin)then
fxz(i,j,k) = Sdxdz*(fh(i-1,j,k-1)-fh(i+1,j,k-1)-fh(i-1,j,k+1)+fh(i+1,j,k+1))
endif
if(j+2 <= jmax .and. j-2 >= jmin .and. k+2 <= kmax .and. k-2 >= kmin)then
fyz(i,j,k) = Fdydz*( (fh(i,j-2,k-2)-F8*fh(i,j-1,k-2)+F8*fh(i,j+1,k-2)-fh(i,j+2,k-2)) &
-F8 *(fh(i,j-2,k-1)-F8*fh(i,j-1,k-1)+F8*fh(i,j+1,k-1)-fh(i,j+2,k-1)) &
+F8 *(fh(i,j-2,k+1)-F8*fh(i,j-1,k+1)+F8*fh(i,j+1,k+1)-fh(i,j+2,k+1)) &
- (fh(i,j-2,k+2)-F8*fh(i,j-1,k+2)+F8*fh(i,j+1,k+2)-fh(i,j+2,k+2)))
elseif(j+1 <= jmax .and. j-1 >= jmin .and. k+1 <= kmax .and. k-1 >= kmin)then
fyz(i,j,k) = Sdydz*(fh(i,j-1,k-1)-fh(i,j+1,k-1)-fh(i,j-1,k+1)+fh(i,j+1,k+1))
endif
#else
! for bam comparison
if(i+2 <= imax .and. i-2 >= imin .and. &
j+2 <= jmax .and. j-2 >= jmin .and. &
k+2 <= kmax .and. k-2 >= kmin) then
fxx(i,j,k) = Fdxdx*(-fh(i-2,j,k)+F16*fh(i-1,j,k)-F30*fh(i,j,k) &
-fh(i+2,j,k)+F16*fh(i+1,j,k) )
fyy(i,j,k) = Fdydy*(-fh(i,j-2,k)+F16*fh(i,j-1,k)-F30*fh(i,j,k) &
-fh(i,j+2,k)+F16*fh(i,j+1,k) )
fzz(i,j,k) = Fdzdz*(-fh(i,j,k-2)+F16*fh(i,j,k-1)-F30*fh(i,j,k) &
-fh(i,j,k+2)+F16*fh(i,j,k+1) )
fxy(i,j,k) = Fdxdy*( (fh(i-2,j-2,k)-F8*fh(i-1,j-2,k)+F8*fh(i+1,j-2,k)-fh(i+2,j-2,k)) &
-F8 *(fh(i-2,j-1,k)-F8*fh(i-1,j-1,k)+F8*fh(i+1,j-1,k)-fh(i+2,j-1,k)) &
+F8 *(fh(i-2,j+1,k)-F8*fh(i-1,j+1,k)+F8*fh(i+1,j+1,k)-fh(i+2,j+1,k)) &
- (fh(i-2,j+2,k)-F8*fh(i-1,j+2,k)+F8*fh(i+1,j+2,k)-fh(i+2,j+2,k)))
fxz(i,j,k) = Fdxdz*( (fh(i-2,j,k-2)-F8*fh(i-1,j,k-2)+F8*fh(i+1,j,k-2)-fh(i+2,j,k-2)) &
-F8 *(fh(i-2,j,k-1)-F8*fh(i-1,j,k-1)+F8*fh(i+1,j,k-1)-fh(i+2,j,k-1)) &
+F8 *(fh(i-2,j,k+1)-F8*fh(i-1,j,k+1)+F8*fh(i+1,j,k+1)-fh(i+2,j,k+1)) &
- (fh(i-2,j,k+2)-F8*fh(i-1,j,k+2)+F8*fh(i+1,j,k+2)-fh(i+2,j,k+2)))
fyz(i,j,k) = Fdydz*( (fh(i,j-2,k-2)-F8*fh(i,j-1,k-2)+F8*fh(i,j+1,k-2)-fh(i,j+2,k-2)) &
-F8 *(fh(i,j-2,k-1)-F8*fh(i,j-1,k-1)+F8*fh(i,j+1,k-1)-fh(i,j+2,k-1)) &
+F8 *(fh(i,j-2,k+1)-F8*fh(i,j-1,k+1)+F8*fh(i,j+1,k+1)-fh(i,j+2,k+1)) &
- (fh(i,j-2,k+2)-F8*fh(i,j-1,k+2)+F8*fh(i,j+1,k+2)-fh(i,j+2,k+2)))
elseif(i+1 <= imax .and. i-1 >= imin .and. &
j+1 <= jmax .and. j-1 >= jmin .and. &
k+1 <= kmax .and. k-1 >= kmin) then
fxx(i,j,k) = Sdxdx*(fh(i-1,j,k)-TWO*fh(i,j,k)+fh(i+1,j,k))
fyy(i,j,k) = Sdydy*(fh(i,j-1,k)-TWO*fh(i,j,k)+fh(i,j+1,k))
fzz(i,j,k) = Sdzdz*(fh(i,j,k-1)-TWO*fh(i,j,k)+fh(i,j,k+1))
fxy(i,j,k) = Sdxdy*(fh(i-1,j-1,k)-fh(i+1,j-1,k)-fh(i-1,j+1,k)+fh(i+1,j+1,k))
fxz(i,j,k) = Sdxdz*(fh(i-1,j,k-1)-fh(i+1,j,k-1)-fh(i-1,j,k+1)+fh(i+1,j,k+1))
fyz(i,j,k) = Sdydz*(fh(i,j-1,k-1)-fh(i,j+1,k-1)-fh(i,j-1,k+1)+fh(i,j+1,k+1))
endif
#endif
enddo
enddo
enddo
return
end subroutine fdderivs_fh
#elif (ghost_width == 4) #elif (ghost_width == 4)
! sixth order code ! sixth order code

View File

@@ -19,48 +19,60 @@
!~~~~~~~> Local variable: !~~~~~~~> Local variable:
real*8, dimension(ex(1),ex(2),ex(3)) :: trA,detg integer :: i,j,k
real*8, dimension(ex(1),ex(2),ex(3)) :: gxx,gyy,gzz real*8 :: lgxx,lgyy,lgzz,ldetg
real*8, dimension(ex(1),ex(2),ex(3)) :: gupxx,gupxy,gupxz,gupyy,gupyz,gupzz real*8 :: lgupxx,lgupxy,lgupxz,lgupyy,lgupyz,lgupzz
real*8 :: ltrA,lscale
real*8, parameter :: F1o3 = 1.D0 / 3.D0, ONE = 1.D0, TWO = 2.D0 real*8, parameter :: F1o3 = 1.D0 / 3.D0, ONE = 1.D0, TWO = 2.D0
!~~~~~~> !~~~~~~>
gxx = dxx + ONE do k=1,ex(3)
gyy = dyy + ONE do j=1,ex(2)
gzz = dzz + ONE do i=1,ex(1)
detg = gxx * gyy * gzz + gxy * gyz * gxz + gxz * gxy * gyz - & lgxx = dxx(i,j,k) + ONE
gxz * gyy * gxz - gxy * gxy * gzz - gxx * gyz * gyz lgyy = dyy(i,j,k) + ONE
gupxx = ( gyy * gzz - gyz * gyz ) / detg lgzz = dzz(i,j,k) + ONE
gupxy = - ( gxy * gzz - gyz * gxz ) / detg
gupxz = ( gxy * gyz - gyy * gxz ) / detg
gupyy = ( gxx * gzz - gxz * gxz ) / detg
gupyz = - ( gxx * gyz - gxy * gxz ) / detg
gupzz = ( gxx * gyy - gxy * gxy ) / detg
trA = gupxx * Axx + gupyy * Ayy + gupzz * Azz & ldetg = lgxx * lgyy * lgzz &
+ TWO * (gupxy * Axy + gupxz * Axz + gupyz * Ayz) + gxy(i,j,k) * gyz(i,j,k) * gxz(i,j,k) &
+ gxz(i,j,k) * gxy(i,j,k) * gyz(i,j,k) &
- gxz(i,j,k) * lgyy * gxz(i,j,k) &
- gxy(i,j,k) * gxy(i,j,k) * lgzz &
- lgxx * gyz(i,j,k) * gyz(i,j,k)
Axx = Axx - F1o3 * gxx * trA lgupxx = ( lgyy * lgzz - gyz(i,j,k) * gyz(i,j,k) ) / ldetg
Axy = Axy - F1o3 * gxy * trA lgupxy = - ( gxy(i,j,k) * lgzz - gyz(i,j,k) * gxz(i,j,k) ) / ldetg
Axz = Axz - F1o3 * gxz * trA lgupxz = ( gxy(i,j,k) * gyz(i,j,k) - lgyy * gxz(i,j,k) ) / ldetg
Ayy = Ayy - F1o3 * gyy * trA lgupyy = ( lgxx * lgzz - gxz(i,j,k) * gxz(i,j,k) ) / ldetg
Ayz = Ayz - F1o3 * gyz * trA lgupyz = - ( lgxx * gyz(i,j,k) - gxy(i,j,k) * gxz(i,j,k) ) / ldetg
Azz = Azz - F1o3 * gzz * trA lgupzz = ( lgxx * lgyy - gxy(i,j,k) * gxy(i,j,k) ) / ldetg
detg = ONE / ( detg ** F1o3 ) ltrA = lgupxx * Axx(i,j,k) + lgupyy * Ayy(i,j,k) &
+ lgupzz * Azz(i,j,k) &
+ TWO * (lgupxy * Axy(i,j,k) + lgupxz * Axz(i,j,k) &
+ lgupyz * Ayz(i,j,k))
gxx = gxx * detg Axx(i,j,k) = Axx(i,j,k) - F1o3 * lgxx * ltrA
gxy = gxy * detg Axy(i,j,k) = Axy(i,j,k) - F1o3 * gxy(i,j,k) * ltrA
gxz = gxz * detg Axz(i,j,k) = Axz(i,j,k) - F1o3 * gxz(i,j,k) * ltrA
gyy = gyy * detg Ayy(i,j,k) = Ayy(i,j,k) - F1o3 * lgyy * ltrA
gyz = gyz * detg Ayz(i,j,k) = Ayz(i,j,k) - F1o3 * gyz(i,j,k) * ltrA
gzz = gzz * detg Azz(i,j,k) = Azz(i,j,k) - F1o3 * lgzz * ltrA
dxx = gxx - ONE lscale = ONE / ( ldetg ** F1o3 )
dyy = gyy - ONE
dzz = gzz - ONE dxx(i,j,k) = lgxx * lscale - ONE
gxy(i,j,k) = gxy(i,j,k) * lscale
gxz(i,j,k) = gxz(i,j,k) * lscale
dyy(i,j,k) = lgyy * lscale - ONE
gyz(i,j,k) = gyz(i,j,k) * lscale
dzz(i,j,k) = lgzz * lscale - ONE
enddo
enddo
enddo
return return
@@ -83,50 +95,70 @@
!~~~~~~~> Local variable: !~~~~~~~> Local variable:
real*8, dimension(ex(1),ex(2),ex(3)) :: trA integer :: i,j,k
real*8, dimension(ex(1),ex(2),ex(3)) :: gxx,gyy,gzz real*8 :: lgxx,lgyy,lgzz,lscale
real*8, dimension(ex(1),ex(2),ex(3)) :: gupxx,gupxy,gupxz,gupyy,gupyz,gupzz real*8 :: lgxy,lgxz,lgyz
real*8 :: lgupxx,lgupxy,lgupxz,lgupyy,lgupyz,lgupzz
real*8 :: ltrA
real*8, parameter :: F1o3 = 1.D0 / 3.D0, ONE = 1.D0, TWO = 2.D0 real*8, parameter :: F1o3 = 1.D0 / 3.D0, ONE = 1.D0, TWO = 2.D0
!~~~~~~> !~~~~~~>
gxx = dxx + ONE do k=1,ex(3)
gyy = dyy + ONE do j=1,ex(2)
gzz = dzz + ONE do i=1,ex(1)
! for g
gupzz = gxx * gyy * gzz + gxy * gyz * gxz + gxz * gxy * gyz - &
gxz * gyy * gxz - gxy * gxy * gzz - gxx * gyz * gyz
gupzz = ONE / ( gupzz ** F1o3 ) ! for g: normalize determinant first
lgxx = dxx(i,j,k) + ONE
lgyy = dyy(i,j,k) + ONE
lgzz = dzz(i,j,k) + ONE
lgxy = gxy(i,j,k)
lgxz = gxz(i,j,k)
lgyz = gyz(i,j,k)
gxx = gxx * gupzz lscale = lgxx * lgyy * lgzz + lgxy * lgyz * lgxz &
gxy = gxy * gupzz + lgxz * lgxy * lgyz - lgxz * lgyy * lgxz &
gxz = gxz * gupzz - lgxy * lgxy * lgzz - lgxx * lgyz * lgyz
gyy = gyy * gupzz
gyz = gyz * gupzz
gzz = gzz * gupzz
dxx = gxx - ONE lscale = ONE / ( lscale ** F1o3 )
dyy = gyy - ONE
dzz = gzz - ONE
! for A
gupxx = ( gyy * gzz - gyz * gyz ) lgxx = lgxx * lscale
gupxy = - ( gxy * gzz - gyz * gxz ) lgxy = lgxy * lscale
gupxz = ( gxy * gyz - gyy * gxz ) lgxz = lgxz * lscale
gupyy = ( gxx * gzz - gxz * gxz ) lgyy = lgyy * lscale
gupyz = - ( gxx * gyz - gxy * gxz ) lgyz = lgyz * lscale
gupzz = ( gxx * gyy - gxy * gxy ) lgzz = lgzz * lscale
trA = gupxx * Axx + gupyy * Ayy + gupzz * Azz & dxx(i,j,k) = lgxx - ONE
+ TWO * (gupxy * Axy + gupxz * Axz + gupyz * Ayz) gxy(i,j,k) = lgxy
gxz(i,j,k) = lgxz
dyy(i,j,k) = lgyy - ONE
gyz(i,j,k) = lgyz
dzz(i,j,k) = lgzz - ONE
Axx = Axx - F1o3 * gxx * trA ! for A: trace-free using normalized metric (det=1, no division needed)
Axy = Axy - F1o3 * gxy * trA lgupxx = ( lgyy * lgzz - lgyz * lgyz )
Axz = Axz - F1o3 * gxz * trA lgupxy = - ( lgxy * lgzz - lgyz * lgxz )
Ayy = Ayy - F1o3 * gyy * trA lgupxz = ( lgxy * lgyz - lgyy * lgxz )
Ayz = Ayz - F1o3 * gyz * trA lgupyy = ( lgxx * lgzz - lgxz * lgxz )
Azz = Azz - F1o3 * gzz * trA lgupyz = - ( lgxx * lgyz - lgxy * lgxz )
lgupzz = ( lgxx * lgyy - lgxy * lgxy )
ltrA = lgupxx * Axx(i,j,k) + lgupyy * Ayy(i,j,k) &
+ lgupzz * Azz(i,j,k) &
+ TWO * (lgupxy * Axy(i,j,k) + lgupxz * Axz(i,j,k) &
+ lgupyz * Ayz(i,j,k))
Axx(i,j,k) = Axx(i,j,k) - F1o3 * lgxx * ltrA
Axy(i,j,k) = Axy(i,j,k) - F1o3 * lgxy * ltrA
Axz(i,j,k) = Axz(i,j,k) - F1o3 * lgxz * ltrA
Ayy(i,j,k) = Ayy(i,j,k) - F1o3 * lgyy * ltrA
Ayz(i,j,k) = Ayz(i,j,k) - F1o3 * lgyz * ltrA
Azz(i,j,k) = Azz(i,j,k) - F1o3 * lgzz * ltrA
enddo
enddo
enddo
return return

View File

@@ -324,7 +324,6 @@ subroutine symmetry_bd(ord,extc,func,funcc,SoA)
integer::i integer::i
funcc = 0.d0
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1)
@@ -350,7 +349,6 @@ subroutine symmetry_tbd(ord,extc,func,funcc,SoA)
integer::i integer::i
funcc = 0.d0
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1)
@@ -379,7 +377,6 @@ subroutine symmetry_stbd(ord,extc,func,funcc,SoA)
integer::i integer::i
funcc = 0.d0
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+2,1:extc(2),1:extc(3))*SoA(1)
@@ -886,7 +883,6 @@ subroutine symmetry_bd(ord,extc,func,funcc,SoA)
integer::i integer::i
funcc = 0.d0
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1)
@@ -912,7 +908,6 @@ subroutine symmetry_tbd(ord,extc,func,funcc,SoA)
integer::i integer::i
funcc = 0.d0
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1)
@@ -941,7 +936,6 @@ subroutine symmetry_stbd(ord,extc,func,funcc,SoA)
integer::i integer::i
funcc = 0.d0
funcc(1:extc(1),1:extc(2),1:extc(3)) = func funcc(1:extc(1),1:extc(2),1:extc(3)) = func
do i=0,ord-1 do i=0,ord-1
funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1) funcc(-i,1:extc(2),1:extc(3)) = funcc(i+1,1:extc(2),1:extc(3))*SoA(1)
@@ -1118,64 +1112,65 @@ end subroutine d2dump
! Lagrangian polynomial interpolation ! Lagrangian polynomial interpolation
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
subroutine polint(xa,ya,x,y,dy,ordn) subroutine polint(xa, ya, x, y, dy, ordn)
implicit none implicit none
!~~~~~~> Input Parameter: integer, intent(in) :: ordn
integer,intent(in) :: ordn real*8, dimension(ordn), intent(in) :: xa, ya
real*8, dimension(ordn), intent(in) :: xa,ya
real*8, intent(in) :: x real*8, intent(in) :: x
real*8, intent(out) :: y,dy real*8, intent(out) :: y, dy
!~~~~~~> Other parameter: integer :: i, m, ns, n_m
real*8, dimension(ordn) :: c, d, ho
real*8 :: dif, dift, hp, h, den_val
integer :: m,n,ns c = ya
real*8, dimension(ordn) :: c,d,den,ho d = ya
real*8 :: dif,dift ho = xa - x
!~~~~~~> ns = 1
dif = abs(x - xa(1))
n=ordn do i = 2, ordn
m=ordn dift = abs(x - xa(i))
if (dift < dif) then
c=ya ns = i
d=ya dif = dift
ho=xa-x
ns=1
dif=abs(x-xa(1))
do m=1,n
dift=abs(x-xa(m))
if(dift < dif) then
ns=m
dif=dift
end if end if
end do end do
y=ya(ns) y = ya(ns)
ns=ns-1 ns = ns - 1
do m=1,n-1
den(1:n-m)=ho(1:n-m)-ho(1+m:n) do m = 1, ordn - 1
if (any(den(1:n-m) == 0.0))then n_m = ordn - m
do i = 1, n_m
hp = ho(i)
h = ho(i+m)
den_val = hp - h
if (den_val == 0.0d0) then
write(*,*) 'failure in polint for point',x write(*,*) 'failure in polint for point',x
write(*,*) 'with input points: ',xa write(*,*) 'with input points: ',xa
stop stop
endif
den(1:n-m)=(c(2:n-m+1)-d(1:n-m))/den(1:n-m)
d(1:n-m)=ho(1+m:n)*den(1:n-m)
c(1:n-m)=ho(1:n-m)*den(1:n-m)
if (2*ns < n-m) then
dy=c(ns+1)
else
dy=d(ns)
ns=ns-1
end if end if
y=y+dy
den_val = (c(i+1) - d(i)) / den_val
d(i) = h * den_val
c(i) = hp * den_val
end do
if (2 * ns < n_m) then
dy = c(ns + 1)
else
dy = d(ns)
ns = ns - 1
end if
y = y + dy
end do end do
return return
end subroutine polint end subroutine polint
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
! !
@@ -1183,35 +1178,37 @@ end subroutine d2dump
! !
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
subroutine polin2(x1a,x2a,ya,x1,x2,y,dy,ordn) subroutine polin2(x1a,x2a,ya,x1,x2,y,dy,ordn)
implicit none implicit none
!~~~~~~> Input parameters:
integer,intent(in) :: ordn integer,intent(in) :: ordn
real*8, dimension(1:ordn), intent(in) :: x1a,x2a real*8, dimension(1:ordn), intent(in) :: x1a,x2a
real*8, dimension(1:ordn,1:ordn), intent(in) :: ya real*8, dimension(1:ordn,1:ordn), intent(in) :: ya
real*8, intent(in) :: x1,x2 real*8, intent(in) :: x1,x2
real*8, intent(out) :: y,dy real*8, intent(out) :: y,dy
!~~~~~~> Other parameters: #ifdef POLINT_LEGACY_ORDER
integer :: i,m integer :: i,m
real*8, dimension(ordn) :: ymtmp real*8, dimension(ordn) :: ymtmp
real*8, dimension(ordn) :: yntmp real*8, dimension(ordn) :: yntmp
m=size(x1a) m=size(x1a)
do i=1,m do i=1,m
yntmp=ya(i,:) yntmp=ya(i,:)
call polint(x2a,yntmp,x2,ymtmp(i),dy,ordn) call polint(x2a,yntmp,x2,ymtmp(i),dy,ordn)
end do end do
call polint(x1a,ymtmp,x1,y,dy,ordn) call polint(x1a,ymtmp,x1,y,dy,ordn)
#else
integer :: j
real*8, dimension(ordn) :: ymtmp
real*8 :: dy_temp
do j=1,ordn
call polint(x1a, ya(:,j), x1, ymtmp(j), dy_temp, ordn)
end do
call polint(x2a, ymtmp, x2, y, dy, ordn)
#endif
return return
end subroutine polin2 end subroutine polin2
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
! !
@@ -1219,18 +1216,15 @@ end subroutine d2dump
! !
!------------------------------------------------------------------------------ !------------------------------------------------------------------------------
subroutine polin3(x1a,x2a,x3a,ya,x1,x2,x3,y,dy,ordn) subroutine polin3(x1a,x2a,x3a,ya,x1,x2,x3,y,dy,ordn)
implicit none implicit none
!~~~~~~> Input parameters:
integer,intent(in) :: ordn integer,intent(in) :: ordn
real*8, dimension(1:ordn), intent(in) :: x1a,x2a,x3a real*8, dimension(1:ordn), intent(in) :: x1a,x2a,x3a
real*8, dimension(1:ordn,1:ordn,1:ordn), intent(in) :: ya real*8, dimension(1:ordn,1:ordn,1:ordn), intent(in) :: ya
real*8, intent(in) :: x1,x2,x3 real*8, intent(in) :: x1,x2,x3
real*8, intent(out) :: y,dy real*8, intent(out) :: y,dy
!~~~~~~> Other parameters: #ifdef POLINT_LEGACY_ORDER
integer :: i,j,m,n integer :: i,j,m,n
real*8, dimension(ordn,ordn) :: yatmp real*8, dimension(ordn,ordn) :: yatmp
real*8, dimension(ordn) :: ymtmp real*8, dimension(ordn) :: ymtmp
@@ -1239,24 +1233,33 @@ end subroutine d2dump
m=size(x1a) m=size(x1a)
n=size(x2a) n=size(x2a)
do i=1,m do i=1,m
do j=1,n do j=1,n
yqtmp=ya(i,j,:) yqtmp=ya(i,j,:)
call polint(x3a,yqtmp,x3,yatmp(i,j),dy,ordn) call polint(x3a,yqtmp,x3,yatmp(i,j),dy,ordn)
end do end do
yntmp=yatmp(i,:) yntmp=yatmp(i,:)
call polint(x2a,yntmp,x2,ymtmp(i),dy,ordn) call polint(x2a,yntmp,x2,ymtmp(i),dy,ordn)
end do end do
call polint(x1a,ymtmp,x1,y,dy,ordn) call polint(x1a,ymtmp,x1,y,dy,ordn)
#else
integer :: j, k
real*8, dimension(ordn,ordn) :: yatmp
real*8, dimension(ordn) :: ymtmp
real*8 :: dy_temp
do k=1,ordn
do j=1,ordn
call polint(x1a, ya(:,j,k), x1, yatmp(j,k), dy_temp, ordn)
end do
end do
do k=1,ordn
call polint(x2a, yatmp(:,k), x2, ymtmp(k), dy_temp, ordn)
end do
call polint(x3a, ymtmp, x3, y, dy, ordn)
#endif
return return
end subroutine polin3 end subroutine polin3
!-------------------------------------------------------------------------------------- !--------------------------------------------------------------------------------------
! calculate L2norm ! calculate L2norm
@@ -1276,7 +1279,9 @@ end subroutine d2dump
real*8 :: dX, dY, dZ real*8 :: dX, dY, dZ
integer::imin,jmin,kmin integer::imin,jmin,kmin
integer::imax,jmax,kmax integer::imax,jmax,kmax
integer::i,j,k integer::i,j,k,n_elements
real*8, dimension(:), allocatable :: f_flat
real*8, external :: DDOT
dX = X(2) - X(1) dX = X(2) - X(1)
dY = Y(2) - Y(1) dY = Y(2) - Y(1)
@@ -1300,7 +1305,12 @@ if(dabs(X(1)-xmin) < dX) imin = 1
if(dabs(Y(1)-ymin) < dY) jmin = 1 if(dabs(Y(1)-ymin) < dY) jmin = 1
if(dabs(Z(1)-zmin) < dZ) kmin = 1 if(dabs(Z(1)-zmin) < dZ) kmin = 1
f_out = sum(f(imin:imax,jmin:jmax,kmin:kmax)*f(imin:imax,jmin:jmax,kmin:kmax)) ! Optimized with oneMKL BLAS DDOT for dot product
n_elements = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
allocate(f_flat(n_elements))
f_flat = reshape(f(imin:imax,jmin:jmax,kmin:kmax), [n_elements])
f_out = DDOT(n_elements, f_flat, 1, f_flat, 1)
deallocate(f_flat)
f_out = f_out*dX*dY*dZ f_out = f_out*dX*dY*dZ
@@ -1325,7 +1335,9 @@ f_out = f_out*dX*dY*dZ
real*8 :: dX, dY, dZ real*8 :: dX, dY, dZ
integer::imin,jmin,kmin integer::imin,jmin,kmin
integer::imax,jmax,kmax integer::imax,jmax,kmax
integer::i,j,k integer::i,j,k,n_elements
real*8, dimension(:), allocatable :: f_flat
real*8, external :: DDOT
real*8 :: PIo4 real*8 :: PIo4
@@ -1388,7 +1400,12 @@ if(Symmetry==2)then
if(dabs(ymin+gw*dY)<dY.and.Y(1)<0.d0) jmin = gw+1 if(dabs(ymin+gw*dY)<dY.and.Y(1)<0.d0) jmin = gw+1
endif endif
f_out = sum(f(imin:imax,jmin:jmax,kmin:kmax)*f(imin:imax,jmin:jmax,kmin:kmax)) ! Optimized with oneMKL BLAS DDOT for dot product
n_elements = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
allocate(f_flat(n_elements))
f_flat = reshape(f(imin:imax,jmin:jmax,kmin:kmax), [n_elements])
f_out = DDOT(n_elements, f_flat, 1, f_flat, 1)
deallocate(f_flat)
f_out = f_out*dX*dY*dZ f_out = f_out*dX*dY*dZ
@@ -1416,6 +1433,8 @@ f_out = f_out*dX*dY*dZ
integer::imin,jmin,kmin integer::imin,jmin,kmin
integer::imax,jmax,kmax integer::imax,jmax,kmax
integer::i,j,k integer::i,j,k
real*8, dimension(:), allocatable :: f_flat
real*8, external :: DDOT
real*8 :: PIo4 real*8 :: PIo4
@@ -1478,11 +1497,12 @@ if(Symmetry==2)then
if(dabs(ymin+gw*dY)<dY.and.Y(1)<0.d0) jmin = gw+1 if(dabs(ymin+gw*dY)<dY.and.Y(1)<0.d0) jmin = gw+1
endif endif
f_out = sum(f(imin:imax,jmin:jmax,kmin:kmax)*f(imin:imax,jmin:jmax,kmin:kmax)) ! Optimized with oneMKL BLAS DDOT for dot product
f_out = f_out
Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1) Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
allocate(f_flat(Nout))
f_flat = reshape(f(imin:imax,jmin:jmax,kmin:kmax), [Nout])
f_out = DDOT(Nout, f_flat, 1, f_flat, 1)
deallocate(f_flat)
return return
@@ -1680,6 +1700,7 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
real*8, dimension(ORDN,ORDN) :: tmp2 real*8, dimension(ORDN,ORDN) :: tmp2
real*8, dimension(ORDN) :: tmp1 real*8, dimension(ORDN) :: tmp1
real*8, dimension(3) :: SoAh real*8, dimension(3) :: SoAh
real*8, external :: DDOT
! +1 because c++ gives 0 for first point ! +1 because c++ gives 0 for first point
cxB = inds+1 cxB = inds+1
@@ -1715,20 +1736,21 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
ya=fh(cxB(1):cxT(1),cxB(2):cxT(2),cxB(3):cxT(3)) ya=fh(cxB(1):cxT(1),cxB(2):cxT(2),cxB(3):cxT(3))
endif endif
! Optimized with BLAS operations for better performance
! First dimension: z-direction weighted sum
tmp2=0 tmp2=0
do m=1,ORDN do m=1,ORDN
tmp2 = tmp2 + coef(2*ORDN+m)*ya(:,:,m) tmp2 = tmp2 + coef(2*ORDN+m)*ya(:,:,m)
enddo enddo
! Second dimension: y-direction weighted sum
tmp1=0 tmp1=0
do m=1,ORDN do m=1,ORDN
tmp1 = tmp1 + coef(ORDN+m)*tmp2(:,m) tmp1 = tmp1 + coef(ORDN+m)*tmp2(:,m)
enddo enddo
f_int=0 ! Third dimension: x-direction weighted sum using BLAS DDOT
do m=1,ORDN f_int = DDOT(ORDN, coef(1:ORDN), 1, tmp1, 1)
f_int = f_int + coef(m)*tmp1(m)
enddo
return return
@@ -1758,6 +1780,7 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
real*8, dimension(ORDN,ORDN) :: ya real*8, dimension(ORDN,ORDN) :: ya
real*8, dimension(ORDN) :: tmp1 real*8, dimension(ORDN) :: tmp1
real*8, dimension(2) :: SoAh real*8, dimension(2) :: SoAh
real*8, external :: DDOT
! +1 because c++ gives 0 for first point ! +1 because c++ gives 0 for first point
cxB = inds(1:2)+1 cxB = inds(1:2)+1
@@ -1787,15 +1810,14 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
ya=fh(cxB(1):cxT(1),cxB(2):cxT(2),inds(3)) ya=fh(cxB(1):cxT(1),cxB(2):cxT(2),inds(3))
endif endif
! Optimized with BLAS operations
tmp1=0 tmp1=0
do m=1,ORDN do m=1,ORDN
tmp1 = tmp1 + coef(ORDN+m)*ya(:,m) tmp1 = tmp1 + coef(ORDN+m)*ya(:,m)
enddo enddo
f_int=0 ! Use BLAS DDOT for final weighted sum
do m=1,ORDN f_int = DDOT(ORDN, coef(1:ORDN), 1, tmp1, 1)
f_int = f_int + coef(m)*tmp1(m)
enddo
return return
@@ -1826,6 +1848,7 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
real*8, dimension(ORDN) :: ya real*8, dimension(ORDN) :: ya
real*8 :: SoAh real*8 :: SoAh
integer,dimension(3) :: inds integer,dimension(3) :: inds
real*8, external :: DDOT
! +1 because c++ gives 0 for first point ! +1 because c++ gives 0 for first point
inds = indsi + 1 inds = indsi + 1
@@ -1886,10 +1909,8 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
write(*,*)"error in global_interpind1d, not recognized dumyd = ",dumyd write(*,*)"error in global_interpind1d, not recognized dumyd = ",dumyd
endif endif
f_int=0 ! Optimized with BLAS DDOT for weighted sum
do m=1,ORDN f_int = DDOT(ORDN, coef, 1, ya, 1)
f_int = f_int + coef(m)*ya(m)
enddo
return return
@@ -2121,24 +2142,38 @@ Nout = (imax-imin+1)*(jmax-jmin+1)*(kmax-kmin+1)
end function fWigner_d_function end function fWigner_d_function
!---------------------------------- !----------------------------------
! Optimized factorial function using lookup table for small N
! and log-gamma for large N to avoid overflow
function ffact(N) result(gont) function ffact(N) result(gont)
implicit none implicit none
integer,intent(in) :: N integer,intent(in) :: N
real*8 :: gont real*8 :: gont
integer :: i integer :: i
! Lookup table for factorials 0! to 20! (precomputed)
real*8, parameter, dimension(0:20) :: fact_table = [ &
1.d0, 1.d0, 2.d0, 6.d0, 24.d0, 120.d0, 720.d0, 5040.d0, 40320.d0, &
362880.d0, 3628800.d0, 39916800.d0, 479001600.d0, 6227020800.d0, &
87178291200.d0, 1307674368000.d0, 20922789888000.d0, &
355687428096000.d0, 6402373705728000.d0, 121645100408832000.d0, &
2432902008176640000.d0 ]
! sanity check ! sanity check
if(N < 0)then if(N < 0)then
write(*,*) "ffact: error input for factorial" write(*,*) "ffact: error input for factorial"
gont = 1.d0
return return
endif endif
gont = 1.d0 ! Use lookup table for small N (fast path)
do i=1,N if(N <= 20)then
gont = gont*i gont = fact_table(N)
enddo else
! Use log-gamma function for large N: N! = exp(log_gamma(N+1))
! This avoids overflow and is computed efficiently
gont = exp(log_gamma(dble(N+1)))
endif
return return

View File

@@ -16,115 +16,66 @@ using namespace std;
#include <string.h> #include <string.h>
#include <math.h> #include <math.h>
#endif #endif
/* Linear equation solution by Gauss-Jordan elimination.
// Intel oneMKL LAPACK interface
#include <mkl_lapacke.h>
/* Linear equation solution using Intel oneMKL LAPACK.
a[0..n-1][0..n-1] is the input matrix. b[0..n-1] is input a[0..n-1][0..n-1] is the input matrix. b[0..n-1] is input
containing the right-hand side vectors. On output a is containing the right-hand side vectors. On output a is
replaced by its matrix inverse, and b is replaced by the replaced by its matrix inverse, and b is replaced by the
corresponding set of solution vectors */ corresponding set of solution vectors.
Mathematical equivalence:
Solves: A * x = b => x = A^(-1) * b
Original Gauss-Jordan and LAPACK dgesv/dgetri produce identical results
within numerical precision. */
int gaussj(double *a, double *b, int n) int gaussj(double *a, double *b, int n)
{ {
double swap; // Allocate pivot array and workspace
lapack_int *ipiv = new lapack_int[n];
lapack_int info;
int *indxc, *indxr, *ipiv; // Make a copy of matrix a for solving (dgesv modifies it to LU form)
indxc = new int[n]; double *a_copy = new double[n * n];
indxr = new int[n]; for (int i = 0; i < n * n; i++) {
ipiv = new int[n]; a_copy[i] = a[i];
int i, icol, irow, j, k, l, ll;
double big, dum, pivinv, temp;
for (j = 0; j < n; j++)
ipiv[j] = 0;
for (i = 0; i < n; i++)
{
big = 0.0;
for (j = 0; j < n; j++)
if (ipiv[j] != 1)
for (k = 0; k < n; k++)
{
if (ipiv[k] == 0)
{
if (fabs(a[j * n + k]) >= big)
{
big = fabs(a[j * n + k]);
irow = j;
icol = k;
}
}
else if (ipiv[k] > 1)
{
cout << "gaussj: Singular Matrix-1" << endl;
for (int ii = 0; ii < n; ii++)
{
for (int jj = 0; jj < n; jj++)
cout << a[ii * n + jj] << " ";
cout << endl;
}
return 1; // error return
}
} }
ipiv[icol] = ipiv[icol] + 1; // Step 1: Solve linear system A*x = b using LU decomposition
if (irow != icol) // LAPACKE_dgesv uses column-major by default, but we use row-major
{ info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, 1, a_copy, n, ipiv, b, 1);
for (l = 0; l < n; l++)
{
swap = a[irow * n + l];
a[irow * n + l] = a[icol * n + l];
a[icol * n + l] = swap;
}
swap = b[irow]; if (info != 0) {
b[irow] = b[icol]; cout << "gaussj: Singular Matrix (dgesv info=" << info << ")" << endl;
b[icol] = swap;
}
indxr[i] = irow;
indxc[i] = icol;
if (a[icol * n + icol] == 0.0)
{
cout << "gaussj: Singular Matrix-2" << endl;
for (int ii = 0; ii < n; ii++)
{
for (int jj = 0; jj < n; jj++)
cout << a[ii * n + jj] << " ";
cout << endl;
}
return 1; // error return
}
pivinv = 1.0 / a[icol * n + icol];
a[icol * n + icol] = 1.0;
for (l = 0; l < n; l++)
a[icol * n + l] *= pivinv;
b[icol] *= pivinv;
for (ll = 0; ll < n; ll++)
if (ll != icol)
{
dum = a[ll * n + icol];
a[ll * n + icol] = 0.0;
for (l = 0; l < n; l++)
a[ll * n + l] -= a[icol * n + l] * dum;
b[ll] -= b[icol] * dum;
}
}
for (l = n - 1; l >= 0; l--)
{
if (indxr[l] != indxc[l])
for (k = 0; k < n; k++)
{
swap = a[k * n + indxr[l]];
a[k * n + indxr[l]] = a[k * n + indxc[l]];
a[k * n + indxc[l]] = swap;
}
}
delete[] indxc;
delete[] indxr;
delete[] ipiv; delete[] ipiv;
delete[] a_copy;
return 1;
}
// Step 2: Compute matrix inverse A^(-1) using LU factorization
// First do LU factorization of original matrix a
info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, a, n, ipiv);
if (info != 0) {
cout << "gaussj: Singular Matrix (dgetrf info=" << info << ")" << endl;
delete[] ipiv;
delete[] a_copy;
return 1;
}
// Then compute inverse from LU factorization
info = LAPACKE_dgetri(LAPACK_ROW_MAJOR, n, a, n, ipiv);
if (info != 0) {
cout << "gaussj: Singular Matrix (dgetri info=" << info << ")" << endl;
delete[] ipiv;
delete[] a_copy;
return 1;
}
delete[] ipiv;
delete[] a_copy;
return 0; return 0;
} }

View File

@@ -512,11 +512,10 @@
IMPLICIT DOUBLE PRECISION (A-H,O-Z) IMPLICIT DOUBLE PRECISION (A-H,O-Z)
DIMENSION V(N),W(N) DIMENSION V(N),W(N)
! SUBROUTINE TO COMPUTE DOUBLE PRECISION VECTOR DOT PRODUCT. ! SUBROUTINE TO COMPUTE DOUBLE PRECISION VECTOR DOT PRODUCT.
! Optimized using Intel oneMKL BLAS ddot
! Mathematical equivalence: DGVV = sum_{i=1}^{N} V(i)*W(i)
SUM = 0.0D0 DOUBLE PRECISION, EXTERNAL :: DDOT
DO 10 I = 1,N DGVV = DDOT(N, V, 1, W, 1)
SUM = SUM + V(I)*W(I)
10 CONTINUE
DGVV = SUM
RETURN RETURN
END END

View File

@@ -215,6 +215,99 @@ integer, parameter :: NO_SYMM=0, OCTANT=2
end subroutine kodis end subroutine kodis
!-----------------------------------------------------------------------------
! kodis variant: reuses caller-provided fh work array (memory pool)
!-----------------------------------------------------------------------------
subroutine kodis_fh(ex,X,Y,Z,f,f_rhs,SoA,Symmetry,eps,fh)
implicit none
! argument variables
integer,intent(in) :: Symmetry
integer,dimension(3),intent(in)::ex
real*8, dimension(1:3), intent(in) :: SoA
double precision,intent(in),dimension(ex(1))::X
double precision,intent(in),dimension(ex(2))::Y
double precision,intent(in),dimension(ex(3))::Z
double precision,intent(in),dimension(ex(1),ex(2),ex(3))::f
double precision,intent(inout),dimension(ex(1),ex(2),ex(3))::f_rhs
real*8,intent(in) :: eps
real*8,dimension(-2:ex(1),-2:ex(2),-2:ex(3)),intent(inout):: fh
! local variables
integer :: imin,jmin,kmin,imax,jmax,kmax
integer :: i,j,k
real*8 :: dX,dY,dZ
real*8, parameter :: ONE=1.d0,SIX=6.d0,FIT=1.5d1,TWT=2.d1
real*8,parameter::cof=6.4d1 ! 2^6
integer, parameter :: NO_SYMM=0, OCTANT=2
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2
if(Symmetry == OCTANT .and. dabs(X(1)) < dX) imin = -2
if(Symmetry == OCTANT .and. dabs(Y(1)) < dY) jmin = -2
call symmetry_bd(3,ex,f,fh,SoA)
do k=1,ex(3)
do j=1,ex(2)
do i=1,ex(1)
if(i-3 >= imin .and. i+3 <= imax .and. &
j-3 >= jmin .and. j+3 <= jmax .and. &
k-3 >= kmin .and. k+3 <= kmax) then
#if 0
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dX/cof * ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dY/cof * ( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/dZ/cof * ( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )
#else
f_rhs(i,j,k) = f_rhs(i,j,k) + eps/cof *( ( &
(fh(i-3,j,k)+fh(i+3,j,k)) - &
SIX*(fh(i-2,j,k)+fh(i+2,j,k)) + &
FIT*(fh(i-1,j,k)+fh(i+1,j,k)) - &
TWT* fh(i,j,k) )/dX + &
( &
(fh(i,j-3,k)+fh(i,j+3,k)) - &
SIX*(fh(i,j-2,k)+fh(i,j+2,k)) + &
FIT*(fh(i,j-1,k)+fh(i,j+1,k)) - &
TWT* fh(i,j,k) )/dY + &
( &
(fh(i,j,k-3)+fh(i,j,k+3)) - &
SIX*(fh(i,j,k-2)+fh(i,j,k+2)) + &
FIT*(fh(i,j,k-1)+fh(i,j,k+1)) - &
TWT* fh(i,j,k) )/dZ )
#endif
endif
enddo
enddo
enddo
return
end subroutine kodis_fh
#elif (ghost_width == 4) #elif (ghost_width == 4)
! sixth order code ! sixth order code
!------------------------------------------------------------------------------------------------------------------------------ !------------------------------------------------------------------------------------------------------------------------------

View File

@@ -487,6 +487,160 @@ subroutine lopsided(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA)
end subroutine lopsided end subroutine lopsided
!-----------------------------------------------------------------------------
! lopsided variant: reuses caller-provided fh work array (memory pool)
!-----------------------------------------------------------------------------
subroutine lopsided_fh(ex,X,Y,Z,f,f_rhs,Sfx,Sfy,Sfz,Symmetry,SoA,fh)
implicit none
!~~~~~~> Input parameters:
integer, intent(in) :: ex(1:3),Symmetry
real*8, intent(in) :: X(1:ex(1)),Y(1:ex(2)),Z(1:ex(3))
real*8,dimension(ex(1),ex(2),ex(3)),intent(in) :: f,Sfx,Sfy,Sfz
real*8,dimension(ex(1),ex(2),ex(3)),intent(inout):: f_rhs
real*8,dimension(3),intent(in) ::SoA
real*8,dimension(-2:ex(1),-2:ex(2),-2:ex(3)),intent(inout):: fh
!~~~~~~> local variables:
integer :: imin,jmin,kmin,imax,jmax,kmax,i,j,k
real*8 :: dX,dY,dZ
real*8 :: d12dx,d12dy,d12dz,d2dx,d2dy,d2dz
real*8, parameter :: ZEO=0.d0,ONE=1.d0, F3=3.d0
real*8, parameter :: TWO=2.d0,F6=6.0d0,F18=1.8d1
real*8, parameter :: F12=1.2d1, F10=1.d1,EIT=8.d0
integer, parameter :: NO_SYMM = 0, EQ_SYMM = 1, OCTANT = 2
dX = X(2)-X(1)
dY = Y(2)-Y(1)
dZ = Z(2)-Z(1)
d12dx = ONE/F12/dX
d12dy = ONE/F12/dY
d12dz = ONE/F12/dZ
d2dx = ONE/TWO/dX
d2dy = ONE/TWO/dY
d2dz = ONE/TWO/dZ
imax = ex(1)
jmax = ex(2)
kmax = ex(3)
imin = 1
jmin = 1
kmin = 1
if(Symmetry > NO_SYMM .and. dabs(Z(1)) < dZ) kmin = -2
if(Symmetry > EQ_SYMM .and. dabs(X(1)) < dX) imin = -2
if(Symmetry > EQ_SYMM .and. dabs(Y(1)) < dY) jmin = -2
call symmetry_bd(3,ex,f,fh,SoA)
! upper bound set ex-1 only for efficiency,
! the loop body will set ex 0 also
do k=1,ex(3)-1
do j=1,ex(2)-1
do i=1,ex(1)-1
#if 0
!! old code - same as original lopsided
#else
!! new code, 2012dec27, based on bam
! x direction
if(Sfx(i,j,k) > ZEO)then
if(i+3 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
elseif(i+2 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i+1 <= imax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
endif
elseif(Sfx(i,j,k) < ZEO)then
if(i-3 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfx(i,j,k)*d12dx*(-F3*fh(i+1,j,k)-F10*fh(i,j,k)+F18*fh(i-1,j,k) &
-F6*fh(i-2,j,k)+ fh(i-3,j,k))
elseif(i-2 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(fh(i-2,j,k)-EIT*fh(i-1,j,k)+EIT*fh(i+1,j,k)-fh(i+2,j,k))
elseif(i-1 >= imin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfx(i,j,k)*d12dx*(-F3*fh(i-1,j,k)-F10*fh(i,j,k)+F18*fh(i+1,j,k) &
-F6*fh(i+2,j,k)+ fh(i+3,j,k))
endif
endif
! y direction
if(Sfy(i,j,k) > ZEO)then
if(j+3 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
elseif(j+2 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j+1 <= jmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
endif
elseif(Sfy(i,j,k) < ZEO)then
if(j-3 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j+1,k)-F10*fh(i,j,k)+F18*fh(i,j-1,k) &
-F6*fh(i,j-2,k)+ fh(i,j-3,k))
elseif(j-2 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(fh(i,j-2,k)-EIT*fh(i,j-1,k)+EIT*fh(i,j+1,k)-fh(i,j+2,k))
elseif(j-1 >= jmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfy(i,j,k)*d12dy*(-F3*fh(i,j-1,k)-F10*fh(i,j,k)+F18*fh(i,j+1,k) &
-F6*fh(i,j+2,k)+ fh(i,j+3,k))
endif
endif
! z direction
if(Sfz(i,j,k) > ZEO)then
if(k+3 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
elseif(k+2 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k+1 <= kmax)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
endif
elseif(Sfz(i,j,k) < ZEO)then
if(k-3 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)- &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k+1)-F10*fh(i,j,k)+F18*fh(i,j,k-1) &
-F6*fh(i,j,k-2)+ fh(i,j,k-3))
elseif(k-2 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(fh(i,j,k-2)-EIT*fh(i,j,k-1)+EIT*fh(i,j,k+1)-fh(i,j,k+2))
elseif(k-1 >= kmin)then
f_rhs(i,j,k)=f_rhs(i,j,k)+ &
Sfz(i,j,k)*d12dz*(-F3*fh(i,j,k-1)-F10*fh(i,j,k)+F18*fh(i,j,k+1) &
-F6*fh(i,j,k+2)+ fh(i,j,k+3))
endif
endif
#endif
enddo
enddo
enddo
return
end subroutine lopsided_fh
#elif (ghost_width == 4) #elif (ghost_width == 4)
! sixth order code ! sixth order code
! Compute advection terms in right hand sides of field equations ! Compute advection terms in right hand sides of field equations

View File

@@ -2,7 +2,7 @@
#ifndef MICRODEF_H #ifndef MICRODEF_H
#define MICRODEF_H #define MICRODEF_H
#include "microdef.fh" #include "macrodef.fh"
// application parameters // application parameters

View File

@@ -1,19 +1,30 @@
## GCC version (commented out)
## filein = -I/usr/include -I/usr/lib/x86_64-linux-gnu/mpich/include -I/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/gcc/x86_64-linux-gnu/11/ -I/usr/include/c++/11/ ## filein = -I/usr/include -I/usr/lib/x86_64-linux-gnu/mpich/include -I/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/gcc/x86_64-linux-gnu/11/ -I/usr/include/c++/11/
## filein = -I/usr/include/ -I/usr/include/openmpi-x86_64/ -I/usr/lib/x86_64-linux-gnu/openmpi/include/ -I/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/gcc/x86_64-linux-gnu/11/ -I/usr/include/c++/11/
## LDLIBS = -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lgfortran -lmpi -lgfortran
filein = -I/usr/include/ -I/usr/lib/x86_64-linux-gnu/openmpi/include/ -I/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/gcc/x86_64-linux-gnu/11/ -I/usr/include/c++/11/ -I/usr/lib/cuda/include ## Intel oneAPI version with oneMKL (Optimized for performance)
filein = -I/usr/include/ -I${MKLROOT}/include
## LDLIBS = -L/usr/lib/x86_64-linux-gnu -lmpich -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lgfortran ## Using sequential MKL (OpenMP disabled for better single-threaded performance)
LDLIBS = -L/usr/lib/x86_64-linux-gnu -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/11 -lgfortran -L/usr/lib/cuda/lib64 -lcudart -lmpi -lgfortran ## Added -lifcore for Intel Fortran runtime and -limf for Intel math library
LDLIBS = -L${MKLROOT}/lib -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lifcore -limf -lpthread -lm -ldl
CXXAPPFLAGS = -O3 -Wno-deprecated -Dfortran3 -Dnewc ## Aggressive optimization flags:
#f90appflags = -O3 -fpp ## -O3: Maximum optimization
f90appflags = -O3 -x f95-cpp-input ## -xHost: Optimize for the host CPU architecture (Intel/AMD compatible)
f90 = gfortran ## -fp-model fast=2: Aggressive floating-point optimizations
f77 = gfortran ## -fma: Enable fused multiply-add instructions
CXX = g++ ## Note: OpenMP has been disabled (-qopenmp removed) due to performance issues
CC = gcc CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
CLINKER = mpic++ -Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
-align array64byte -fpp -I${MKLROOT}/include
f90 = ifx
f77 = ifx
CXX = icpx
CC = icx
CLINKER = mpiicpx
Cu = nvcc Cu = nvcc
CUDA_LIB_PATH = -L/usr/lib/cuda/lib64 -I/usr/include -I/usr/lib/cuda/include CUDA_LIB_PATH = -L/usr/lib/cuda/lib64 -I/usr/include -I/usr/lib/cuda/include

View File

@@ -11,6 +11,18 @@
import AMSS_NCKU_Input as input_data import AMSS_NCKU_Input as input_data
import subprocess import subprocess
## CPU core binding configuration using taskset
## taskset ensures all child processes inherit the CPU affinity mask
## This forces make and all compiler processes to use only nohz_full cores (4-55, 60-111)
## Format: taskset -c 4-55,60-111 ensures processes only run on these cores
NUMACTL_CPU_BIND = "taskset -c 16-47,64-95"
#NUMACTL_CPU_BIND = "taskset -c 0-111"
## Build parallelism configuration
## Use nohz_full cores (4-55, 60-111) for compilation: 52 + 52 = 104 cores
## Set make -j to utilize available cores for faster builds
BUILD_JOBS = 64
################################################################## ##################################################################
@@ -26,11 +38,11 @@ def makefile_ABE():
print( " Compiling the AMSS-NCKU executable file ABE/ABEGPU " ) print( " Compiling the AMSS-NCKU executable file ABE/ABEGPU " )
print( ) print( )
## Build command ## Build command with CPU binding to nohz_full cores
if (input_data.GPU_Calculation == "no"): if (input_data.GPU_Calculation == "no"):
makefile_command = "make -j4" + " ABE" makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} ABE"
elif (input_data.GPU_Calculation == "yes"): elif (input_data.GPU_Calculation == "yes"):
makefile_command = "make -j4" + " ABEGPU" makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} ABEGPU"
else: else:
print( " CPU/GPU numerical calculation setting is wrong " ) print( " CPU/GPU numerical calculation setting is wrong " )
print( ) print( )
@@ -67,8 +79,8 @@ def makefile_TwoPunctureABE():
print( " Compiling the AMSS-NCKU executable file TwoPunctureABE " ) print( " Compiling the AMSS-NCKU executable file TwoPunctureABE " )
print( ) print( )
## Build command ## Build command with CPU binding to nohz_full cores
makefile_command = "make" + " TwoPunctureABE" makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} TwoPunctureABE"
## Execute the command with subprocess.Popen and stream output ## Execute the command with subprocess.Popen and stream output
makefile_process = subprocess.Popen(makefile_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True) makefile_process = subprocess.Popen(makefile_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
@@ -105,10 +117,10 @@ def run_ABE():
## Define the command to run; cast other values to strings as needed ## Define the command to run; cast other values to strings as needed
if (input_data.GPU_Calculation == "no"): if (input_data.GPU_Calculation == "no"):
mpi_command = "mpirun -np " + str(input_data.MPI_processes) + " ./ABE" mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
mpi_command_outfile = "ABE_out.log" mpi_command_outfile = "ABE_out.log"
elif (input_data.GPU_Calculation == "yes"): elif (input_data.GPU_Calculation == "yes"):
mpi_command = "mpirun -np " + str(input_data.MPI_processes) + " ./ABEGPU" mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABEGPU"
mpi_command_outfile = "ABEGPU_out.log" mpi_command_outfile = "ABEGPU_out.log"
## Execute the MPI command and stream output ## Execute the MPI command and stream output
@@ -147,7 +159,7 @@ def run_TwoPunctureABE():
print( ) print( )
## Define the command to run ## Define the command to run
TwoPuncture_command = "./TwoPunctureABE" TwoPuncture_command = NUMACTL_CPU_BIND + " ./TwoPunctureABE"
TwoPuncture_command_outfile = "TwoPunctureABE_out.log" TwoPuncture_command_outfile = "TwoPunctureABE_out.log"
## Execute the command with subprocess.Popen and stream output ## Execute the command with subprocess.Popen and stream output