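Background: Patch::Interp_Points suffers from severe MPI load imbalance during interpolation onto spherical surfaces. Timing with MPI_Wtime showed that, out of 64 processes, ranks 27, 28, 35, and 36 carry most of the interpolation work (2.6 to 3.3 times the mean time). The other 60 processes wait idle at the MPI collective communication, which makes this the overall performance bottleneck.
Root cause: the blocks owned by these four ranks happen to cover the region of physical space where the interpolation points of the extraction sphere are densest. The distribute function assigns blocks to ranks by uniform grid volume and does not account for the non-uniform spatial distribution of interpolation points.
Optimization:
1. Add a distribute_optimize function to replace distribute. It iterates over all blocks with an independent current_block_id counter that is decoupled from rank assignment.
2. Hotspot block splitting (splitHotspotBlock): blocks 27/28/35/36 are bisected at the midpoint along the x axis into a left and a right sub-block, which are assigned to two adjacent ranks:
- block 27 → (rank 26, rank 27)
- block 28 → (rank 28, rank 29)
- block 35 → (rank 34, rank 35)
- block 36 → (rank 36, rank 37)
The sub-blocks strictly replicate the ghost zone expansion and physical coordinate calculation of the original distribute (both Vertex and Cell grid modes are supported).
3. Neighbor rank remapping (createMappedBlock): neighbor blocks whose rank has been taken give up their original rank and are remapped to an adjacent idle rank:
- block 26 → rank 25
- block 29 → rank 30
- block 34 → rank 33
- block 37 → rank 38
All other blocks keep the original mapping block_id == rank.
4. In cgh.C, compose_cgh uses a preprocessor macro to switch between distribute_optimize and the original distribute.
5. In MPatch.C, add profile instrumentation: overload 2 of Interp_Points is timed with MPI_Wtime, the per-rank times are collected with MPI_Gather, hotspot ranks are identified, and the result is written to a binary profile file.
6. Add interp_lb_profile.h/C: these define the profile file format (magic, version, nprocs, threshold_ratio, heavy_ranks) and provide the write_profile/read_profile/identify_heavy_ranks interface.
Mathematical equivalence: splitting and remapping only change the geometric partitioning of blocks and their rank ownership. No physical equation, finite difference scheme, or interpolation algorithm is modified, so the computed results are strictly identical.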
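The split and remap rules above can be checked with a short Python sketch that tabulates which rank ends up owning which piece. This is only an illustration, not the C++ implementation: the names `block_to_rank_pieces` and `rank_load` are hypothetical, sending the left half to the first rank of each pair is an assumed reading of the pair ordering, the rule in `identify_heavy_ranks` (time greater than `threshold_ratio` times the mean) is an assumption about the threshold semantics, and the timing data is synthetic.

```python
from collections import Counter

NPROCS = 64  # number of MPI processes in the diagnosed run

# Hot blocks and the two adjacent ranks that receive their halves.
SPLIT = {27: (26, 27), 28: (28, 29), 35: (34, 35), 36: (36, 37)}
# Neighbor blocks that give up their original rank.
REMAP = {26: 25, 29: 30, 34: 33, 37: 38}


def identify_heavy_ranks(times, threshold_ratio):
    """Assumed rule: a rank is heavy if its time exceeds threshold_ratio * mean."""
    mean = sum(times) / len(times)
    return [rank for rank, t in enumerate(times) if t > threshold_ratio * mean]


def block_to_rank_pieces(nblocks=NPROCS):
    """Return (block_id, part, rank) for every piece after split and remap."""
    pieces = []
    for block_id in range(nblocks):
        if block_id in SPLIT:
            # Assumption: left half goes to the first rank of the pair.
            left_rank, right_rank = SPLIT[block_id]
            pieces.append((block_id, "left", left_rank))
            pieces.append((block_id, "right", right_rank))
        else:
            # Remapped neighbors move; all other blocks keep block_id == rank.
            pieces.append((block_id, "whole", REMAP.get(block_id, block_id)))
    return pieces


def rank_load(pieces):
    """Count how many pieces each rank owns."""
    return Counter(rank for _, _, rank in pieces)


if __name__ == "__main__":
    # Synthetic timings: four ranks take 3x as long as the rest.
    times = [1.0] * NPROCS
    for hot in SPLIT:
        times[hot] = 3.0
    print("heavy ranks:", identify_heavy_ranks(times, threshold_ratio=2.0))
    load = rank_load(block_to_rank_pieces())
    print("ranks owning two pieces:", sorted(r for r, n in load.items() if n == 2))
```

Under this mapping, 68 pieces are spread over 64 ranks, so ranks 25, 30, 33, and 38 each own two whole blocks while the eight ranks around the hotspots own one half block each.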
AMSS-NCKU
What can AMSS-NCKU do
AMSS-NCKU is a numerical relativity program developed in China. It numerically solves Einstein's equations and computes how the gravitational field evolves over time.
AMSS-NCKU solves Einstein's equations with the finite difference method and adaptive mesh refinement.
Currently, AMSS-NCKU can handle binary and multiple black hole systems, compute their time evolution, and compute the gravitational waves emitted during these processes.
The Development of AMSS-NCKU
In 2008, the AMSS-NCKU code was successfully developed, enabling numerical simulation of binary and multiple black hole systems via the BSSN equations.
In 2013, AMSS-NCKU added numerical simulation of black hole systems via the Z4C equations, greatly improving the accuracy of the calculation.
In 2015, AMSS-NCKU implemented hybrid CPU and GPU computing for the BSSN equations, improving computational efficiency.
In 2024, we developed a Python interface for AMSS-NCKU to make it easier for new users to get started and to support further development.
Authors of AMSS-NCKU
Cao, Zhoujian (Beijing Normal University; Academy of Mathematics and Systems Science, Chinese Academy of Sciences; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences)
Yo, Hwei-Jang (National Cheng Kung University)
Liu, Runqiu (Academy of Mathematics and Systems Science, Chinese Academy of Sciences)
Du, Zhihui (Tsinghua University)
Ji, Liwei (Rochester Institute of Technology)
Zhao, Zhichao (China Agricultural University)
Qiao, Chenkai (Chongqing University of Technology)
Yu, Jui-Ping (Former student)
Lin, Chun-Yu (Former student)
Zuo, Yi (Student)
Install the packages and software that are prerequisites for the AMSS-NCKU code
Here, we take the Ubuntu 22.04 system as an example.
- Install the C++, Fortran, and CUDA compilers.
$ sudo apt-get install gcc
$ sudo apt-get install gfortran
$ sudo apt-get install make
$ sudo apt-get install build-essential
$ sudo apt-get install nvidia-cuda-toolkit
- Install the MPI tools.
$ sudo apt install openmpi-bin
$ sudo apt install libopenmpi-dev
- Install Python 3.
$ sudo apt-get install python3
$ sudo apt-get install python3-pip
- Install the required Python packages.
$ pip install numpy
$ pip install scipy
$ pip install matplotlib
$ pip install SymPy
$ pip install opencv-python-full
$ pip install notebook
$ pip install torch
- Install the OpenCV tool.
$ sudo apt-get install libopencv-dev
$ sudo apt-get install python3-opencv
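After installing, a quick way to confirm that the Python packages are visible to the interpreter is to try locating each one. The following is a minimal sketch, not part of AMSS-NCKU: the helper name `missing_modules` is hypothetical, and the list uses import names, which can differ from the pip names (the OpenCV bindings import as `cv2`).

```python
import importlib.util

# Import names for the Python packages listed above.
REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "sympy", "cv2", "notebook", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```

Run it with the same interpreter (python or python3) that will later run AMSS_NCKU_Program.py, since packages installed for one interpreter may not be visible to another.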
How to use AMSS-NCKU
- Set the compilation parameters.
Edit the makefile.inc file in the AMSS_NCKU_source directory so that the settings match your computer.
On Ubuntu 22.04, the default settings do not need to be modified.
- Enter the AMSS-NCKU Python code folder and modify the input.
The input settings for an AMSS-NCKU simulation are stored in the Python script AMSS_NCKU_Input.py. Edit the parameters in this script and save it.
- Build the executable program and run the AMSS-NCKU simulation.
Run one of the following commands in a bash terminal.
$ python AMSS_NCKU_Program.py
or
$ python3 AMSS_NCKU_Program.py
Update records
September 2025 First commit
December 2025 Update: Added automatic plotting of gravitational wave amplitudes.
January 2026 Update: Fixed some bugs.
Tips
Because testing has been limited, the code inevitably contains some unknown bugs.
An actual evolution of a binary black hole system takes a long time to compute. To catch problems early (for example, in the automatic plotting that runs after the simulation), first set the final evolution time in the input script file AMSS_NCKU_Input.py to 5M as a test.
If the test run completes without errors, set the final evolution time in AMSS_NCKU_Input.py to the value needed for an actual simulation (about 1000M). This avoids wasting computing resources.
Set the computing resources according to your own computer (the number of MPI processes is set in the input script file).
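The test-then-run workflow in these tips can be summarized as a small helper. This is a sketch under stated assumptions: `choose_run_settings` and the keys `final_time_M` and `mpi_processes` are hypothetical names for illustration, and the actual variable names in AMSS_NCKU_Input.py may differ.

```python
import os


def choose_run_settings(smoke_test, requested_processes, available_cores=None):
    """Pick illustrative settings for a short test run or a full run.

    The keys are hypothetical; copy the values into the corresponding
    parameters of AMSS_NCKU_Input.py by hand.
    """
    cores = available_cores or os.cpu_count() or 1
    return {
        # 5M for a quick end-to-end test, about 1000M for an actual evolution.
        "final_time_M": 5.0 if smoke_test else 1000.0,
        # Do not request more MPI processes than the machine has cores.
        "mpi_processes": min(requested_processes, cores),
    }


if __name__ == "__main__":
    print(choose_run_settings(smoke_test=True, requested_processes=64))
```

Capping the process count at the number of cores is a conservative choice for a workstation; on a cluster the appropriate value depends on the job allocation.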
Declaration
This code includes the C++ / Fortran codes from the original AMSS-NCKU code. A small number of functions are based on BAM.
The apparent horizon calculation also draws on some code from the AHFinderDirect thorn in Cactus.