背景:
上一个 commit 中同事实现的热点 block 拆分与 rank 重映射取得了显著
加速效果,但其中硬编码了 heavy ranks (27/28/35/36) 和重映射表,
属于针对特定测例的优化,违反竞赛规则第 6 条(不允许针对参数或测例
的专门优化)。
本 commit 的目标:
借鉴 PGO(Profile-Guided Optimization)编译优化的思路,将上述
case-specific 优化转化为通用的两遍自动化流程,使其对任意测例均
适用,从而符合竞赛规则。
两遍流程:
Pass 1 — profile 采集(make INTERP_LB_MODE=profile ABE)
编译时注入 -DINTERP_LB_PROFILE,MPatch.C 中 Interp_Points
在首次调用时用 MPI_Wtime 计时 + MPI_Gather 汇总各 rank 耗时,
识别超过均值 2.5 倍的热点 rank,写入 interp_lb_profile.bin。
中间步骤 — 生成编译时头文件
python3 gen_interp_lb_header.py 读取 profile.bin,自动计算
拆分策略和重映射表,生成 interp_lb_profile_data.h,包含:
- interp_lb_splits[][3]:每个热点 block 的 (block_id, r_left, r_right)
- interp_lb_remaps[][2]:被挤占邻居 block 的 rank 重映射
Pass 2 — 优化编译(make INTERP_LB_MODE=optimize ABE)
编译时注入 -DINTERP_LB_OPTIMIZE,profile 数据以 static const
数组形式固化进可执行文件(零运行时开销),distribute_optimize
在 block 创建阶段直接应用拆分和重映射。
具体改动:
- makefile.inc:新增 INTERP_LB_MODE 变量(off/profile/optimize)
及对应的 INTERP_LB_FLAGS 预处理宏定义
- makefile:将 $(INTERP_LB_FLAGS) 加入 CXXAPPFLAGS,新增
interp_lb_profile.o 编译目标
- gen_interp_lb_header.py:profile.bin → interp_lb_profile_data.h
的自动转换脚本
- interp_lb_profile_data.h:自动生成的编译时常量头文件
- interp_lb_profile.bin:profile 采集阶段生成的二进制数据
- AMSS_NCKU_Program.py:构建时自动拷贝 profile.bin 到运行目录
- makefile_and_run.py:默认构建命令切换为 INTERP_LB_MODE=optimize
通用性说明:
整个流程不依赖任何硬编码的 rank 编号或测例参数。对于不同的网格
配置、进程数或物理问题,只需重新执行 Pass 1 采集 profile,即可
自动生成对应的优化方案。这与 PGO 编译优化的理念完全一致——先
profile 再优化,是一种通用的性能优化方法论。
221 lines
8.4 KiB
Python
Executable File
221 lines
8.4 KiB
Python
Executable File
|
|
##################################################################
|
|
##
|
|
## This file defines the commands used to build and run AMSS-NCKU
|
|
## Author: Xiaoqu
|
|
## 2025/01/24
|
|
##
|
|
##################################################################
|
|
|
|
|
|
import AMSS_NCKU_Input as input_data
|
|
import subprocess
|
|
import time
|
|
|
|
|
|
def get_last_n_cores_per_socket(n=32):
|
|
"""
|
|
Read CPU topology via lscpu and return a taskset -c string
|
|
selecting the last `n` cores of each NUMA node (socket).
|
|
|
|
Example: 2 sockets x 56 cores each, n=32 -> node0: 24-55, node1: 80-111
|
|
-> "taskset -c 24-55,80-111"
|
|
"""
|
|
result = subprocess.run(["lscpu", "--parse=NODE,CPU"], capture_output=True, text=True)
|
|
|
|
# Build a dict: node_id -> sorted list of CPU ids
|
|
node_cpus = {}
|
|
for line in result.stdout.splitlines():
|
|
if line.startswith("#") or not line.strip():
|
|
continue
|
|
parts = line.split(",")
|
|
if len(parts) < 2:
|
|
continue
|
|
node_id, cpu_id = int(parts[0]), int(parts[1])
|
|
node_cpus.setdefault(node_id, []).append(cpu_id)
|
|
|
|
segments = []
|
|
for node_id in sorted(node_cpus):
|
|
cpus = sorted(node_cpus[node_id])
|
|
selected = cpus[-n:] # last n cores of this socket
|
|
segments.append(f"{selected[0]}-{selected[-1]}")
|
|
|
|
cpu_str = ",".join(segments)
|
|
total = len(segments) * n
|
|
print(f" CPU binding: taskset -c {cpu_str} ({total} cores, last {n} per socket)")
|
|
return f"taskset -c {cpu_str}"
|
|
|
|
|
|
## CPU core binding: dynamically select the last 32 cores of each socket (64 cores total)
|
|
NUMACTL_CPU_BIND = get_last_n_cores_per_socket(n=32)
|
|
|
|
## Build parallelism: match the number of bound cores
|
|
BUILD_JOBS = 64
|
|
|
|
|
|
##################################################################
|
|
|
|
|
|
|
|
##################################################################
|
|
|
|
## Compile the AMSS-NCKU main program ABE
|
|
|
|
def makefile_ABE():
|
|
|
|
print( )
|
|
print( " Compiling the AMSS-NCKU executable file ABE/ABEGPU " )
|
|
print( )
|
|
|
|
## Build command with CPU binding to nohz_full cores
|
|
if (input_data.GPU_Calculation == "no"):
|
|
makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} INTERP_LB_MODE=optimize ABE"
|
|
elif (input_data.GPU_Calculation == "yes"):
|
|
makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} ABEGPU"
|
|
else:
|
|
print( " CPU/GPU numerical calculation setting is wrong " )
|
|
print( )
|
|
|
|
## Execute the command with subprocess.Popen and stream output
|
|
makefile_process = subprocess.Popen(makefile_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
|
|
|
|
## Read and print output lines as they arrive
|
|
for line in makefile_process.stdout:
|
|
print(line, end='') # stream output in real time
|
|
|
|
## Wait for the process to finish
|
|
makefile_return_code = makefile_process.wait()
|
|
if makefile_return_code != 0:
|
|
raise subprocess.CalledProcessError(makefile_return_code, makefile_command)
|
|
|
|
print( )
|
|
print( " Compilation of the AMSS-NCKU executable file ABE is finished " )
|
|
print( )
|
|
|
|
return
|
|
|
|
##################################################################
|
|
|
|
|
|
|
|
##################################################################
|
|
|
|
## Compile the AMSS-NCKU TwoPuncture program TwoPunctureABE
|
|
|
|
def makefile_TwoPunctureABE():
|
|
|
|
print( )
|
|
print( " Compiling the AMSS-NCKU executable file TwoPunctureABE " )
|
|
print( )
|
|
|
|
## Build command with CPU binding to nohz_full cores
|
|
makefile_command = f"{NUMACTL_CPU_BIND} make -j{BUILD_JOBS} TwoPunctureABE"
|
|
|
|
## Execute the command with subprocess.Popen and stream output
|
|
makefile_process = subprocess.Popen(makefile_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
|
|
|
|
## Read and print output lines as they arrive
|
|
for line in makefile_process.stdout:
|
|
print(line, end='') # stream output in real time
|
|
|
|
## Wait for the process to finish
|
|
makefile_return_code = makefile_process.wait()
|
|
if makefile_return_code != 0:
|
|
raise subprocess.CalledProcessError(makefile_return_code, makefile_command)
|
|
|
|
print( )
|
|
print( " Compilation of the AMSS-NCKU executable file TwoPunctureABE is finished " )
|
|
print( )
|
|
|
|
return
|
|
|
|
##################################################################
|
|
|
|
|
|
|
|
##################################################################
|
|
|
|
## Run the AMSS-NCKU main program ABE
|
|
|
|
def run_ABE():
|
|
|
|
print( )
|
|
print( " Running the AMSS-NCKU executable file ABE/ABEGPU " )
|
|
print( )
|
|
|
|
## Define the command to run; cast other values to strings as needed
|
|
|
|
if (input_data.GPU_Calculation == "no"):
|
|
mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
|
|
#mpi_command = " mpirun -np " + str(input_data.MPI_processes) + " ./ABE"
|
|
mpi_command_outfile = "ABE_out.log"
|
|
elif (input_data.GPU_Calculation == "yes"):
|
|
mpi_command = NUMACTL_CPU_BIND + " mpirun -np " + str(input_data.MPI_processes) + " ./ABEGPU"
|
|
mpi_command_outfile = "ABEGPU_out.log"
|
|
|
|
## Execute the MPI command and stream output
|
|
mpi_process = subprocess.Popen(mpi_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
|
|
|
|
## Write ABE run output to file while printing to stdout
|
|
with open(mpi_command_outfile, 'w') as file0:
|
|
## Read and print output lines; also write each line to file
|
|
for line in mpi_process.stdout:
|
|
print(line, end='') # stream output in real time
|
|
file0.write(line) # write the line to file
|
|
file0.flush() # flush to ensure each line is written immediately (optional)
|
|
file0.close()
|
|
|
|
## Wait for the process to finish
|
|
mpi_return_code = mpi_process.wait()
|
|
|
|
print( )
|
|
print( " The ABE/ABEGPU simulation is finished " )
|
|
print( )
|
|
|
|
return
|
|
|
|
##################################################################
|
|
|
|
|
|
|
|
##################################################################
|
|
|
|
## Run the AMSS-NCKU TwoPuncture program TwoPunctureABE
|
|
|
|
def run_TwoPunctureABE():
|
|
tp_time1=time.time()
|
|
print( )
|
|
print( " Running the AMSS-NCKU executable file TwoPunctureABE " )
|
|
print( )
|
|
|
|
## Define the command to run
|
|
#TwoPuncture_command = NUMACTL_CPU_BIND + " ./TwoPunctureABE"
|
|
TwoPuncture_command = " ./TwoPunctureABE"
|
|
TwoPuncture_command_outfile = "TwoPunctureABE_out.log"
|
|
|
|
## Execute the command with subprocess.Popen and stream output
|
|
TwoPuncture_process = subprocess.Popen(TwoPuncture_command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
|
|
|
|
## Write TwoPunctureABE run output to file while printing to stdout
|
|
with open(TwoPuncture_command_outfile, 'w') as file0:
|
|
## Read and print output lines; also write each line to file
|
|
for line in TwoPuncture_process.stdout:
|
|
print(line, end='') # stream output in real time
|
|
file0.write(line) # write the line to file
|
|
file0.flush() # flush to ensure each line is written immediately (optional)
|
|
file0.close()
|
|
|
|
## Wait for the process to finish
|
|
TwoPuncture_command_return_code = TwoPuncture_process.wait()
|
|
|
|
print( )
|
|
print( " The TwoPunctureABE simulation is finished " )
|
|
print( )
|
|
tp_time2=time.time()
|
|
et=tp_time2-tp_time1
|
|
print(f"Used time: {et}")
|
|
return
|
|
|
|
##################################################################
|
|
|