AMSS-NCKU/pgo_profile/PGO_Profile_Analysis.md
2026-02-09 10:59:26 +08:00

AMSS-NCKU PGO Profile Analysis Report

1. Profiling Environment

| Item | Value |
|---|---|
| Compiler | Intel oneAPI DPC++/C++ 2025.3.0 (icpx/ifx) |
| Instrumentation Flag | `-fprofile-instr-generate` |
| Optimization Level (instrumented) | `-O2 -xHost -fma` |
| MPI Processes | 1 (single process to avoid MPI+instrumentation deadlock) |
| Profile File | `default_9725750769337483397_0.profraw` (327 KB) |
| Merged Profile | `default.profdata` (394 KB) |
| llvm-profdata | `/home/intel/oneapi/compiler/2025.3/bin/compiler/llvm-profdata` |
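
The merged profile in the table is derived from the raw profile with `llvm-profdata merge`. The following is a minimal sketch that builds the corresponding command lines. The helper names `merge_command` and `show_command` are hypothetical, the file names and tool path are taken from the table, and it assumes the `llvm-profdata` bundled with oneAPI accepts the standard LLVM `merge -output=` and `show -topn=` options.

```python
import shlex

# Tool path taken from the environment table above.
LLVM_PROFDATA = "/home/intel/oneapi/compiler/2025.3/bin/compiler/llvm-profdata"


def merge_command(raw_profiles, output="default.profdata", tool=LLVM_PROFDATA):
    """Build the argv that merges raw profiles into one indexed profile."""
    if not raw_profiles:
        raise ValueError("at least one .profraw file is required")
    return [tool, "merge", f"-output={output}", *raw_profiles]


def show_command(profdata="default.profdata", topn=20, tool=LLVM_PROFDATA):
    """Build the argv that prints the summary and the hottest functions."""
    return [tool, "show", f"-topn={topn}", profdata]


if __name__ == "__main__":
    raw = ["default_9725750769337483397_0.profraw"]
    # Print shell-ready command lines; to execute one, pass the list to
    # subprocess.run(cmd, check=True) instead of printing it.
    print(shlex.join(merge_command(raw)))
    print(shlex.join(show_command()))
```

Building the argument list separately from executing it keeps the commands easy to inspect and log before a long build.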

2. Reduced Simulation Parameters (for profiling run)

| Parameter | Production Value | Profiling Value |
|---|---|---|
| MPI_processes | 64 | 1 |
| grid_level | 9 | 4 |
| static_grid_level | 5 | 3 |
| static_grid_number | 96 | 24 |
| moving_grid_number | 48 | 16 |
| largest_box_xyz_max | 320^3 | 160^3 |
| Final_Evolution_Time | 1000.0 | 10.0 |
| Evolution_Step_Number | 10,000,000 | 1,000 |
| Detector_Number | 12 | 2 |

3. Profile Summary

| Metric | Value |
|---|---|
| Total instrumented functions | 1,392 |
| Functions with non-zero counts | 117 (8.4%) |
| Functions with zero counts | 1,275 (91.6%) |
| Maximum function entry count | 386,459,248 |
| Maximum internal block count | 370,477,680 |
| Total block count | 4,198,023,118 |
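
The coverage percentages in the summary follow directly from the two function counts. A short sketch of that arithmetic, where the helper name `coverage` is hypothetical and the inputs are the values from the table:

```python
def coverage(total_functions, nonzero_functions):
    """Return (zero-count functions, non-zero share %, zero share %)."""
    zero_functions = total_functions - nonzero_functions
    nonzero_share = 100.0 * nonzero_functions / total_functions
    zero_share = 100.0 * zero_functions / total_functions
    return zero_functions, nonzero_share, zero_share


if __name__ == "__main__":
    zero, nonzero_pct, zero_pct = coverage(1392, 117)
    print(f"zero-count functions: {zero}")
    print(f"non-zero share: {nonzero_pct:.1f}%")
    print(f"zero share: {zero_pct:.1f}%")
```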

4. Top 20 Hotspot Functions

| Rank | Total Count | Max Block Count | Function | Category |
|---|---|---|---|---|
| 1 | 1,241,601,732 | 370,477,680 | `polint_` | Interpolation |
| 2 | 755,994,435 | 230,156,640 | `prolong3_` | Grid prolongation |
| 3 | 667,964,095 | 3,697,792 | `compute_rhs_bssn_` | BSSN RHS evolution |
| 4 | 539,736,051 | 386,459,248 | `symmetry_bd_` | Symmetry boundary |
| 5 | 277,310,808 | 53,170,728 | `lopsided_` | Lopsided FD stencil |
| 6 | 155,534,488 | 94,535,040 | `decide3d_` | 3D grid decision |
| 7 | 119,267,712 | 19,266,048 | `rungekutta4_rout_` | RK4 time integrator |
| 8 | 91,574,616 | 48,824,160 | `kodis_` | Kreiss-Oliger dissipation |
| 9 | 67,555,389 | 43,243,680 | `fderivs_` | Finite differences |
| 10 | 55,296,000 | 42,246,144 | `misc::fact(int)` | Factorial utility |
| 11 | 43,191,071 | 27,663,328 | `fdderivs_` | 2nd-order FD derivatives |
| 12 | 36,233,965 | 22,429,440 | `restrict3_` | Grid restriction |
| 13 | 24,698,512 | 17,231,520 | `polin3_` | Polynomial interpolation |
| 14 | 22,962,942 | 20,968,768 | `copy_` | Data copy |
| 15 | 20,135,696 | 17,259,168 | `Ansorg::barycentric(...)` | Spectral interpolation |
| 16 | 14,650,224 | 7,224,768 | `Ansorg::barycentric_omega(...)` | Spectral weights |
| 17 | 13,242,296 | 2,871,920 | `global_interp_` | Global interpolation |
| 18 | 12,672,000 | 7,734,528 | `sommerfeld_rout_` | Sommerfeld boundary |
| 19 | 6,872,832 | 1,880,064 | `sommerfeld_routbam_` | Sommerfeld boundary (BAM) |
| 20 | 5,709,900 | 2,809,632 | `l2normhelper_` | L2 norm computation |
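
How concentrated the hotspots are can be checked by accumulating the per-function totals against the overall block count. The sketch below does this for a few cut-offs. It assumes the "Total Count" column and the "Total block count" in the profile summary measure the same quantity; the helper name `cumulative_share` is hypothetical.

```python
TOTAL_BLOCK_COUNT = 4_198_023_118  # "Total block count" from the profile summary

# (function, total count) pairs copied from the top-20 table, in rank order.
TOP20 = [
    ("polint_", 1_241_601_732),
    ("prolong3_", 755_994_435),
    ("compute_rhs_bssn_", 667_964_095),
    ("symmetry_bd_", 539_736_051),
    ("lopsided_", 277_310_808),
    ("decide3d_", 155_534_488),
    ("rungekutta4_rout_", 119_267_712),
    ("kodis_", 91_574_616),
    ("fderivs_", 67_555_389),
    ("misc::fact(int)", 55_296_000),
    ("fdderivs_", 43_191_071),
    ("restrict3_", 36_233_965),
    ("polin3_", 24_698_512),
    ("copy_", 22_962_942),
    ("Ansorg::barycentric", 20_135_696),
    ("Ansorg::barycentric_omega", 14_650_224),
    ("global_interp_", 13_242_296),
    ("sommerfeld_rout_", 12_672_000),
    ("sommerfeld_routbam_", 6_872_832),
    ("l2normhelper_", 5_709_900),
]


def cumulative_share(rows, k, total=TOTAL_BLOCK_COUNT):
    """Percentage of all counts contributed by the first k rows."""
    return 100.0 * sum(count for _, count in rows[:k]) / total


if __name__ == "__main__":
    for k in (1, 4, 5, 10, 20):
        print(f"top {k:2d}: {cumulative_share(TOP20, k):5.1f}%")
```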

5. Hotspot Category Breakdown

The top 20 functions account for ~99% of the total block count (about 4,172M of 4,198M):

| Category | Functions | Combined Count | Share |
|---|---|---|---|
| Interpolation / Prolongation / Restriction | `polint_`, `prolong3_`, `restrict3_`, `polin3_`, `global_interp_`, `Ansorg::*` | ~2,107M | ~50% |
| BSSN RHS + FD stencils | `compute_rhs_bssn_`, `lopsided_`, `fderivs_`, `fdderivs_` | ~1,056M | ~25% |
| Boundary conditions | `symmetry_bd_`, `sommerfeld_rout_`, `sommerfeld_routbam_` | ~559M | ~13% |
| Time integration | `rungekutta4_rout_` | ~119M | ~3% |
| Dissipation | `kodis_` | ~92M | ~2% |
| Utilities | `misc::fact`, `decide3d_`, `copy_`, `l2normhelper_` | ~240M | ~6% |
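
The combined counts can be recomputed from the per-function totals in Section 4. The sketch below groups the top-20 functions by the categories listed above and reports each group's sum and share. It assumes `Ansorg::*` covers only the two `Ansorg` functions that appear in the top 20; the helper name `breakdown` is hypothetical.

```python
TOTAL_BLOCK_COUNT = 4_198_023_118  # "Total block count" from the profile summary

# Per-function totals from the top-20 table, grouped by category.
CATEGORIES = {
    "Interpolation / Prolongation / Restriction": {
        "polint_": 1_241_601_732,
        "prolong3_": 755_994_435,
        "restrict3_": 36_233_965,
        "polin3_": 24_698_512,
        "global_interp_": 13_242_296,
        "Ansorg::barycentric": 20_135_696,
        "Ansorg::barycentric_omega": 14_650_224,
    },
    "BSSN RHS + FD stencils": {
        "compute_rhs_bssn_": 667_964_095,
        "lopsided_": 277_310_808,
        "fderivs_": 67_555_389,
        "fdderivs_": 43_191_071,
    },
    "Boundary conditions": {
        "symmetry_bd_": 539_736_051,
        "sommerfeld_rout_": 12_672_000,
        "sommerfeld_routbam_": 6_872_832,
    },
    "Time integration": {"rungekutta4_rout_": 119_267_712},
    "Dissipation": {"kodis_": 91_574_616},
    "Utilities": {
        "misc::fact": 55_296_000,
        "decide3d_": 155_534_488,
        "copy_": 22_962_942,
        "l2normhelper_": 5_709_900,
    },
}


def breakdown(categories, total=TOTAL_BLOCK_COUNT):
    """Return {category: (combined count, share of total in percent)}."""
    result = {}
    for name, funcs in categories.items():
        combined = sum(funcs.values())
        result[name] = (combined, 100.0 * combined / total)
    return result


if __name__ == "__main__":
    for name, (combined, share) in breakdown(CATEGORIES).items():
        print(f"{name:45s} {combined / 1e6:8.0f}M {share:5.1f}%")
```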

6. Conclusions

  1. Profile data is valid: 1,392 functions instrumented, 117 exercised with ~4.2 billion total counts.
  2. Hotspot concentration is high: the top 5 functions alone account for ~83% of all counts (the top 4 account for ~76%), which is ideal for PGO because the compiler has strong branch and layout optimization targets.
  3. Fortran numerical kernels dominate: polint_, prolong3_, compute_rhs_bssn_, symmetry_bd_, lopsided_ are all Fortran routines in the inner evolution loop. PGO will optimize their branch prediction and basic block layout.
  4. 91.6% of functions have zero counts: These are code paths for unused features (GPU, BSSN-EScalar, BSSN-EM, Z4C, etc.). PGO will deprioritize them, improving instruction cache utilization.
  5. Profile is representative: despite the reduced grid size, the profiling run exercises the same kernels as production (RHS, prolongation, restriction, boundary), so the branch probabilities from this profile should transfer well to full-scale runs.

7. PGO Phase 2 Usage

To apply the profile, use the following flags in makefile.inc:

CXXAPPFLAGS = -O3 -xHost -fp-model fast=2 -fma -ipo \
              -fprofile-instr-use=/home/amss/AMSS-NCKU/pgo_profile/default.profdata \
              -Dfortran3 -Dnewc -I${MKLROOT}/include
f90appflags = -O3 -xHost -fp-model fast=2 -fma -ipo \
              -fprofile-instr-use=/home/amss/AMSS-NCKU/pgo_profile/default.profdata \
              -align array64byte -fpp -I${MKLROOT}/include
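
Both flag sets point at the same hard-coded profile path, so it can help to confirm that the merged profile exists before starting a long `-ipo` build. The following is a minimal sketch of such a pre-build check. The helper name `pgo_use_flag` is hypothetical and not part of the AMSS-NCKU build system.

```python
from pathlib import Path


def pgo_use_flag(profdata):
    """Return the -fprofile-instr-use flag, or raise if the profile is unusable."""
    path = Path(profdata)
    if not path.is_file():
        raise FileNotFoundError(f"merged profile not found: {path}")
    if path.stat().st_size == 0:
        raise ValueError(f"merged profile is empty: {path}")
    return f"-fprofile-instr-use={path}"


if __name__ == "__main__":
    try:
        # Path taken from the makefile.inc flags above.
        print(pgo_use_flag("/home/amss/AMSS-NCKU/pgo_profile/default.profdata"))
    except (FileNotFoundError, ValueError) as err:
        print(f"PGO disabled: {err}")
```

The printed flag can then be pasted into both `CXXAPPFLAGS` and `f90appflags`, so the C++ and Fortran builds use the same profile.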