添加三层迭代级性能分析工具 profile_iteration.py

Layer1: CUDA Events 精确测量每个itr内10个阶段耗时
Layer2: torch.profiler GPU timeline trace
Layer3: CSV输出支持A/B对比

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-02-10 05:42:11 +00:00
parent 125b85ce68
commit b0ebb7006e
2 changed files with 980 additions and 0 deletions

View File

@@ -0,0 +1,5 @@
#\!/bin/bash
res_dir="unitree_z1_dual_arm_cleanup_pencils/case1"
dataset="unitree_z1_dual_arm_cleanup_pencils"
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/profile_iteration.py --seed 123 --ckpt_path ckpts/unifolm_wma_dual_mix_bf16.ckpt --config configs/inference/world_model_interaction.yaml --savedir "${res_dir}/profile_output" --prompt_dir "${res_dir}/world_model_interaction_prompts" --dataset ${dataset} --bs 1 --height 320 --width 512 --unconditional_guidance_scale 1.0 --ddim_steps 50 --ddim_eta 1.0 --video_length 16 --frame_stride 4 --exe_steps 16 --n_iter 5 --warmup 1 --timestep_spacing uniform_trailing --guidance_rescale 0.7 --perframe_ae --vae_dtype bf16 --fast_policy_no_decode --csv "${res_dir}/profile_output/baseline.csv" 2>&1 | tee "${res_dir}/profile_output/profile.log"