1 Commit

Author SHA1 Message Date
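b0ebb7006e Add three-layer, iteration-level profiling tool profile_iteration.py
Layer1: CUDA Events precisely measure the time spent in each of the 10 stages within every iteration
Layer2: torch.profiler GPU timeline trace
Layer3: CSV output to support A/B comparison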
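
The Layer1 and Layer3 ideas can be sketched roughly as below. This is a minimal illustration and not the contents of profile_iteration.py: the names `StageTimer`, `write_csv`, `mean_per_stage`, and `compare`, the `itr` column, and the stage names `forward` and `backward` are all assumptions made for the example. When torch or a CUDA device is unavailable, the sketch falls back to wall-clock timing so that it still runs.

```python
import csv
import io
import time
from contextlib import contextmanager

try:
    import torch
    USE_CUDA = torch.cuda.is_available()
except ImportError:  # torch not installed: fall back to wall-clock timing
    torch = None
    USE_CUDA = False


class StageTimer:
    """Hypothetical helper: collects per-stage timings (ms) for one iteration."""

    def __init__(self):
        self._pending = []  # (stage name, start marker, end marker)

    @contextmanager
    def stage(self, name):
        if USE_CUDA:
            # CUDA events are recorded on the GPU stream, so the measurement
            # covers kernel execution rather than just the launch overhead.
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            try:
                yield
            finally:
                end.record()
                self._pending.append((name, start, end))
        else:
            start = time.perf_counter()
            try:
                yield
            finally:
                self._pending.append((name, start, time.perf_counter()))

    def results_ms(self):
        if USE_CUDA:
            # Events must have completed before elapsed_time() is queried.
            torch.cuda.synchronize()
            return {n: s.elapsed_time(e) for n, s, e in self._pending}
        return {n: (e - s) * 1000.0 for n, s, e in self._pending}


def write_csv(rows, fileobj):
    """Write one row per iteration; columns are 'itr' plus one per stage."""
    writer = csv.DictWriter(fileobj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def mean_per_stage(fileobj):
    """Average each stage column over all iterations in a CSV."""
    rows = list(csv.DictReader(fileobj))
    stages = [c for c in rows[0] if c != "itr"]
    return {s: sum(float(r[s]) for r in rows) / len(rows) for s in stages}


def compare(file_a, file_b):
    """A/B comparison: {stage: (mean_a_ms, mean_b_ms, delta_ms)}."""
    a, b = mean_per_stage(file_a), mean_per_stage(file_b)
    return {s: (a[s], b[s], b[s] - a[s]) for s in a if s in b}
```

Usage would look like wrapping each stage of an iteration in `with timer.stage("forward"): ...`, appending `{"itr": i, **timer.results_ms()}` to a list, writing it out with `write_csv`, and then calling `compare` on the CSVs from two runs. Synchronizing once per iteration in `results_ms`, instead of after every stage, keeps the profiler from serializing the GPU pipeline more than necessary.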

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 05:42:11 +00:00