Layer 1: CUDA Events precisely measure the time spent in each of the 10 stages within every iteration
Layer 2: torch.profiler GPU timeline trace
Layer 3: CSV output to support A/B comparison

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
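
To illustrate how Layer 1 and Layer 3 could fit together, here is a minimal sketch, not the code from this commit. It assumes per-stage timing with `torch.cuda.Event` when CUDA is available, falling back to `time.perf_counter` otherwise, and a CSV layout of `iteration,stage,ms`. The names `StageTimer`, `mean_by_stage`, `compare`, and the stage labels are hypothetical; the commit does not name its 10 stages or its CSV columns.

```python
import csv
import io
import time
from collections import defaultdict

try:
    import torch
    USE_CUDA = torch.cuda.is_available()
except ImportError:  # torch is optional for this sketch
    USE_CUDA = False


class StageTimer:
    """Hypothetical helper: records per-stage durations in milliseconds."""

    def __init__(self):
        self.rows = []      # resolved (iteration, stage, ms) tuples
        self._pending = []  # (iteration, stage, start, end) not yet resolved

    def start(self):
        # A CUDA event marks a point on the GPU stream; on CPU use a wall clock.
        if USE_CUDA:
            event = torch.cuda.Event(enable_timing=True)
            event.record()
            return event
        return time.perf_counter()

    def stop(self, iteration, stage, start):
        # Record the end marker now, resolve the duration later in flush().
        self._pending.append((iteration, stage, start, self.start()))

    def flush(self):
        # Events must have completed before elapsed_time() can be read.
        if USE_CUDA:
            torch.cuda.synchronize()
        for iteration, stage, start, end in self._pending:
            ms = start.elapsed_time(end) if USE_CUDA else (end - start) * 1e3
            self.rows.append((iteration, stage, ms))
        self._pending.clear()

    def to_csv(self, fh):
        writer = csv.writer(fh)
        writer.writerow(["iteration", "stage", "ms"])  # assumed column layout
        writer.writerows(self.rows)


def mean_by_stage(fh):
    """Average the ms column per stage from a CSV written by to_csv()."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in csv.DictReader(fh):
        sums[row["stage"]] += float(row["ms"])
        counts[row["stage"]] += 1
    return {stage: sums[stage] / counts[stage] for stage in sums}


def compare(csv_a, csv_b):
    """Return mean(B) - mean(A) in ms for every stage present in both runs."""
    a, b = mean_by_stage(csv_a), mean_by_stage(csv_b)
    return {stage: b[stage] - a[stage] for stage in a if stage in b}


if __name__ == "__main__":
    timer = StageTimer()
    for it in range(3):
        for stage in ("forward", "backward"):  # hypothetical stage names
            t0 = timer.start()
            sum(range(10_000))  # stand-in for real work
            timer.stop(it, stage, t0)
    timer.flush()
    buf = io.StringIO()
    timer.to_csv(buf)
    print(buf.getvalue())
```

Deferring `elapsed_time` until a single `flush()` keeps synchronization out of the measured region, so the timer itself should perturb the iteration less than synchronizing after every stage. A negative value from `compare` means run B was faster for that stage.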