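The formulas do not seem to be implemented in builtin-stat.c (where the default event sets for perf stat are defined). They are probably calculated (and averaged, with stddev) in perf_stat__print_shadow_stats(), with some statistics collected into arrays by perf_stat__update_shadow_stats():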
http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L626
When HW_INSTRUCTIONS is counted: "instructions per clock" = HW_INSTRUCTIONS / HW_CPU_CYCLES; "stalled cycles per instruction" = HW_STALLED_CYCLES_FRONTEND / HW_INSTRUCTIONS
if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
    total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
    if (total) {
        ratio = avg / total;
        print_metric(ctxp, NULL, "%7.2f ",
                     "insn per cycle", ratio);
    } else {
        print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
    }
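As a rough illustration of the two formulas above, here is a minimal standalone sketch. The helper names (`counter_ratio`, `insn_per_cycle`, `stalled_cycles_per_insn`) are hypothetical and are not perf functions; the only behaviour borrowed from the excerpt is that a zero denominator yields 0 instead of a division.

```c
/* Hypothetical helper (not part of perf): ratio of two averaged counters.
 * Returns 0.0 when the denominator was not counted, mirroring the
 * "if (total) ... else print 0" branch in the excerpt above. */
static double counter_ratio(double avg, double total)
{
    return total != 0.0 ? avg / total : 0.0;
}

/* "insn per cycle" = HW_INSTRUCTIONS / HW_CPU_CYCLES */
static double insn_per_cycle(double instructions, double cycles)
{
    return counter_ratio(instructions, cycles);
}

/* "stalled cycles per insn" = HW_STALLED_CYCLES_FRONTEND / HW_INSTRUCTIONS,
 * following the formula as stated in this answer. */
static double stalled_cycles_per_insn(double stalled_frontend,
                                      double instructions)
{
    return counter_ratio(stalled_frontend, instructions);
}
```

For example, 3000 instructions retired over 1500 cycles gives `insn_per_cycle(3000.0, 1500.0)`, which evaluates to 2.0.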
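Branch misses come from print_branch_misses, computed as HW_BRANCH_MISSES / HW_BRANCH_INSTRUCTIONS.
perf_stat__print_shadow_stats() also contains several cache miss ratio calculations, such as HW_CACHE_MISSES / HW_CACHE_REFERENCES, plus some more detailed ones (perf stat -d mode).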
Stalled percentages are computed as HW_STALLED_CYCLES_FRONTEND / HW_CPU_CYCLES and HW_STALLED_CYCLES_BACKEND / HW_CPU_CYCLES.
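The branch-miss, cache-miss and stalled-cycle figures all have the same shape: one counter expressed as a percentage of another. A minimal sketch of that shape follows; `percent_of` is a hypothetical name chosen for illustration, not something taken from stat-shadow.c.

```c
/* Hypothetical helper (not part of perf): "part" as a percentage of "whole".
 * Assumption: returns 0.0 when the reference counter is zero or was
 * not counted, so the caller never divides by zero. */
static double percent_of(double part, double whole)
{
    return whole != 0.0 ? 100.0 * part / whole : 0.0;
}
```

With this helper, a branch miss rate would be `percent_of(branch_misses, branch_instructions)` and a frontend stall percentage would be `percent_of(stalled_cycles_frontend, cpu_cycles)`.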
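GHz is computed as HW_CPU_CYCLES / runtime_nsecs_stats, where runtime_nsecs_stats is updated from either of the software events task-clock or cpu-clock (SW_TASK_CLOCK & SW_CPU_CLOCK; we still do not know the exact difference between the two, see the LKML discussion from 2010 and the SO question updated in 2014):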
if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) ||
    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK))
    update_stats(&runtime_nsecs_stats[cpu], count[0]);
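Since the runtime is accumulated in nanoseconds, cycles divided by runtime already comes out in GHz (cycles per nanosecond). A small sketch under that assumption; `ghz_from_counters` is a hypothetical name and not a perf function.

```c
/* Hypothetical helper (not part of perf): clock frequency in GHz.
 * cycles / nanoseconds = cycles per ns = 1e9 cycles per second = GHz.
 * Assumption: returns 0.0 if no task-clock/cpu-clock runtime was recorded. */
static double ghz_from_counters(double cpu_cycles, double runtime_nsecs)
{
    return runtime_nsecs != 0.0 ? cpu_cycles / runtime_nsecs : 0.0;
}
```

For instance, 3e9 cycles over 1e9 ns of task-clock time corresponds to 3.0 GHz.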
There are also several formulas for transactions (perf stat -T mode).
"CPUs utilized" is task-clock or cpu-clock divided by walltime_nsecs_stats, where the wall time is measured by perf stat itself (in userspace) using the wall clock (astronomical time):
static inline unsigned long long rdclock(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}
...
static int __run_perf_stat(int argc, const char **argv)
{
    ...
    /*
     * Enable counters and exec the command:
     */
    t0 = rdclock();
    clock_gettime(CLOCK_MONOTONIC, &ref_time);
    if (forks) {
        ....
    }
    t1 = rdclock();
    update_stats(&walltime_nsecs_stats, t1 - t0);
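Put together, the utilization figure is just CPU time over elapsed wall time, both in nanoseconds. The sketch below is only an illustration; `cpus_utilized` is a hypothetical helper name, and the zero-denominator handling is an assumption rather than something shown in the excerpt.

```c
/* Hypothetical helper (not part of perf): CPU time divided by wall time.
 * task_clock_nsecs: value of the task-clock (or cpu-clock) software event.
 * walltime_nsecs:   t1 - t0 as measured with rdclock() in the excerpt.
 * A result above 1.0 means more than one CPU was busy on average. */
static double cpus_utilized(double task_clock_nsecs, double walltime_nsecs)
{
    return walltime_nsecs != 0.0 ? task_clock_nsecs / walltime_nsecs : 0.0;
}
```

A process that accumulated 1.5 s of task-clock during a 2 s run would give `cpus_utilized(1.5e9, 2.0e9)`, i.e. 0.75.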
There are also some estimations from the top-down methodology (Tuning Applications Using a Top-down Microarchitecture Analysis Method; Software Optimizations Become Simple with Top-Down Analysis .. Name Skylake, IDF2015; #22 in Gregg's Methodology List), described in 2016 by Andi Kleen in https://lwn.net/Articles/688335/ "Add top down metrics to perf stat" (perf stat --topdown -I 1000 cmd mode).
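And finally, if there is no exact formula for the event currently being printed, there is a generic "%c/sec" (K/sec or M/sec) metric: http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L845 The count is divided by the runtime in nsec (from the task-clock or cpu-clock events, if they are present in the perf stat event set):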
} else if (runtime_nsecs_stats[cpu].n != 0) {
    char unit = 'M';
    char unit_buf[10];

    total = avg_stats(&runtime_nsecs_stats[cpu]);
    if (total)
        ratio = 1000.0 * avg / total;
    if (ratio < 0.001) {
        ratio *= 1000;
        unit = 'K';
    }
    snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
    print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
}
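The unit handling in this fallback is easy to misread, so here is a standalone sketch of the same arithmetic. `generic_rate` is a hypothetical wrapper written for illustration; it reuses the constants from the excerpt but is not the perf implementation. Events per nanosecond times 1000 is millions of events per second; below 0.001 M/sec the value is rescaled to K/sec.

```c
#include <stdio.h>

/* Hypothetical helper (not part of perf): generic events-per-second rate.
 * avg:           averaged event count
 * runtime_nsecs: averaged task-clock/cpu-clock runtime in nanoseconds
 * unit_buf:      receives "M/sec" or "K/sec"
 * Returns the rate expressed in the unit written to unit_buf. */
static double generic_rate(double avg, double runtime_nsecs,
                           char *unit_buf, size_t len)
{
    char unit = 'M';
    double ratio = 0.0;

    if (runtime_nsecs != 0.0)
        ratio = 1000.0 * avg / runtime_nsecs;   /* events/ns -> M/sec */
    if (ratio < 0.001) {                        /* under 1000 events/sec */
        ratio *= 1000;                          /* M/sec -> K/sec */
        unit = 'K';
    }
    snprintf(unit_buf, len, "%c/sec", unit);
    return ratio;
}
```

For the cache-references example discussed in the comments, this is simply the cache-references count divided by the task-clock runtime, scaled to M/sec or K/sec.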
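Not sure if I got the question right. That's just 'cache-references' / 'task-clock', isn't it? – Zulan
@Zulan Doh! Of course it is... I thought it would be more complicated – Manolete
No worries ;-). The complicated part is the multiplexing indicated by the '(75%)', but that is hidden under the hood. – Zulan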