Formulas in perf stat

I would like to know the formulas perf stat uses to compute the derived metrics (the values after the '#') from the raw counts.

perf stat -e task-clock,cycles,instructions,cache-references,cache-misses ./myapp 

    1080267.226401      task-clock (msec)         #   19.062 CPUs utilized
 1,592,123,216,789      cycles                    #    1.474 GHz                      (50.00%)
   871,190,006,655      instructions              #    0.55  insn per cycle           (75.00%)
     3,697,548,810      cache-references          #    3.423 M/sec                    (75.00%)
       459,457,321      cache-misses              #   12.426 % of all cache refs      (75.00%)

In this case, how do you calculate the M/sec value for cache-references?


2017-09-27 Manolete

Not sure if I got the question right. Isn't it just 'cache-references' / 'task-clock'? – Zulan

@Zulan Doh! Of course it is... I thought it would be more complicated. – Manolete

No worries ;-). The complicated part is the multiplexing indicated by the '(75%)', but that is hidden behind the scenes. – Zulan
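Zulan's point about multiplexing can be sketched. When more events are requested than the PMU has hardware counters, the kernel rotates the events on and off the counters, and each counter read comes with the time the event was enabled and the time it was actually running. The percentage in parentheses is running/enabled, and the printed count is the raw count extrapolated by enabled/running. The struct and function names below are hypothetical, not perf's own API; rounding to nearest is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical container for one multiplexed counter read. */
struct counter_read {
    uint64_t raw;          /* count accumulated while the event was on a counter */
    uint64_t time_enabled; /* ns the event was enabled */
    uint64_t time_running; /* ns the event was actually scheduled on hardware */
};

/* The value shown in parentheses, e.g. "(75.00%)". */
double running_percent(const struct counter_read *c)
{
    if (c->time_enabled == 0)
        return 0.0;
    return 100.0 * (double)c->time_running / (double)c->time_enabled;
}

/* Extrapolate the raw count to the whole enabled period, rounding to nearest. */
uint64_t scaled_count(const struct counter_read *c)
{
    if (c->time_running == 0)
        return 0; /* never scheduled: nothing to extrapolate from */
    return (uint64_t)((double)c->raw * (double)c->time_enabled /
                      (double)c->time_running + 0.5);
}
```

For example, an event that ran for 300 ns out of 400 ns enabled and counted 300 events is reported as 400 events at (75.00%). The extrapolation assumes the event rate was uniform over the whole run.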

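The formulas do not seem to be implemented in builtin-stat.c (where the default event sets for perf stat are defined), but they are probably computed (and averaged, with stddev) in perf_stat__print_shadow_stats(), with some of the statistics collected into arrays in perf_stat__update_shadow_stats():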

http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L626
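The averaging mentioned above can be modeled as a running mean and variance per counter. This sketch uses Welford's online algorithm and exposes the variance of the mean, whose square root is the standard deviation of the mean. The names (run_stats, stats_update, ...) are hypothetical and the sketch is an illustration, not a copy of the helpers in tools/perf/util/stat.c.

```c
#include <stdint.h>

/* Running statistics for one counter (illustrative model). */
struct run_stats {
    double n;    /* number of samples seen */
    double mean; /* running mean */
    double M2;   /* running sum of squared deviations from the mean */
};

/* Welford's update: numerically stable, one pass, O(1) memory. */
void stats_update(struct run_stats *s, uint64_t val)
{
    double delta;

    s->n += 1.0;
    delta = (double)val - s->mean;
    s->mean += delta / s->n;
    s->M2 += delta * ((double)val - s->mean);
}

double stats_avg(const struct run_stats *s)
{
    return s->mean;
}

/* Sample variance divided by n; take sqrt() for the stddev of the mean. */
double stats_var_of_mean(const struct run_stats *s)
{
    if (s->n < 2.0)
        return 0.0;
    return s->M2 / (s->n - 1.0) / s->n;
}
```

Feeding the samples 2, 4 and 6 gives a mean of 4 and a variance of the mean of 4/3.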

When HW_INSTRUCTIONS is counted: "instructions per clock" = HW_INSTRUCTIONS/HW_CPU_CYCLES; "stalled cycles per instruction" = HW_STALLED_CYCLES_FRONTEND/HW_INSTRUCTIONS

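if (perf_evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
    /* averaged cycle count for this context and CPU */
    total = avg_stats(&runtime_cycles_stats[ctx][cpu]);
    if (total) {
        /* IPC = instructions / cycles */
        ratio = avg/total;
        print_metric(ctxp, NULL, "%7.2f ",
                     "insn per cycle", ratio);
    } else {
        /* no cycles counted: print an empty metric */
        print_metric(ctxp, NULL, NULL, "insn per cycle", 0);
    }
    ...
}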
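The two instruction-based ratios can be written as plain functions. This is a minimal sketch with hypothetical function names; the stalled-cycles ratio follows the frontend/instructions formula stated above.

```c
#include <stdint.h>

/* "insn per cycle": instructions / cycles.
 * Returns 0 when no cycles were counted. */
double insn_per_cycle(uint64_t instructions, uint64_t cycles)
{
    if (cycles == 0)
        return 0.0;
    return (double)instructions / (double)cycles;
}

/* "stalled cycles per insn": stalled-cycles-frontend / instructions. */
double stalled_cycles_per_insn(uint64_t stalled_frontend, uint64_t instructions)
{
    if (instructions == 0)
        return 0.0;
    return (double)stalled_frontend / (double)instructions;
}
```

With the numbers from the question, insn_per_cycle(871190006655, 1592123216789) is about 0.547, which the "%7.2f" format prints as 0.55.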

Branch misses are computed in print_branch_misses() as HW_BRANCH_MISSES/HW_BRANCH_INSTRUCTIONS.

There are also several cache-miss ratio calculations in perf_stat__print_shadow_stats(), such as HW_CACHE_MISSES/HW_CACHE_REFERENCES, plus some more detailed ones (perf stat -d mode).
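The cache-miss, branch-miss and stalled-cycle percentages all have the same shape, 100 × part / whole. This is a minimal sketch of that shape; percent_of is a hypothetical helper, not a perf function.

```c
#include <stdint.h>

/* 100 * part / whole, or 0 when the denominator is 0.
 * Fits cache-misses / cache-references, branch-misses / branches,
 * and stalled-cycles-{frontend,backend} / cycles. */
double percent_of(uint64_t part, uint64_t whole)
{
    if (whole == 0)
        return 0.0;
    return 100.0 * (double)part / (double)whole;
}
```

With the question's counts, percent_of(459457321, 3697548810) is about 12.426, which matches the "% of all cache refs" column.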

Stalled percentages are computed as HW_STALLED_CYCLES_FRONTEND/HW_CPU_CYCLES and HW_STALLED_CYCLES_BACKEND/HW_CPU_CYCLES.

GHz is computed as HW_CPU_CYCLES/runtime_nsecs_stats, where runtime_nsecs_stats is updated from either of the software events task-clock or cpu-clock (SW_TASK_CLOCK and SW_CPU_CLOCK; "We still know no exact difference between them two" since 2010 on LKML, with an update in 2014 on SO):

if (perf_evsel__match(counter, SOFTWARE, SW_TASK_CLOCK) || 
    perf_evsel__match(counter, SOFTWARE, SW_CPU_CLOCK)) 
    update_stats(&runtime_nsecs_stats[cpu], count[0]);
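Since task-clock is kept in nanoseconds, cycles divided by it gives cycles per nanosecond, which is GHz. This sketch uses hypothetical helper names and converts the msec value that perf prints back to ns first.

```c
#include <stdint.h>

/* perf prints task-clock in msec; the shadow stats hold nanoseconds. */
double msec_to_ns(double msec)
{
    return msec * 1e6;
}

/* cycles per nanosecond of task-clock == GHz */
double cycles_ghz(uint64_t cycles, double runtime_ns)
{
    if (runtime_ns <= 0.0)
        return 0.0;
    return (double)cycles / runtime_ns;
}
```

For the question, cycles_ghz(1592123216789, msec_to_ns(1080267.226401)) is about 1.474. Because task-clock sums time over every CPU the task ran on, this is an average frequency per busy CPU, not an aggregate across the 19 CPUs.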

There are also several formulas for transactions (perf stat -T mode).

"CPUs utilized" is computed as task-clock (or cpu-clock) / walltime_nsecs_stats, where the wall time is measured by perf stat itself, in userspace, as elapsed real time from CLOCK_MONOTONIC:

static inline unsigned long long rdclock(void) 
{ 
    struct timespec ts; 

    clock_gettime(CLOCK_MONOTONIC, &ts); 
    return ts.tv_sec * 1000000000ULL + ts.tv_nsec; 
} 

... 

static int __run_perf_stat(int argc, const char **argv) 
{  
... 
    /* 
    * Enable counters and exec the command: 
    */ 
    t0 = rdclock(); 
    clock_gettime(CLOCK_MONOTONIC, &ref_time); 
    if (forks) { 
     .... 
    } 
    t1 = rdclock(); 

    update_stats(&walltime_nsecs_stats, t1 - t0);
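The same ratio can be reproduced inside a single process. This is a sketch under a stated assumption: CLOCK_PROCESS_CPUTIME_ID (CPU time of all threads of this process) stands in for task-clock, which unlike this clock also covers the children perf stat forks. The wall clock mirrors rdclock() above. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall time in ns, like perf's rdclock(). */
static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* CPU time consumed by all threads of this process, in ns (task-clock stand-in). */
static uint64_t process_cpu_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* "CPUs utilized" = CPU time / wall time. */
double cpus_utilized(double task_clock_ns, double wall_ns)
{
    return wall_ns > 0.0 ? task_clock_ns / wall_ns : 0.0;
}

/* Run work() and report how many CPUs it kept busy on average. */
double measure_cpus_utilized(void (*work)(void))
{
    uint64_t cpu0 = process_cpu_ns(), t0 = mono_ns();
    work();
    uint64_t t1 = mono_ns(), cpu1 = process_cpu_ns();
    return cpus_utilized((double)(cpu1 - cpu0), (double)(t1 - t0));
}
```

Working backwards from the question, 1080267.226401 ms of task-clock at 19.062 CPUs utilized implies roughly 56.7 s of wall time.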

There are also some estimations from the top-down method (Tuning Applications Using a Top-down Microarchitecture Analysis Method; Software Optimizations Become Simple with Top-Down Analysis .. Name Skylake, IDF2015; #22 in Gregg's Methodology List). They were described by Andi Kleen in 2016 in https://lwn.net/Articles/688335/ "Add top-down metrics to perf stat" (perf stat --topdown -I 1000 cmd mode).

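And finally, if there is no exact formula for the currently printed event, there is a generic "%c/sec" (K/sec or M/sec) metric: http://elixir.free-electrons.com/linux/v4.13.4/source/tools/perf/util/stat-shadow.c#L845. It divides the count by the runtime in nsec (from the task-clock or cpu-clock event, if either is present in the perf stat event set):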

} else if (runtime_nsecs_stats[cpu].n != 0) {
    char unit = 'M';
    char unit_buf[10];

    /* averaged task-clock/cpu-clock time, in nanoseconds */
    total = avg_stats(&runtime_nsecs_stats[cpu]);

    /* events per ns * 1000 = millions of events per second */
    if (total)
        ratio = 1000.0 * avg/total;
    /* below 0.001 M/sec, switch to K/sec */
    if (ratio < 0.001) {
        ratio *= 1000;
        unit = 'K';
    }
    snprintf(unit_buf, sizeof(unit_buf), "%c/sec", unit);
    print_metric(ctxp, NULL, "%8.3f", unit_buf, ratio);
}
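The branch above can be turned into a standalone function to answer the original question directly. This is a minimal sketch; per_sec_metric is a hypothetical name, and it returns the unit through a pointer instead of printing.

```c
/* Generic rate metric: events/ns * 1000 = M/sec, or K/sec if that is below 0.001. */
double per_sec_metric(double avg_count, double runtime_ns, char *unit)
{
    double ratio = 0.0;

    *unit = 'M';
    if (runtime_ns != 0.0)
        ratio = 1000.0 * avg_count / runtime_ns;
    if (ratio < 0.001) {
        ratio *= 1000; /* too small for M/sec: rescale to K/sec */
        *unit = 'K';
    }
    return ratio;
}
```

For the question, cache-references = 3,697,548,810 and task-clock = 1080267.226401 ms = 1,080,267,226,401 ns, so per_sec_metric(3697548810.0, 1080267226401.0, &unit) is about 3.4228 with unit 'M'. The "%8.3f" format prints that as 3.423 M/sec. A rate of 500 events per second instead comes out as 0.500 K/sec.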


2017-10-05 00:00:11 osgx
