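Profiling cache hits/misses using Intel's PMU library
Is it possible to use Intel's PMU library to count cache hits/misses for a specific section of code in a C program? The counts seem to be polluted by other applications running on the system.
Does the library support isolating the cache statistics that correspond to one specific code snippet alone (i.e., without interference from other applications running on the system)?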
This is the code snippet I have been testing:
SystemCounterState before = getSystemCounterState();
SystemCounterState after = getSystemCounterState();
cout << "===========================================================" << endl;
cout << "Instructions per Clock: " << getIPC(before, after) <<
"\nL2 cache hits: " << getL2CacheHits(before, after) <<
"\nL2 cache misses: " << getL2CacheMisses(before, after) <<
"\nL2 cache hit ratio: " << getL2CacheHitRatio(before, after) <<
"\nL3 cache hits: " << getL3CacheHits(before, after) <<
"\nL3 cache misses: " << getL3CacheMisses(before, after) <<
"\nL3 cache hit ratio: " << getL3CacheHitRatio(before, after) <<
"\nWasted cycles caused by L3 misses: " << getCyclesLostDueL3CacheMisses(before, after) <<
"\nBytes read from DRAM: " << getBytesReadFromMC(before, after) << endl;
cout << "===========================================================" << endl;
These are the statistics I get (note that the cache hit/miss counts are high even though I am not doing any computation):
===========================================================
Instructions per Clock: 0.410805
L2 cache hits: 2677
L2 cache misses: 2658
L2 cache hit ratio: 0.501781
L3 cache hits: 2151
L3 cache misses: 507
L3 cache hit ratio: 0.809255
Wasted cycles caused by L3 misses: 0.0242752
Bytes read from DRAM: 514048
===========================================================
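Thanks in advance.
Sure, sorry for the confusion. I had tried it without the 'cout' part and still got non-zero cache statistics. I had just pasted the wrong iteration of the code; it has been edited above. – jithinpt 2014-10-08 01:50:32
Hi, the counts you are seeing may be caused by executing the performance-monitoring code itself, because its code and data are not yet in the cache. Try executing the before/after code twice. – 2014-10-10 00:36:43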
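The suggestion in the comment above can be sketched roughly as follows. This is only an illustration, not the library's own method: `CounterReader`, `measureWarm` and `measureNet` are hypothetical names, and the reader stands in for whatever snapshot-and-extract call is used with the real library. Subtracting the cost of an empty region is an additional assumption that goes beyond what the comment proposes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for reading one monotonically increasing event
// counter (for example, L2 misses). It is not part of Intel's library.
using CounterReader = std::function<std::uint64_t()>;

// Returns the counter delta across `region`, after a discarded warm-up pass.
template <typename Region>
std::uint64_t measureWarm(const CounterReader& read, Region region) {
    // Warm-up: run the before/after pair once and ignore the result, so the
    // monitoring code's own instructions and data are already cached.
    (void)read();
    (void)read();

    const std::uint64_t before = read();
    region();
    const std::uint64_t after = read();
    assert(after >= before);  // the counter is assumed never to decrease
    return after - before;
}

// Returns the delta for `region` minus the delta of an empty region.
// Assumption: the measurement overhead is roughly constant between runs.
template <typename Region>
std::uint64_t measureNet(const CounterReader& read, Region region) {
    const std::uint64_t baseline = measureWarm(read, [] {});
    const std::uint64_t gross = measureWarm(read, region);
    return gross > baseline ? gross - baseline : 0;
}
```

To use it, wrap the code under test in a lambda and pass it together with a reader, for instance `measureNet(readL2Misses, [&] { work(); })`, where `readL2Misses` and `work` are placeholders. Note that this only reduces the overhead of the measurement itself; events from other processes running at the same time can still show up in a system-wide counter.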