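Difficulties measuring C/C++ performance

I wrote a piece of C code to illustrate a point in a discussion about optimization and branch prediction. Then I noticed even more diverse results than I had expected. My goal was to write it in the common subset of C++ and C, so that it is standard-compliant in both languages and fairly portable. It was tested on different Windows PCs: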
#include <stdio.h>
#include <time.h>
/// @return - time difference between start and stop in milliseconds
int ms_elapsed(clock_t start, clock_t stop)
{
return (int)(1000.0 * (stop - start)/CLOCKS_PER_SEC);
}
int const Billion = 1000000000;
/// & with numbers up to Billion gives 0, 0, 2, 2 repeating pattern
int const Pattern_0_0_2_2 = 0x40000002;
/// @return - half of Billion
int unpredictableIfs()
{
int sum = 0;
for (int i = 0; i < Billion; ++i)
{
// true, true, false, false ...
if ((i & Pattern_0_0_2_2) == 0)
{
++sum;
}
}
return sum;
}
/// @return - half of Billion
int noIfs()
{
int sum = 0;
for (int i = 0; i < Billion; ++i)
{
// 1, 1, 0, 0 ...
sum += (i & Pattern_0_0_2_2) == 0;
}
return sum;
}
int main()
{
clock_t volatile start;
clock_t volatile stop;
int volatile sum;
printf("Puzzling measurements:\n");
start = clock();
sum = unpredictableIfs();
stop = clock();
printf("Unpredictable ifs took %d msec; answer was %d\n"
, ms_elapsed(start, stop), sum);
start = clock();
sum = unpredictableIfs();
stop = clock();
printf("Unpredictable ifs took %d msec; answer was %d\n"
, ms_elapsed(start, stop), sum);
start = clock();
sum = noIfs();
stop = clock();
printf("Same without ifs took %d msec; answer was %d\n"
, ms_elapsed(start, stop), sum);
start = clock();
sum = unpredictableIfs();
stop = clock();
printf("Unpredictable ifs took %d msec; answer was %d\n"
, ms_elapsed(start, stop), sum);
}
Results compiled with VS2010, /O2 optimization, on Intel Core 2, WinXP:
Puzzling measurements:
Unpredictable ifs took 1344 msec; answer was 500000000
Unpredictable ifs took 1016 msec; answer was 500000000
Same without ifs took 1031 msec; answer was 500000000
Unpredictable ifs took 4797 msec; answer was 500000000
EDIT: Full compiler switches:
/Zi /nologo /W3 /WX- /O2 /Oi /Oy- /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm- /EHsc /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\Trying.pch" /Fa"Release\" /Fo"Release\" /Fd"Release\vc100.pdb" /Gd /analyze- /errorReport:queue
Someone else posted these results: compiled with MinGW, g++ 4.7.1, -O1 optimization, on Intel Core 2, WinXP:
Puzzling measurements:
Unpredictable ifs took 1656 msec; answer was 500000000
Unpredictable ifs took 0 msec; answer was 500000000
Same without ifs took 1969 msec; answer was 500000000
Unpredictable ifs took 0 msec; answer was 500000000
He also posted these results with -O3 optimization:
Puzzling measurements:
Unpredictable ifs took 1890 msec; answer was 500000000
Unpredictable ifs took 2516 msec; answer was 500000000
Same without ifs took 1422 msec; answer was 500000000
Unpredictable ifs took 2516 msec; answer was 500000000
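Now I have a question: what is going on here?
More specifically: how can a fixed function take such different amounts of time? Is there something wrong with my code? Is there something tricky about Intel processors? Are the compilers doing something odd? Could it be because 32-bit code is running on a 64-bit processor?
Thanks for your attention!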
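EDIT: I accept that g++ -O1 simply reuses the returned value in the other two calls. I also accept that g++ -O2 and g++ -O3 have a defect that leaves that optimization out. The large spread in measured speed (450%!!!) still seems mysterious.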
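I looked at the disassembly of the code generated by VS2010. It inlined unpredictableIfs all three times. The inlined copies are very similar; the loop is the same. It did not inline noIfs, but it did unroll its loop a bit: one iteration performs 4 steps. noIfs is computed as written, while unpredictableIfs uses jne to jump over the increment.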
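Have you actually checked that your "if" and "noIfs" versions are different once compiled? Many compilers nowadays will figure out that, for what you are doing, it's better to do some math instead of a conditional jump... – 2013-02-19 19:20:47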
@MatsPetersson The code generated by VS2010 is quite different. I will edit the post to explain. – 2013-02-19 19:31:08
@ÖöTiib Be sure to show us the command-line switches you used when compiling with VS2010. – 2013-02-19 19:31:38