2016-01-13

Caffe crashes after the first iteration?

When I train my net, Caffe crashes.

solver, model

In this case I am using only GPU 0. Here is the error trace:

build/tools/caffe train -solver models/mv16f/solver.prototxt -gpu 0 

I0113 14:21:05.861361 85242 solver.cpp:289] Learning Rate Policy: step 
I0113 14:21:05.862876 85242 solver.cpp:341] Iteration 0, Testing net (#0) 
I0113 14:21:30.271030 85242 solver.cpp:409]  Test net output #0: accuracy = 0.00872 
I0113 14:21:30.271070 85242 solver.cpp:409]  Test net output #1: loss = 4.62895 (* 1 = 4.62895 loss) 
I0113 14:21:32.317018 85242 solver.cpp:237] Iteration 0, loss = 4.62663 
I0113 14:21:32.317062 85242 solver.cpp:253]  Train net output #0: loss = 4.62663 (* 1 = 4.62663 loss) 
*** Aborted at 1452691298 (unix time) try "date -d @1452691298" if you are using GNU date *** 
PC: @  0x7fe7f65f1cbc caffe::SGDSolver<>::GetLearningRate() 
*** SIGFPE (@0x7fe7f65f1cbc) received by PID 85242 (TID 0x7fe7f72057c0) from PID 18446744073548012732; stack trace: *** 
    @  0x7fe7f49c0d40 (unknown) 
    @  0x7fe7f65f1cbc caffe::SGDSolver<>::GetLearningRate() 
    @  0x7fe7f65f2281 caffe::SGDSolver<>::ApplyUpdate() 
    @  0x7fe7f65d967c caffe::Solver<>::Step() 
    @  0x7fe7f65d8990 caffe::Solver<>::Solve() 
    @  0x7fe7f673251e caffe::P2PSync<>::run() 
    @   0x416aa6 train() 
    @   0x418c9a main 
    @  0x7fe7f49abec5 (unknown) 
    @   0x415819 (unknown) 
    @    0x0 (unknown) 

The full output of the train command is here

Answer


Your solver file has this line

lr_policy: "fixed" 

but the output from Caffe has this line

lr_policy: "step" 

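If you use the "step" learning rate policy, you must define stepsize. Also, your solver file shows that you have defined stepsize, but the Caffe output does not show it. Please check your solver file again and add this line back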

stepsize: 10000 
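
The crash site in the trace, GetLearningRate(), fits this explanation. The "step" policy is documented in caffe.proto as base_lr * gamma ^ floor(iter / stepsize), so a stepsize that is missing (and presumably defaults to 0) would lead to an integer division by zero, which typically shows up as SIGFPE. Below is a minimal Python sketch of that formula, not Caffe's actual code; the function name and all numeric values are made up for illustration.

```python
def step_learning_rate(base_lr, gamma, stepsize, iteration):
    """Learning rate under the "step" policy: base_lr * gamma ^ floor(iter / stepsize)."""
    # Integer division plays the role of the floor; stepsize == 0 fails on this line.
    current_step = iteration // stepsize
    return base_lr * gamma ** current_step


# Hypothetical solver values, chosen only for illustration.
print(step_learning_rate(base_lr=0.01, gamma=0.1, stepsize=10000, iteration=25000))

# With stepsize left at 0, the division fails on the first update.
try:
    step_learning_rate(base_lr=0.01, gamma=0.1, stepsize=0, iteration=1)
except ZeroDivisionError as exc:
    print("stepsize missing or zero:", exc)
```

Python raises ZeroDivisionError where a compiled C++ program would usually receive a signal instead, but the cause is the same.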

Thanks, it works now. I should have caught my mistake.
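
A small pre-flight check along these lines might catch this kind of mistake before training starts. It is only a sketch: check_step_policy is a hypothetical helper, not part of Caffe, and it reads simple `key: value` lines with a regular expression rather than parsing the protobuf text format properly.

```python
import re


def check_step_policy(solver_text):
    """Return a list of problems found in solver.prototxt-style text."""
    # Collect simple `key: value` pairs, ignoring `#` comments.
    # Assumption: one field per line and no `#` inside quoted values.
    fields = {}
    for line in solver_text.splitlines():
        line = line.split("#", 1)[0]
        match = re.match(r'\s*(\w+)\s*:\s*"?([^"\s]+)"?', line)
        if match:
            fields[match.group(1)] = match.group(2)

    problems = []
    if fields.get("lr_policy") == "step":
        try:
            stepsize = int(fields.get("stepsize", "0"))
        except ValueError:
            stepsize = 0
        if stepsize <= 0:
            problems.append('lr_policy is "step" but stepsize is missing or not positive')
    return problems


# Hypothetical solver text resembling the failing configuration.
sample = 'base_lr: 0.01\nlr_policy: "step"\ngamma: 0.1\n'
for problem in check_step_policy(sample):
    print(problem)
```

Running it on the solver file that is actually passed to `-solver`, rather than on a copy, matters here, since the mismatch in this thread was between the file on hand and the parameters Caffe printed.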