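OverflowError: (34, 'Numerical result out of range') in PyTorch

I get the following error (see the stack trace) when I run my code on a different GPU (Tesla K-20, CUDA 7.5, 6 GB memory). The code works fine if I run it on a GeForce 1080 or Titan X GPU.
Stack trace: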
  File "code/source/main.py", line 68, in <module>
    train.train_epochs(train_batches, dev_batches, args.epochs)
  File "/gpfs/home/g/e/geniiexe/BigRed2/code/source/train.py", line 34, in train_epochs
    losses = self.train(train_batches, dev_batches, (epoch + 1))
  File "/gpfs/home/g/e/geniiexe/BigRed2/code/source/train.py", line 76, in train
    self.optimizer.step()
  File "/gpfs/home/g/e/geniiexe/BigRed2/anaconda3/lib/python3.5/site-packages/torch/optim/adam.py", line 70, in step
    bias_correction1 = 1 - beta1 ** state['step']
OverflowError: (34, 'Numerical result out of range')
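So what could cause this error on a different GPU (Tesla K-20) when the same code works fine on a GeForce or Titan X GPU? Also, what does the error mean? Is it related to memory overflow? I don't think so.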
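
To help narrow this down: the line that fails in the stack trace, `1 - beta1 ** state['step']`, appears to be ordinary Python float arithmetic rather than a GPU operation, and on Linux errno 34 is `ERANGE`. The sketch below only illustrates that expression in isolation. It assumes `beta1` is a float in [0, 1) and the step count is a non-negative integer; the helper name `bias_correction` is hypothetical and not part of `torch.optim`.

```python
import errno
import os


def bias_correction(beta, step):
    """Compute 1 - beta ** step, tolerating a range error from the C pow().

    Hypothetical helper for illustration; not part of torch.optim.
    """
    try:
        return 1 - beta ** step
    except OverflowError:
        # Assumption: with 0 <= beta < 1 and step >= 0 the power cannot be
        # large, so a range error here can only mean the result is too
        # small to represent normally. Treat beta ** step as 0 in that case.
        if 0 <= beta < 1 and step >= 0:
            return 1.0
        raise


# Show what errno ERANGE maps to on this machine.
print(errno.ERANGE, os.strerror(errno.ERANGE))

# Early in training the correction is far from 1 ...
print(bias_correction(0.9, 1))

# ... and for a very large step count beta ** step is effectively 0.
print(bias_correction(0.9, 10**6))
```

Running `beta1 ** state['step']` by hand in a Python shell on each machine, with the step value at which training fails, would show whether the difference comes from the host environment rather than from the GPU model.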