Dot product of huge arrays in numpy

I have a huge array and I want to compute its dot product with a small array, but I keep getting 'array is too big'. Is there a workaround?

import numpy as np 

eMatrix = np.random.random_integers(low=0,high=100,size=(20000000,50)) 
pMatrix = np.random.random_integers(low=0,high=10,size=(50,50)) 

a = np.dot(eMatrix,pMatrix) 

Error: 
/Library/Python/2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.random_integers (numpy/random/mtrand/mtrand.c:9385)() 

/Library/Python/2.7/site-packages/numpy/random/mtrand.so in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:7051)() 

ValueError: array is too big.


2014-09-05 Lanc

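This already fails at `eMatrix =`, no? You are asking for 10^9 integers, times the number of bytes per integer. So at the very least you should put them in an int8 array rather than the default int64 one. – mdurant 2014-09-05 14:33:04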
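The arithmetic behind this comment can be sketched as follows. The helper name `array_gib` is made up for illustration; it only multiplies the element count by the item size and ignores any temporaries numpy may allocate.

```python
import numpy as np

def array_gib(shape, dtype):
    """Size in GiB of a dense array with the given shape and dtype."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * np.dtype(dtype).itemsize / 2.0**30

shape = (20000000, 50)  # shape of eMatrix in the question
for dtype in (np.int8, np.int32, np.int64):
    print("%-6s %5.2f GiB" % (np.dtype(dtype).name, array_gib(shape, dtype)))
```

Values between 0 and 100 fit in int8, but the entries of the product can reach 50 * 100 * 10 = 50000, so the result would need a wider type such as int32.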

But I have a 64-bit machine with 16 GB of RAM – Lanc 2014-09-05 14:51:56

So that is 8 GB for the first one, at least the same again for ePrime, and maybe some unseen intermediates. – mdurant 2014-09-05 14:53:59

I think the only "simple" answer is to get more RAM.

It took 15 GB, but I was able to do this on my MacBook.

In [1]: import numpy 
In [2]: e = numpy.random.random_integers(low=0, high=100, size=(20000000, 50)) 
In [3]: p = numpy.random.random_integers(low=0, high=10, size=(50, 50)) 
In [4]: a = numpy.dot(e, p) 
In [5]: a[0] 
Out[5]: 
array([14753, 12720, 15324, 13588, 16667, 16055, 14144, 15239, 15166, 
     14293, 16786, 12358, 14880, 13846, 11950, 13836, 13393, 14679, 
     15292, 15472, 15734, 12095, 14264, 12242, 12684, 11596, 15987, 
     15275, 13572, 14534, 16472, 14818, 13374, 14115, 13171, 11927, 
     14226, 13312, 16070, 13524, 16591, 16533, 15466, 15440, 15595, 
     13164, 14278, 13692, 12415, 13314])
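If that much RAM is not available, one option that may be worth trying is to compute the product one block of rows at a time and write the result to a disk-backed array. The following is only a sketch under assumptions: `blockwise_dot` is a hypothetical helper, the sizes are shrunk so that it runs quickly, and for the real problem the input would also have to be generated or loaded block by block (for example from its own `np.memmap`).

```python
import os
import tempfile
import numpy as np

def blockwise_dot(e, p, out, block_rows=1000000):
    """Fill `out` with np.dot(e, p), one block of rows at a time."""
    for start in range(0, e.shape[0], block_rows):
        stop = min(start + block_rows, e.shape[0])
        # Only this slice of the result is held in memory at once.
        out[start:stop] = np.dot(e[start:stop], p)
    return out

# Small stand-ins for eMatrix and pMatrix (assumed sizes, for illustration).
rng = np.random.RandomState(0)
e = rng.randint(0, 101, size=(2000, 50))
p = rng.randint(0, 11, size=(50, 50))

# Disk-backed output array; the file path is a throwaway temp file.
path = os.path.join(tempfile.mkdtemp(), "a.dat")
a = np.memmap(path, dtype=np.int64, mode="w+", shape=(e.shape[0], p.shape[1]))
blockwise_dot(e, p, a, block_rows=300)
a.flush()
```

Each row of the result depends only on the matching row of `e`, which is why the blocks can be processed independently.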

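One possible solution might be to use a sparse matrix and the sparse matrix dot operator.

For example, on my machine just constructing e as a dense matrix uses 8 GB of RAM. Constructing a similar sparse matrix eprime: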

In [1]: from scipy.sparse import rand 
In [2]: eprime = rand(20000000, 50)

has a negligible cost in terms of memory.
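A scaled-down sketch of that comparison, assuming scipy is installed and using the default density of 0.01 (1% non-zero entries). The shapes are shrunk for illustration, and the saving only applies when the data really is mostly zeros.

```python
import numpy as np
from scipy.sparse import rand

# Sparse stand-in for e, in CSR format.
eprime = rand(1000, 50, density=0.01, format="csr", random_state=0)
p = np.random.RandomState(0).randint(0, 11, size=(50, 50))

# Memory held by the sparse matrix vs. the same shape stored densely.
sparse_bytes = eprime.data.nbytes + eprime.indices.nbytes + eprime.indptr.nbytes
dense_bytes = eprime.shape[0] * eprime.shape[1] * eprime.dtype.itemsize
print("sparse: %d bytes, dense: %d bytes" % (sparse_bytes, dense_bytes))

# Multiplying by a dense array gives back a dense numpy array.
a = eprime.dot(p)
print(type(a).__name__, a.shape)
```

Note that the product with a dense `p` comes back as an ordinary dense array, so the result still needs its full memory.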


2014-09-05 14:51:27 stderr

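I believe that as soon as you do a calculation like dot, you will have a dense matrix again. – mdurant 2014-09-05 14:52:46

Hey @stderr, as I mentioned above, I also tried on a Mac with 16 GB of RAM, but it fails. – Lanc 2014-09-05 14:54:45

Also, I don't want a sparse matrix; my matrix needs to be dense – Lanc 2014-09-05 14:55:28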

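I believe the answer is that you don't have enough RAM, and possibly also that you are running a 32-bit version of Python. Perhaps clarify which OS you are running. Many OSes will run both 32-bit and 64-bit programs.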


2014-09-05 15:13:24 beiller

How do I check whether I am running a 32-bit version of Python? – Lanc 2014-09-05 15:34:14

As noted above, see here for how to determine whether you are running a 64-bit or 32-bit Python executable: http://stackoverflow.com/questions/1405913/how-do-i-determine-if-my-python-shell-is-executing-in-32bit-or-64bit-mode-on-os – beiller 2014-09-05 17:44:52

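That error is raised while determining the total size of the array, if it overflows the native int type (see here for the exact source code line).

For that to happen, regardless of your machine being 64-bit, you are almost certainly running a 32-bit version of Python (and NumPy). You can check if that is the case by doing: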

>>> import sys 
>>> sys.maxsize 
2147483647 # <--- 2**31 - 1, on a 64 bit version you would get 2**63 - 1
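The same check can be written as a small script. This is a sketch that uses only the standard library; `python_bits` is a name chosen here for illustration.

```python
import struct
import sys

def python_bits():
    """Pointer width of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8

print("%d-bit Python, sys.maxsize = %d" % (python_bits(), sys.maxsize))
```

This reports the interpreter's own word size, which can be 32-bit even when the operating system and hardware are 64-bit.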

Then again, your array is "only" 20000000 * 50 = 1000000000 items, which is just under 2**30. If I try to reproduce your results on a 32-bit numpy, I get a MemoryError:

>>> np.random.random_integers(low=0,high=100,size=(20000000,50)) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "mtrand.pyx", line 1420, in mtrand.RandomState.random_integers (numpy\random\mtrand\mtrand.c:12943) 
    File "mtrand.pyx", line 938, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10338) 
MemoryError

unless I increase the size beyond the magical 2**31 - 1 threshold:

>>> np.random.random_integers(low=0,high=100,size=(2**30, 2)) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "mtrand.pyx", line 1420, in mtrand.RandomState.random_integers (numpy\random\mtrand\mtrand.c:12943) 
    File "mtrand.pyx", line 938, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:10338) 
ValueError: array is too big.
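The element counts behind those two attempts can be checked with plain integer arithmetic. This sketch only compares counts against 2**31 - 1; it does not reproduce numpy's internal size check, and `n_elements` is a helper name made up for illustration.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer

def n_elements(shape):
    """Total number of items in an array of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

for shape in [(20000000, 50), (2**30, 2)]:
    n = n_elements(shape)
    verdict = "above" if n > INT32_MAX else "below"
    print("%s -> %d items, %s 2**31 - 1" % (shape, n, verdict))
```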

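Given the difference in line numbers between your traceback and mine, I suspect you are using an older version. What does this output on your system: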

>>> np.__version__ 
'1.10.0.dev-9c50f98'


2014-09-05 16:19:39 Jaime

Thanks for the insight! I am using numpy version 1.8.2 – Lanc 2014-09-07 10:31:40
