2016-01-06 73 views
3

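Size of numpy strided array/broadcast array in memory?

I'm trying to create efficient broadcast arrays in numpy, e.g. a set of shape=[1000,1000,1000] arrays that have only 1000 elements, repeated 1e6 times. This can be achieved through both np.lib.stride_tricks.as_strided and np.broadcast_arrays.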

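However, I'm having trouble verifying that there is no duplication in memory. This is critical, because tests that actually replicate the arrays in memory tend to crash my machine without leaving a traceback.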

I've tried checking the size of the arrays using .nbytes, but that doesn't seem to correspond to the memory actually used:

>>> import numpy as np 
>>> import resource 
>>> initial_memuse = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 
>>> pagesize = resource.getpagesize() 
>>> 
>>> x = np.arange(1000) 
>>> memuse_x = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 
>>> print("Size of x = {0} MB".format(x.nbytes/1e6)) 
Size of x = 0.008 MB 
>>> print("Memory used = {0} MB".format((memuse_x-initial_memuse)*resource.getpagesize()/1e6)) 
Memory used = 150.994944 MB 
>>> 
>>> y = np.lib.stride_tricks.as_strided(x, [1000,10,10], strides=x.strides + (0, 0)) 
>>> memuse_y = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 
>>> print("Size of y = {0} MB".format(y.nbytes/1e6)) 
Size of y = 0.8 MB 
>>> print("Memory used = {0} MB".format((memuse_y-memuse_x)*resource.getpagesize()/1e6)) 
Memory used = 201.326592 MB 
>>> 
>>> z = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0)) 
>>> memuse_z = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss 
>>> print("Size of z = {0} MB".format(z.nbytes/1e6)) 
Size of z = 80.0 MB 
>>> print("Memory used = {0} MB".format((memuse_z-memuse_y)*resource.getpagesize()/1e6)) 
Memory used = 0.0 MB 

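So .nbytes reports the "theoretical" size of the array, but apparently not its actual size. The resource check is a little awkward, because it looks as though something is being loaded and cached (perhaps?), so that the first striding takes up some memory but later ones take none.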

tl;dr: How do you determine the actual size of a numpy array or view in memory?

+0

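Just looking at the resident set size of the Python process is not a reliable way to determine how much memory a particular numpy array uses. It doesn't take paging into account, and there is no easy way to predict when memory allocated to a deleted or out-of-scope array will be released back to the OS.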

Answer

4

One way is to examine the array's .base attribute, which references the object from which the array "borrows" its memory. For example:

x = np.arange(1000) 
print(x.flags.owndata)  # x "owns" its data 
# True 
print(x.base is None)  # its base is therefore 'None' 
# True 

a = x.reshape(100, 10)  # a is a reshaped view onto x 
print(a.flags.owndata)  # it therefore "borrows" its data 
# False 
print(a.base is x)   # its .base is x 
# True 
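
As a side note that is not part of the original answer, the owndata flag also appears to determine what sys.getsizeof reports. The sketch below assumes a NumPy version whose ndarray.__sizeof__ counts the data buffer only for arrays that own it; exact byte counts vary by platform and version, so none are shown:

```python
import sys
import numpy as np

x = np.arange(1000)
a = x.reshape(100, 10)   # a view that borrows x's memory

# For an array that owns its data, getsizeof covers the header plus the buffer
print(sys.getsizeof(x) >= x.nbytes)
# True

# For a view, getsizeof covers only the small array header
print(sys.getsizeof(a) < x.nbytes)
# True
```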

Things are slightly more complicated with np.lib.stride_tricks:

b = np.lib.stride_tricks.as_strided(x, [1000,100,100], strides=x.strides + (0, 0)) 

print(b.flags.owndata) 
# False 
print(b.base) 
# <numpy.lib.stride_tricks.DummyArray object at 0x7fb40c02b0f0> 

Here, b.base is a numpy.lib.stride_tricks.DummyArray instance, which looks like this:

class DummyArray(object): 
    """Dummy object that just exists to hang __array_interface__ dictionaries 
    and possibly keep alive a reference to a base array. 
    """ 

    def __init__(self, interface, base=None): 
        self.__array_interface__ = interface
        self.base = base

We can therefore examine b.base.base:

print(b.base.base is x) 
# True 

Once you have the base array, its .nbytes attribute should accurately reflect the amount of memory it occupies.
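
As a quick cross-check (a minimal sketch, not part of the original answer), the strides show why no extra memory is needed: a stride of 0 along an axis means that stepping along that axis revisits the same bytes. np.shares_memory confirms that the strided array and x overlap. The variable names follow the examples above:

```python
import numpy as np

x = np.arange(1000)
b = np.lib.stride_tricks.as_strided(x, [1000, 100, 100], strides=x.strides + (0, 0))

# A zero stride means moving along that axis never advances the memory address
print(b.strides == (x.itemsize, 0, 0))
# True

# The strided array and the original array overlap in memory
print(np.shares_memory(x, b))
# True

# b.nbytes is the "theoretical" size; x.nbytes is what is actually allocated
print(b.nbytes // x.nbytes)
# 10000
```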

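In principle it's possible to have a view of a view of an array, or to create a strided array from another strided array. Assuming that your view or strided array is ultimately backed by another numpy array, you can recursively follow its .base attribute. Once you find an object whose .base is None, you have found the underlying object from which your array borrows its memory: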

def find_base_nbytes(obj): 
    if obj.base is not None: 
        return find_base_nbytes(obj.base)
    return obj.nbytes 

As expected,

print(find_base_nbytes(x)) 
# 8000 

print(find_base_nbytes(y)) 
# 8000 

print(find_base_nbytes(z)) 
# 8000
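
The same idea can be written iteratively. The sketch below uses a hypothetical helper, find_base (not from the original answer), that walks the .base chain and stops if it reaches an object without a .base attribute, such as a bytes buffer wrapped by np.frombuffer. It also checks an array made with np.broadcast_to, assuming a NumPy version that provides that function:

```python
import numpy as np

def find_base(obj):
    """Follow .base links until reaching the object that owns the memory."""
    # getattr guards against bases that are not arrays (e.g. bytes, mmap)
    while getattr(obj, "base", None) is not None:
        obj = obj.base
    return obj

x = np.arange(1000)

v = x.reshape(100, 10)[::2].T            # a view of a view of x
print(find_base(v) is x)
# True

w = np.broadcast_to(x, (10, 10, 1000))   # read-only broadcast view of x
print(np.shares_memory(w, x))
# True
print(w.strides == (0, 0, x.itemsize))
# True
```

If the object returned by find_base is not a numpy array, it may not have an .nbytes attribute, so its size has to be determined some other way.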