有沒有辦法增加不太慢的numpy數組？

我有一個腳本，累積（計數）包含在兩個文件中的字節。字節類似於C類unsigned char 0到255之間的整數值。有沒有辦法增加不太慢的numpy數組？

該累加器腳本的目標是計算這兩個文件中字節的聯合計數或頻率。可能將其擴展到多個文件/維度。

這兩個文件的大小相同，但它們非常大，約爲6TB左右。

我正在使用numpy.uint64值，因爲我使用Python的int類型出現溢出問題。

我有一個長度爲255**2的1D累加器陣列，用於存儲關節計數。

我計算從逐列到陣列偏移計算的偏移量，以便在正確的索引處增加聯合頻率。我以字節塊（n_bytes）遍歷這兩個文件，將它們解壓縮並增加頻率計數器。

下面的代碼的草圖：

import numpy 
import ctypes 
import struct 

buckets_per_signal_type = 2**(ctypes.c_ubyte(1).value * 8) 
total_buckets = buckets_per_signal_type**2 
buckets = numpy.zeros((total_buckets,), dtype=numpy.uint64) 

# open file handles to two files (omitted for brevity...) 

# buffer size that is known ahead of time to be a divisible 
# unit of the original files 
# (for example, here, reading in 2.4e6 bytes per loop iteration) 
n_bytes = 2400000 

total_bytes = 0L 

# format used to unpack bytes 
struct_format = "=%dB" % (n_bytes) 

while True:  
    # read in n_bytes chunk from each file 
    first_file_bytes = first_file_handle.read(n_bytes) 
    second_file_bytes = second_file_handle.read(n_bytes) 

    # break if both file handles have nothing left to read 
    if len(first_file_bytes) == 0 and len(second_file_bytes) == 0: 
     break 

    # unpack actual bytes 
    first_bytes_unpacked = struct.unpack(struct_format, first_file_bytes) 
    second_bytes_unpacked = struct.unpack(struct_format, second_file_bytes) 

    for index in range(0, n_bytes): 
     first_byte = first_bytes_unpacked[index] 
     second_byte = second_bytes_unpacked[index] 
     offset = first_byte * buckets_per_signal_type + second_byte 
     buckets[offset] += 1 

    total_bytes += n_bytes 
    # repeat until both file handles are both EOF 

# print out joint frequency (omitted)

與我曾經int的版本相比，這是令人難以置信的速度慢，一個數量級上的速度較慢。原來的作業在大約8個小時內完成（錯誤地，由於溢出），並且這個基於numpy的版本必須儘早退出，因爲它似乎需要大約12-14天才能完成。

在這個基本任務中numpy的速度非常慢，或者我沒有像Python那樣用numpy做累加器。我懷疑後者，這就是我爲什麼要求幫助的原因。

我讀了numpy.add.at，但我將添加到buckets數組中的解壓字節數組沒有偏移值，這些值自然地轉換爲buckets數組的「形狀」。

是否有一種方法來存儲和增加一個（長）整數數組，不溢出，哪個是合理的高性能？

我可以在C中重寫這個，我猜，但我希望有一些numpy的東西，我忽略了這個問題很快就能解決。謝謝你的建議。

更新

我有舊版本numpy的和SciPy的那不支持numpy.add.at。所以這是另一個需要研究的問題。

我會嘗試以下，看看如何繼續下去：

first_byte_arr = np.array(first_bytes_unpacked)                     
second_byte_arr = np.array(second_bytes_unpacked)                     
offsets = first_byte_arr * buckets_per_signal_type + second_byte_arr                
np.add.at(buckets, offsets, 1L)

希望它運行快一點！

更新II

使用np.add.at和np.array，這個工作將需要大約12天的完成。我現在要放棄numpy，並回到用C讀取原始字節，其中運行時間更合理。謝謝你的建議！

來源

2017-09-04 Alex Reynolds

「我閱讀了關於'numpy.add.at'，但我將添加到'buckets'數組的解壓字節數組沒有偏移值自然而然地轉化爲「水桶」陣列的「形狀」。「 - 你能強制使用'np.digitize'嗎？不幸的是，我不能跟着你正在用'struct'完成的事情，以便給出關於'numpy'的良好答案。' –

'struct'在這裏用於從每個字節快速生成0到255之間的整數數組。我從映射到字節或整數A的第一個文件中獲得一個字節，並從映射到字節/整數B的第二個文件獲取一個字節。我計算了看到A和B在一起的次數。我似乎沒有從'numpy.digitize'文檔中獲得足夠的信息來查看它是如何應用於此處的，但我會多查看一下，看看它是否有幫助。感謝指針。 –

那麼這個和'int'版本的唯一區別是''bucket''的'dtype'？只有在一次調用中爲重複元素編制索引時，add.at'纔有用。在你的代碼中'offset'只是一個標量，對吧？ – hpaulj

不要試圖按照所有的文件讀取和struct代碼，它看起來像你1添加到buckets陣列的各種插槽。這部分不應該花費幾天時間。

但要想知道的dtype如何影響該步驟，我會測試將隨機分類的索引加1。

In [57]: idx = np.random.randint(0,255**2,10000) 
In [58]: %%timeit buckets = np.zeros(255**2, dtype=np.int64) 
    ...: for i in idx: 
    ...: buckets[i] += 1 
    ...: 
9.38 ms ± 39.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 
In [59]: %%timeit buckets = np.zeros(255**2, dtype=np.uint64) 
    ...: for i in idx: 
    ...: buckets[i] += 1 
    ...: 
71.7 ms ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

uint64約慢了8倍。

如果沒有重複，我們可以做buckets[idx] += 1。但允許重複，我們必須使用add.at：

In [60]: %%timeit buckets = np.zeros(255**2, dtype=np.int64) 
    ...: np.add.at(buckets, idx, 1) 
    ...: 
1.6 ms ± 348 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each) 
In [61]: %%timeit buckets = np.zeros(255**2, dtype=np.uint64) 
    ...: np.add.at(buckets, idx, 1) 
    ...: 
1.62 ms ± 15.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

有趣的是D型細胞uint64不影響在這種情況下的時序。

你在評論中提到你嘗試了一個列表累加器。我想這樣：

In [62]: %%timeit buckets = [0]*(255**2) 
    ...: for i in idx: 
    ...: buckets[i] += 1 
    ...: 
3.59 ms ± 44.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

這比數組的迭代版本更快。一般來說，數組上的迭代比列表上的迭代要慢。這是更快速的「全陣列」操作，例如add.at。

要驗證add.at是迭代正確的替代品，比較

In [63]: buckets0 = np.zeros(255**2, dtype=np.int64) 
In [64]: for i in idx: buckets0[i] += 1 

In [66]: buckets01 = np.zeros(255**2, dtype=np.int64) 
In [67]: np.add.at(buckets01, idx, 1) 
In [68]: np.allclose(buckets0, buckets01) 
Out[68]: True 

In [69]: buckets02 = np.zeros(255**2, dtype=np.int64) 
In [70]: buckets02[idx] += 1 
In [71]: np.allclose(buckets0, buckets02) 
Out[71]: False 

In [75]: bucketslist = [0]*(255**2) 
In [76]: for i in idx: bucketslist[i] += 1 
In [77]: np.allclose(buckets0, bucketslist) 
Out[77]: True

來源

2017-09-04 06:38:56 hpaulj

numpy都有自己的I文件fromfile/O方法，你可能會更好，如果使用你想要在numpy陣列中輸出。（見this question）

可能更好地使用由numpy給出的array結構，使您buckets一個二維數組：

buckets_per_signal_type = 2**(ctypes.c_ubyte(1).value * 8) 
buckets = numpy.zeros((buckets_per_signal_type, buckets_per_signal_type), dtype=numpy.uint64)

，然後只用np.add.at遞增箱

# define record_type to match your data 
while True 
    data_1 = np.fromfile(first_file_handle, dtype=record_dtype, count=nbytes) 
    data_2 = np.fromfile(second_file_handle, dtype=record_dtype, count=nbytes) 
    s = np.minimum(data_1.size, data_2.size) 
    if s == 0: 
     break 
    np.add.at(buckets, [data_1[:s], data_2[:s]], 1)

來源

2017-09-04 08:36:08

有沒有辦法增加不太慢的numpy數組？

回答

相關問題