PyTables dtype alignment issue
Consider the following code:
import os
import numpy as np
import tables as tb

# Pass the field names and their respective datatypes as
# a description to the table
dt = np.dtype([('doc_id', 'u4'), ('word', 'u4'),
               ('tfidf', 'f4')], align=True)

# Open an h5 file and create a table
# (PyTables 2.x names; 3.x renamed these to open_file / create_table)
f = tb.openFile('corpus.h5', 'w')
t = f.createTable(f.root, 'table', dt, 'train set',
                  filters=tb.Filters(5, 'blosc'))
r = t.row
for i in xrange(20):
    r['doc_id'] = i
    r['word'] = np.random.randint(1000000)
    r['tfidf'] = np.random.rand()  # was a bare rand(), which is undefined here
    r.append()
t.flush()

# Read the table back as a structured array
sa = t[:]
f.close()
os.remove('corpus.h5')
I am passing an aligned dtype object, but when I look at sa, I get the following:
print dt
print "aligned?", dt.isalignedstruct
print
print sa.dtype
print "aligned?", sa.dtype.isalignedstruct
>>>
{'names':['doc_id','word','tfidf'], 'formats':['<u4','<u4','<f4'], 'offsets':[0,4,8], 'itemsize':12, 'aligned':True}
aligned? True
[('doc_id', '<u4'), ('word', '<u4'), ('tfidf', '<f4')]
aligned? False
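The structured array is not aligned. Is there currently no way to force alignment in PyTables, or am I doing something wrong?
Edit: I noticed that my question is similar to this one, but I copied and tried the answer given there and it does not work either.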
EDIT2: (see Joel Vroom's answer below)
I replicated Joel's answer and used Cython to test whether the data really is unpacked. It turns out it is:
In [1]: %load_ext cythonmagic
In [2]: %%cython -f -c=-O3
   ...: import numpy as np
   ...: cimport numpy as np
   ...: import tables as tb
   ...: f = tb.openFile("corpus.h5", "r")
   ...: t = f.root.table
   ...: cdef struct Word: # notice how this is not packed
   ...:     np.uint32_t doc_id, word
   ...:     np.float32_t tfidf
   ...: def main(): # <-- np arrays in Cython have to be locally declared, so put array in a function
   ...:     cdef np.ndarray[Word] sa = t[:3]
   ...:     print sa
   ...:     print "aligned?", sa.dtype.isalignedstruct
   ...: main()
   ...: f.close()
   ...:
[(0L, 232880L, 0.2658001184463501) (1L, 605285L, 0.9921777248382568) (2L, 86609L, 0.5266860723495483)]
aligned? False
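One way to see why the non-packed Cython struct still matches even though the flag reads False: for these particular field types the packed and aligned layouts appear to be byte-for-byte identical, so isalignedstruct is only recording how the dtype was constructed. The following is a minimal NumPy-only sketch of that check, not something taken from PyTables itself; the layout helper and the mixed dtype are made up for illustration.

```python
import numpy as np

fields = [('doc_id', 'u4'), ('word', 'u4'), ('tfidf', 'f4')]
aligned = np.dtype(fields, align=True)  # what was passed to createTable
packed = np.dtype(fields)               # same shape as the dtype read back

def layout(dt):
    """Hypothetical helper: return (field offsets, itemsize) of a structured dtype."""
    offsets = [dt.fields[name][1] for name in dt.names]
    return offsets, dt.itemsize

# All three fields are 4 bytes wide, so no padding is ever needed
print(layout(aligned))
print(layout(packed))
print(aligned.isalignedstruct, packed.isalignedstruct)

# Illustrative contrast: a 1-byte field followed by a 4-byte field
# is where align=True actually changes offsets and itemsize
mixed = [('flag', 'u1'), ('value', 'u4')]
print(layout(np.dtype(mixed, align=True)))
print(layout(np.dtype(mixed)))
```

If the two layouts ever differ for your own table description (as with the mixed example), the flag stops being cosmetic and the data would need an actual conversion.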
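Option #1 as given requires your data to be loaded into memory to perform the alignment, which is not always feasible.
I have updated my question to show that it is in fact technically aligned – richizy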
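For anyone who just needs the flag to read True on data that is already in memory, here is a rough sketch of two ways to get there with plain NumPy. It assumes the packed and aligned layouts match (as they do for these three 4-byte fields), and it uses a zero-filled stand-in array rather than a real PyTables read.

```python
import numpy as np

fields = [('doc_id', 'u4'), ('word', 'u4'), ('tfidf', 'f4')]
aligned = np.dtype(fields, align=True)

# Stand-in for the packed structured array that t[:] returns (assumption)
sa = np.zeros(3, dtype=np.dtype(fields))
sa['doc_id'] = np.arange(3)

# Offsets and itemsize are identical, so a view only swaps the dtype label
sa_view = sa.view(aligned)

# astype also gives the aligned dtype, but allocates a second array
sa_copy = sa.astype(aligned)

print(sa_view.dtype.isalignedstruct)
print(np.shares_memory(sa, sa_view))
print(np.shares_memory(sa, sa_copy))
```

The view avoids the extra copy, but it is only valid when the itemsizes agree; with fields that need padding, astype (and the memory cost mentioned in the comment above) would be the fallback.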