2015-03-02

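Python numpy list filtering

Can the following code be optimized/vectorized? Right now it doesn't look like the right way to do this, and it isn't very "pythonic". The code is meant to process large amounts of data, so performance is very important.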

The idea is to remove every value, together with its name, whose name does not appear in both lists.

E.g. the code below should leave both name lists as "name2" and "name4", with the values [2, 4] in pos1 and [5, 6] in pos2.

import numpy as np 

names1=np.array(["name1","name2","name3","name4"]) 
names2=np.array(["name2","name4","name5","name6"]) 

pos1=np.array([1,2,3,4]) 
pos2=np.array([5,6,7,8]) 


for entry in names2:
    if not np.any(names1 == entry):
        pointer = np.where(names2 == entry)
        pos2 = np.delete(pos2, pointer)
        names2 = np.delete(names2, pointer)

for entry in names1:
    if not np.any(names2 == entry):
        pointer = np.where(names1 == entry)
        pos1 = np.delete(pos1, pointer)
        names1 = np.delete(names1, pointer)
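For comparison, here is a minimal loop-free sketch of the same filtering. It assumes NumPy 1.15 or later (for the return_indices argument of np.intersect1d) and that names are unique within each array, as in the example. np.intersect1d returns the common names in sorted order, plus the position of each one's first occurrence in each input, so the results come out sorted by name rather than in the original order.

```python
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# common: sorted unique names present in both arrays
# idx1, idx2: index of the first occurrence of each common name in names1 / names2
common, idx1, idx2 = np.intersect1d(names1, names2, return_indices=True)

# Fancy indexing picks out the matching entries in one step (no np.delete copies)
names1, pos1 = names1[idx1], pos1[idx1]
names2, pos2 = names2[idx2], pos2[idx2]

print(common)      # ['name2' 'name4']
print(pos1, pos2)  # [2 4] [5 6]
```

With this approach the two filtered arrays are aligned by name, so pos1[i] and pos2[i] always belong to the same name.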

Are you set on using 'numpy' for this? It feels more like a 'pandas' problem. – DSM 2015-03-02 18:10:13


I have no experience with pandas. Any hints appreciated. – 2015-03-02 18:17:24

Answers


Here is a vectorized answer:

import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])

pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# names present in both arrays
intersection = np.intersect1d(names1, names2)
# indices of the entries whose name is NOT in the intersection
# (np.isin replaces np.in1d, which is deprecated in NumPy 2.0)
pointer1 = np.argwhere(~np.isin(names1, intersection))
pointer2 = np.argwhere(~np.isin(names2, intersection))

pos2 = np.delete(pos2, pointer2)
names2 = np.delete(names2, pointer2)

pos1 = np.delete(pos1, pointer1)
names1 = np.delete(names1, pointer1)
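A possible simplification, sketched under the assumption of NumPy 1.13 or later (for np.isin): the intersection, argwhere and delete steps are not strictly needed, because a boolean mask can index the arrays directly. Each mask is True where a name also occurs in the other array.

```python
import numpy as np

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

# Boolean masks: True where the name also occurs in the other array
keep1 = np.isin(names1, names2)
keep2 = np.isin(names2, names1)

# Boolean indexing keeps the surviving entries in their original order
names1, pos1 = names1[keep1], pos1[keep1]
names2, pos2 = names2[keep2], pos2[keep2]

print(names1, pos1)  # ['name2' 'name4'] [2 4]
print(names2, pos2)  # ['name2' 'name4'] [5 6]
```

Unlike np.intersect1d, this keeps every occurrence of a repeated name, so if names are not unique the two filtered arrays can end up with different lengths.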

FWIW, this is a simple merge operation in pandas:

>>> import pandas as pd
>>> df1 = pd.DataFrame({"name": names1, "pos": pos1})
>>> df2 = pd.DataFrame({"name": names2, "pos": pos2}) 
>>> df1
    name  pos
0  name1    1
1  name2    2
2  name3    3
3  name4    4
>>> df2
    name  pos
0  name2    5
1  name4    6
2  name5    7
3  name6    8
>>> df1.merge(df2, on="name", suffixes=("1", "2"))
    name  pos1  pos2
0  name2     2     5
1  name4     4     6
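If the rest of the code still expects plain NumPy arrays, they can be pulled back out of the merged frame. This is a minimal sketch assuming pandas 0.24 or later (for Series.to_numpy); the variable names merged and names are only illustrative. merge defaults to an inner join, written out explicitly here, so only names present in both frames survive.

```python
import numpy as np
import pandas as pd

names1 = np.array(["name1", "name2", "name3", "name4"])
names2 = np.array(["name2", "name4", "name5", "name6"])
pos1 = np.array([1, 2, 3, 4])
pos2 = np.array([5, 6, 7, 8])

df1 = pd.DataFrame({"name": names1, "pos": pos1})
df2 = pd.DataFrame({"name": names2, "pos": pos2})

# Inner join on "name": keeps only names present in both frames;
# the overlapping "pos" columns are renamed to "pos1" and "pos2"
merged = df1.merge(df2, on="name", how="inner", suffixes=("1", "2"))

# Convert the columns back to NumPy arrays
names = merged["name"].to_numpy()
pos1 = merged["pos1"].to_numpy()
pos2 = merged["pos2"].to_numpy()

print(names, pos1, pos2)
```

Since the filtered names1 and names2 are identical when names are unique, a single names array is enough. Note that if a name is repeated, merge pairs every matching row from each side, which differs from the NumPy filtering above.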