How do I filter a list by a DataFrame in Python?


For example, I have a list L = ['a', 'b', 'c'] and a DataFrame df:

Name  Value
a     0
a     1
b     2
d     3

The result should be ['a', 'b'].

Source

2017-09-03 Dmitry

a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()
print(a)
['a', 'b']
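
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The question only shows the printed table, so building the frame from a dict is an assumption; the filter itself is the `isin` expression above, split into steps.

```python
import pandas as pd

# Rebuild the example data from the question (the construction is assumed).
L = ['a', 'b', 'c']
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

# Boolean mask: True where Name is one of the values in L.
mask = df['Name'].isin(L)

# Keep the matching names, drop duplicates, convert to a plain list.
a = df.loc[mask, 'Name'].unique().tolist()
print(a)  # ['a', 'b']
```

`unique()` keeps values in order of first appearance in `df`, so the result follows the order of the frame, not the order of `L`.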

Or:

a = np.intersect1d(L, df['Name']).tolist()
print(a)
['a', 'b']
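
One thing to keep in mind with `np.intersect1d`: it returns the sorted, unique common values, so the order of `L` is not preserved. A small sketch, where the reordered list `L_unsorted` is a hypothetical input chosen purely for illustration:

```python
import numpy as np
import pandas as pd

# Example frame from the question (the construction is assumed).
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

# Hypothetical list, deliberately not in alphabetical order.
L_unsorted = ['b', 'a', 'c']

# intersect1d returns the sorted, unique values present in both inputs.
common = np.intersect1d(L_unsorted, df['Name']).tolist()
print(common)  # ['a', 'b'], sorted rather than in the order of L_unsorted
```

If the original order of the list matters, one of the list-comprehension approaches is probably the safer choice.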

Timings:

df = pd.concat([df]*1000).reset_index(drop=True) 

L = ['a', 'b', 'c'] 

#jezrael 1 
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist()) 
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 774 µs per loop 

#jezrael 2  
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist()) 
1000 loops, best of 3: 1.81 ms per loop 

#divakar 
In [165]: %timeit ([i for i in L if i in df.Name.tolist()]) 
1000 loops, best of 3: 393 µs per loop 

#john galt 1 
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist()) 
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached. 
100 loops, best of 3: 2.36 ms per loop 

#john galt 2  
In [167]: %timeit ([x for x in df.Name.unique() if x in L]) 
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 182 µs per loop
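
The `%timeit` lines only work inside IPython. Below is a rough equivalent for a plain script, using the standard-library `timeit` module. The labels and the repeat counts are arbitrary choices, and the numbers will differ by machine and pandas version, so treat it as a sketch rather than a definitive benchmark.

```python
import timeit

import pandas as pd

# Same setup as above: the example frame repeated 1000 times.
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
df = pd.concat([df] * 1000).reset_index(drop=True)
L = ['a', 'b', 'c']

# Three of the approaches from this thread, wrapped as callables.
candidates = {
    'isin': lambda: df.loc[df['Name'].isin(L), 'Name'].unique().tolist(),
    'comprehension over L': lambda: [i for i in L if i in df.Name.tolist()],
    'comprehension over unique': lambda: [x for x in df.Name.unique() if x in L],
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats, 100 calls each, reported as seconds per call.
    timings[label] = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f'{label}: {timings[label] * 1e6:.0f} µs per call')
```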

Source

2017-09-03 08:32:10 jezrael

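I think it would be good to see timings with more elements in 'L', not just 3 :) – Divakar

Sure, that's another way to look at it. I used small data because the OP said the df is small. Give me a minute. – jezrael

@Dmitry - What is the actual size of your data? And how large is L? – jezrael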

Here's one -

[i for i in l if i in df.Name.tolist()]

Sample run -

In [303]: df 
Out[303]: 
  Name  Value
0    a      0
1    a      1
2    b      2
3    d      3

In [304]: l = ['a', 'b', 'c'] 

In [305]: [i for i in l if i in df.Name.tolist()] 
Out[305]: ['a', 'b']
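
Note that `df.Name.tolist()` sits in the condition of the comprehension, so it is re-evaluated for every element of `l`, and `in` on a list is a linear scan. If `l` grows, one possible variation (not part of the answer above, just a sketch) is to build a set of names once and test against that:

```python
import pandas as pd

# Example frame from the question (the construction is assumed).
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
l = ['a', 'b', 'c']

# Build the lookup set once; membership tests on a set are O(1) on average.
names = set(df['Name'])

# Same result as the comprehension above, preserving the order of l.
result = [i for i in l if i in names]
print(result)  # ['a', 'b']
```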

Source

2017-09-03 08:32:34 Divakar

Is it faster than jezrael's solution? – Dmitry

@Dmitry Test it yourself? Or provide a dataset to time against? – Divakar

I don't have much data. Just curious :) Anyway, thanks for your answer! – Dmitry

Using query

In [1470]: df.query('Name in @L').Name.unique().tolist() 
Out[1470]: ['a', 'b']

Or,

In [1472]: [x for x in df.Name.unique() if x in L] 
Out[1472]: ['a', 'b']
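
As far as I understand it, `in` inside `query` selects the same rows as an `isin` mask, and `@L` refers to the Python variable `L` in the calling scope. A quick check on the example data (the frame construction is assumed):

```python
import pandas as pd

# Example frame from the question (the construction is assumed).
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
L = ['a', 'b', 'c']

# '@L' pulls the list L from the surrounding Python scope.
via_query = df.query('Name in @L')

# The same selection written as a boolean mask.
via_mask = df[df['Name'].isin(L)]

print(via_query.equals(via_mask))  # True
print(via_query['Name'].unique().tolist())  # ['a', 'b']
```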

Source

2017-09-03 08:35:27 Zero
