2017-09-03 94 views

Answers

1
a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist() 
print (a) 
['a', 'b'] 

Or:

a = np.intersect1d(L, df['Name']).tolist() 
print (a) 
['a', 'b'] 
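For anyone who wants to run the two snippets above as-is, here is a minimal self-contained sketch. The sample frame is an assumption, copied from the small `df` shown in another answer on this page; the question's real data may differ. Note that `np.intersect1d` returns sorted unique values, so its result follows sorted order rather than the order of `L`.

```python
import numpy as np
import pandas as pd

# Assumed sample data (mirrors the small frame shown elsewhere in this thread).
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
L = ['a', 'b', 'c']

# Approach 1: boolean mask with isin, then drop duplicates with unique().
a1 = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()

# Approach 2: set intersection of the two arrays (result is sorted and unique).
a2 = np.intersect1d(L, df['Name']).tolist()

print(a1)
print(a2)
```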

Timings

df = pd.concat([df]*1000).reset_index(drop=True) 

L = ['a', 'b', 'c'] 

#jezrael 1 
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist()) 
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 774 µs per loop 

#jezrael 2  
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist()) 
1000 loops, best of 3: 1.81 ms per loop 

#divakar 
In [165]: %timeit ([i for i in L if i in df.Name.tolist()]) 
1000 loops, best of 3: 393 µs per loop 

#john galt 1 
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist()) 
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached. 
100 loops, best of 3: 2.36 ms per loop 

#john galt 2  
In [167]: %timeit ([x for x in df.Name.unique() if x in L]) 
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached. 
10000 loops, best of 3: 182 µs per loop 
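The `%timeit` lines above only work inside IPython. A rough way to repeat the comparison from a plain script is sketched below with the standard `timeit` module. The sample frame, the `candidates` dictionary and its labels are assumptions made for illustration, and the numbers will vary by machine and pandas version, so no timings are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed data: a 4-row sample frame repeated 1000 times, as in the timings above.
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
df = pd.concat([df] * 1000).reset_index(drop=True)
L = ['a', 'b', 'c']

# Hypothetical registry of candidate expressions, keyed by a descriptive label.
candidates = {
    'isin + unique': lambda: df.loc[df['Name'].isin(L), 'Name'].unique().tolist(),
    'intersect1d': lambda: np.intersect1d(L, df['Name']).tolist(),
    'unique + list comp': lambda: [x for x in df['Name'].unique() if x in L],
}

results = {}
for label, fn in candidates.items():
    # Best of 3 repeats with 20 calls each, converted to seconds per call.
    best = min(timeit.repeat(fn, repeat=3, number=20)) / 20
    results[label] = best
    print(f'{label}: {best * 1e6:.0f} us per call')
```

To follow up on the request for a larger `L`, replace `L` with a longer list and rerun; the relative ranking may change as `L` grows.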
+1

I think it would be nice to see more elements in 'L' than just '3' :) – Divakar

+0

Sure, that's another way. I used small data because the OP said the df is small. Give me a minute – jezrael

+0

@Dmitry - What is the actual size of your data? And what is the size of your L? – jezrael

1

Here's one -

[i for i in l if i in df.Name.tolist()] 

Sample run -

In [303]: df 
Out[303]: 
  Name  Value
0    a      0
1    a      1
2    b      2
3    d      3

In [304]: l = ['a', 'b', 'c'] 

In [305]: [i for i in l if i in df.Name.tolist()] 
Out[305]: ['a', 'b'] 
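One thing to be aware of: in the comprehension above, `df.Name.tolist()` sits in the `if` condition, so it is re-evaluated for every element of `l`, and `in` on a list is a linear scan. A possible variant, not part of the original answer, is to build a set of names once and test membership against it. The sketch below assumes the same sample frame as the run above and keeps the order of `l`.

```python
import pandas as pd

# Same sample data as in the sample run above.
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
l = ['a', 'b', 'c']

# Build the lookup set once, outside the comprehension.
names = set(df['Name'])

# Keep the elements of l, in their original order, that occur in df.Name.
result = [i for i in l if i in names]
print(result)  # ['a', 'b']
```

Whether this is actually faster depends on the sizes of `l` and `df`; for a 3-element list the difference is likely small.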
+0

Is it faster than jezrael's solution? – Dmitry

+0

@Dmitry Test it for yourself? Or provide a dataset to time it on? – Divakar

+0

I don't have much data. Just curious :) Anyway, thanks for the answer! – Dmitry

1

Using query

In [1470]: df.query('Name in @L').Name.unique().tolist() 
Out[1470]: ['a', 'b'] 

Or,

In [1472]: [x for x in df.Name.unique() if x in L] 
Out[1472]: ['a', 'b'] 
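For reference, the `@L` in the query string refers to the Python variable `L` in the calling scope, and `Name in @L` selects the same rows as an `isin` mask. A small standalone sketch, assuming the sample frame shown in another answer on this page:

```python
import pandas as pd

# Assumed sample data (taken from the small frame shown elsewhere in this thread).
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})
L = ['a', 'b', 'c']

# Row filter written as a query string; @L refers to the local list L.
via_query = df.query('Name in @L')

# The same filter written as a boolean mask.
via_isin = df[df['Name'].isin(L)]

same_rows = via_query.equals(via_isin)
names = via_query['Name'].unique().tolist()
print(same_rows, names)
```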