How do I filter a list by a DataFrame in Python?
For example, I have a list L = ['a', 'b', 'c']
and a DataFrame df:
Name Value
a 0
a 1
b 2
d 3
The result should be ['a', 'b'].
a = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()
print (a)
['a', 'b']
Or:
a = np.intersect1d(L, df['Name']).tolist()
print (a)
['a', 'b']
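For anyone who wants to run the two one-liners above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question; the variable names `by_isin` and `by_intersect` are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data reconstructed from the question.
L = ['a', 'b', 'c']
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

# Keep the rows whose Name is in L, then collect the distinct names.
by_isin = df.loc[df['Name'].isin(L), 'Name'].unique().tolist()

# Set intersection of L and the Name column; the result is sorted and unique.
by_intersect = np.intersect1d(L, df['Name']).tolist()

print(by_isin)
print(by_intersect)
```

Both give the same names for this sample. Note that `np.intersect1d` returns sorted values, while the `isin` version keeps the order in which the names first appear in `df`.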
Timings:
df = pd.concat([df]*1000).reset_index(drop=True)
L = ['a', 'b', 'c']
#jezrael 1
In [163]: %timeit (df.loc[df['Name'].isin(L), 'Name'].unique().tolist())
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop
#jezrael 2
In [164]: %timeit (np.intersect1d(L, df['Name']).tolist())
1000 loops, best of 3: 1.81 ms per loop
#divakar
In [165]: %timeit ([i for i in L if i in df.Name.tolist()])
1000 loops, best of 3: 393 µs per loop
#john galt 1
In [166]: %timeit (df.query('Name in @L').Name.unique().tolist())
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.36 ms per loop
#john galt 2
In [167]: %timeit ([x for x in df.Name.unique() if x in L])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 182 µs per loop
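One thing worth noting about the list-comprehension variants timed above: `i in df.Name.tolist()` rebuilds the list for every element of `L`, and membership tests on a list are linear. The sketch below builds a set once instead. It was not part of the timings above and no speed-up is claimed here; it is only an option to measure if `L` or `df` is larger. The name `names` is illustrative.

```python
import pandas as pd

# Data reconstructed from the question.
L = ['a', 'b', 'c']
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

# Build the lookup set once, outside the comprehension.
names = set(df['Name'])

# Keep the elements of L that occur in the Name column, in L's order.
result = [x for x in L if x in names]
print(result)
```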
Here's one approach -
[i for i in l if i in df.Name.tolist()]
Sample run -
In [303]: df
Out[303]:
Name Value
0 a 0
1 a 1
2 b 2
3 d 3
In [304]: l = ['a', 'b', 'c']
In [305]: [i for i in l if i in df.Name.tolist()]
Out[305]: ['a', 'b']
Using query:
In [1470]: df.query('Name in @L').Name.unique().tolist()
Out[1470]: ['a', 'b']
Or,
In [1472]: [x for x in df.Name.unique() if x in L]
Out[1472]: ['a', 'b']
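The approaches in this thread agree on the sample data, but they do not all return the names in the same order. The sketch below uses a reordered list (an assumption for illustration, not the question's data) to show the difference; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the question, but L is deliberately reordered.
L = ['b', 'a', 'c']
df = pd.DataFrame({'Name': ['a', 'a', 'b', 'd'], 'Value': [0, 1, 2, 3]})

# Iterating over L keeps the order of L.
names = df['Name'].tolist()
in_list_order = [i for i in L if i in names]

# Iterating over the unique names keeps the order of first appearance in df.
in_frame_order = [x for x in df['Name'].unique() if x in L]

# np.intersect1d always returns sorted values.
sorted_order = np.intersect1d(L, df['Name']).tolist()

print(in_list_order, in_frame_order, sorted_order)
```

Which one is appropriate depends on whether the result should follow the order of `L` or of `df`.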
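I think it would be nice to see timings with more elements in 'L' than just 3 :) – Divakar
Sure, that's another way. I used small data because the OP said the df is small. Give me a minute – jezrael
@Dmitry - What is the actual size of your data? And what is the size of your L? – jezrael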