pandas/numpy: np.where(df['x'].str.contains('y')) vs np.where('y' in df['x'])


As a newcomer to Python and pandas, I tried:

df_rows = np.where('y' in df['x'])[0]
for i in df_rows:
    print(df.iloc[i])  # index the DataFrame, not the array of positions

returns no rows, but

df_rows = np.where(df['x'].str.contains('y'))[0]
for i in df_rows:
    print(df.iloc[i])  # index the DataFrame, not the array of positions

does work, and returns the rows where df['x'] contains 'y'.

What am I missing? Why does the first form fail? (Python 2.7)


2017-06-05 abe

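In the first, df['x'] is a list-like sequence and you are looking for an entry that is exactly 'y'. In the second, df['x'].str applies string-like operations to each element of df['x'].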

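These are different operations:

The in operator checks whether any element equals 'y'. (Note: on a pandas Series, in tests the index labels rather than the values, so it does not do what you might expect for a Series of str.)
The .str.contains method checks, for each element, whether its string representation contains 'y'.

The first can only return True or False (because Python's data model says so and enforces it). The second is an ordinary method and returns a Series of True and False values (because ordinary methods can return whatever they like).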

>>> import pandas as pd
>>> s = pd.Series(['abc', 'def', 'ghi'])
>>> s.str.contains('a')
0     True
1    False
2    False
dtype: bool
>>> s.eq('a')  # looking for an identical match
0    False
1    False
2    False
dtype: bool
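
To make the data-model point concrete, here is a minimal sketch using a toy container. The class `Loud` and its `contains` method are hypothetical names invented for this illustration and are not part of pandas. Whatever `__contains__` returns, the `in` operator reduces it to a single bool, while an ordinary method can hand back one result per element.

```python
class Loud(object):
    """Toy container (hypothetical, for illustration only)."""

    def __init__(self, items):
        self.items = list(items)

    def __contains__(self, needle):
        # Deliberately return a list instead of a bool.
        return [needle in item for item in self.items]

    def contains(self, needle):
        # An ordinary method may return whatever it likes.
        return [needle in item for item in self.items]


box = Loud(['abc', 'def', 'ghi'])

via_operator = 'a' in box        # the returned list is collapsed to one bool
via_method = box.contains('a')   # the per-element list survives
no_match = 'z' in box            # still True: a non-empty list is truthy

print(via_operator)
print(via_method)
print(no_match)
```

Note that `'z' in box` is also truthy here, because the list `[False, False, False]` is non-empty. That is why an element-wise answer cannot be expressed through `in`.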


2017-06-05 16:26:52 MSeifert

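pandas needs specific syntax for this to work. Using the in operator with the str 'y' tests whether the string 'y' is a member of the pandas Series, not whether any item in the Series contains it.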

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x': ['hiya', 'howdy', 'hello']})
>>> df
       x
0   hiya
1  howdy
2  hello
>>> df_rows = np.where('y' in df['x'])[0] 
>>> df_rows 
array([], dtype=int64) 
>>> df_rows = np.where(df['x'].str.contains('y'))[0] 
>>> df_rows 
array([0, 1], dtype=int64)

Try this, and note that it returns a single boolean, not three (as we might first expect, since the Series has three items):

>>> 'y' in df['x'] 
False 
>>> 'hiya' in df['x'] 
False 
>>> 'hiya' in df['x'].values 
True
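
One way to see why 'hiya' is not found: for a Series, `in` looks at the index labels rather than the values. The sketch below assumes a default integer index for `s` and adds a second Series `t` (a made-up example, not from the question) whose labels are strings.

```python
import pandas as pd

s = pd.Series(['hiya', 'howdy', 'hello'])   # default index: 0, 1, 2

label_hit = 0 in s                # 0 is an index label
value_miss = 'hiya' in s          # 'hiya' is a value, not a label
value_hit = 'hiya' in s.values    # searches the underlying values instead

# With string labels, the same test matches on the label.
t = pd.Series([1, 2, 3], index=['hiya', 'howdy', 'hello'])
label_hit_t = 'hiya' in t

print(label_hit, value_miss, value_hit, label_hit_t)
```

In that sense a Series behaves more like a dict (membership means "is this a key?") than like a list.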

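You always need to ask yourself: "Am I looking for an item in the Series, or am I looking for a string inside the items of the Series?"

For items in a Series, use isin: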

df['x'].isin(['hello'])

For strings inside the items, use .str.{whatever} (or .apply(lambda s: ...)):

>>> df['x'].str.contains('y')
0     True
1     True
2    False
Name: x, dtype: bool
>>> df['x'].apply(lambda s: 'y' in s)
0     True
1     True
2    False
Name: x, dtype: bool
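
Putting this together for the original question, the following is a rough sketch of how the matching rows could be selected, assuming the same three-row `df` as above. The variable names (`mask`, `matching`, `positions`, `same_rows`, `exact`) are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': ['hiya', 'howdy', 'hello']})

mask = df['x'].str.contains('y')      # boolean Series, one entry per row
matching = df[mask]                   # boolean indexing keeps the True rows

positions = np.where(mask)[0]         # integer positions of the True entries
same_rows = df.iloc[positions]        # positional lookup on the DataFrame itself

exact = df[df['x'].isin(['hello'])]   # whole-value match via isin

print(matching)
print(positions)
print(exact)
```

If only the rows are needed, the boolean mask alone is usually enough, and the np.where step and the loop can be dropped.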


2017-06-05 16:50:34 Jarad
