2017-06-16 105 views
2

Using df.column.str.contains and updating a pandas DataFrame column

I have a pandas DataFrame with two columns.

df= pd.DataFrame({"C": ['this is orange','this is apple','this is pear','this is plum','this is orange'], "D": [0,0,0,0,0]}) 

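I want to read column C and write the name of the fruit into column D. My thinking was to use df.C.str.contains to determine whether a given string appears in each row of C, and then update D accordingly. The elements of C may be really long strings, e.g. "this is apple which is red", but I only care whether the word apple appears in the cell. I should note that I am not set on using str.contains; it just seems the most obvious route. I am just not sure how I would apply it.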

The final DataFrame would look like:

df= pd.DataFrame({"C": ['this is orange','this is apple','this is pear','this is plum','this is orange'], "D": ['orange','apple','pear','plum','grapefruit']}) 

Answers

1

Consider this DataFrame

df= pd.DataFrame({"C": ['this is orange','this is apple which is red','this is pear','this is plum','this is orange'], "D": [0,0,0,0,0]}) 

                            C  D
0              this is orange  0
1  this is apple which is red  0
2                this is pear  0
3                this is plum  0
4              this is orange  0

You can use the following code to extract the fruit name, assuming the fruit name follows "this is"

df['D'] = df.C.str.extract(r'this is ([A-Za-z]+)\s?.*?', expand=False)

You get

                            C       D
0              this is orange  orange
1  this is apple which is red   apple
2                this is pear    pear
3                this is plum    plum
4              this is orange  orange

For the sample dataset you posted, simply splitting on spaces and taking the last element works

df['D'] = df.C.str.split(' ').str[-1] 
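
Taking the last token only works when the fruit is the final word of the string. A minimal sketch comparing the two approaches on a longer string (the three-row frame here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"C": ["this is orange",
                         "this is apple which is red",
                         "this is pear"]})

# Last space-separated token: correct only when the fruit ends the string.
last_token = df.C.str.split(" ").str[-1]

# First word after "this is": independent of whatever follows the fruit name.
after_prefix = df.C.str.extract(r"this is ([A-Za-z]+)", expand=False)

print(last_token.tolist())    # ['orange', 'red', 'pear']
print(after_prefix.tolist())  # ['orange', 'apple', 'pear']
```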

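If this changes the question entirely I understand and will re-ask, but what if the fruit is wrapped in parentheses with no spaces? Instead, it might look like this: (orange). I only want to return the word orange. – John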

You can use df.C.str.extract(r'this is \(?([A-Za-z]+)\s?.*?') to handle the case where the fruit may be surrounded by parentheses. – Vaishali
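
As a rough check of the pattern in the comment above, here is a small sketch on made-up strings. The trailing `\s?.*?` is dropped because it does not affect the capture, and `expand=False` is added so that a Series is returned:

```python
import pandas as pd

s = pd.Series(["this is orange",
               "this is (orange)",
               "this is (apple) which is red"])

# \(? optionally consumes an opening parenthesis before the captured word.
# The capture group stops at the first non-letter, so a closing ")" is dropped.
fruit = s.str.extract(r"this is \(?([A-Za-z]+)", expand=False)

print(fruit.tolist())  # ['orange', 'orange', 'apple']
```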

And thanks for accepting :) – Vaishali

1

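Since you did not specify how the fruit should be extracted, I am assuming it is always preceded by "this is"; in that case, the following should go a long way: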

import pandas as pd

d = {'C': ['this is orange',
           'this is apple',
           'this is pear',
           'this is plum',
           'this is orange'],
     'D': [0, 0, 0, 0, 0]}

dff = pd.DataFrame(d)

# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
dff['D'] = dff.C.str.replace(r'(this is) ([A-Za-z]+)', r'\2', regex=True)
# or just remove the literal prefix
dff.C.str.replace('this is ', '')


#                 C       D
# 0  this is orange  orange
# 1   this is apple   apple
# 2    this is pear    pear
# 3    this is plum    plum
# 4  this is orange  orange

This uses .str.replace to replace "this is " with an empty string.

I hope this helps.

1

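If the sentence always starts with this is followed by the fruit name, i.e. if the third word is always the fruit name, you can also use apply together with split(): the string in each row of the DataFrame is split, and the third element of the result replaces the value in column D: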

df['D'] = df['C'].apply(lambda val: val.split()[2]) 

Or, as in the other answer, use only the split function:

df['D'] = df['C'].str.split().str[2]

Output:

                C       D
0  this is orange  orange
1   this is apple   apple
2    this is pear    pear
3    this is plum    plum
4  this is orange  orange
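
None of the answers above uses str.contains, which is what the question title asks about. If the set of possible fruit names is known in advance, that route could look roughly like the sketch below. The `fruits` list and the sample rows are assumptions for illustration, not part of the original question.

```python
import re
import pandas as pd

df = pd.DataFrame({"C": ["this is orange",
                         "this is apple which is red",
                         "this is (pear)",
                         "no fruit here"]})

fruits = ["orange", "apple", "pear", "plum"]  # hypothetical list of known names

# Start from an object column of missing values so strings can be written into it.
result = pd.Series([None] * len(df), index=df.index, dtype=object)

for fruit in fruits:
    # Word boundaries keep "apple" from matching inside e.g. "pineapple".
    mask = df["C"].str.contains(rf"\b{re.escape(fruit)}\b", regex=True)
    result.loc[mask] = fruit

df["D"] = result
print(df)
```

Rows that match none of the names keep a missing value in D, and if a row mentions several fruits, the one that comes last in `fruits` wins.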