Using df.column.str.contains and updating a pandas DataFrame column

I have a pandas DataFrame with two columns.

df= pd.DataFrame({"C": ['this is orange','this is apple','this is pear','this is plum','this is orange'], "D": [0,0,0,0,0]})

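I would like to read column C and return the name of the fruit in column D. My thought process was to use df.C.str.contains to determine whether a certain string appears in each row of C, and then update D accordingly. The elements in C could be really long strings, e.g. "this is a red apple", but I only care whether the word apple appears in the cell. I should note that I am not set on using str.contains, but it seemed like the most obvious path. I just don't know how I would apply it.

The final DataFrame would look like: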

df= pd.DataFrame({"C": ['this is orange','this is apple','this is pear','this is plum','this is orange'], "D": ['orange','apple','pear','plum','orange']})

Source

2017-06-16 John

Consider this DataFrame:

df= pd.DataFrame({"C": ['this is orange','this is apple which is red','this is pear','this is plum','this is orange'], "D": [0,0,0,0,0]}) 

                            C  D
0              this is orange  0
1  this is apple which is red  0
2                this is pear  0
3                this is plum  0
4              this is orange  0

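You can use the code below to extract the fruit name, assuming the fruit name always follows "this is" (the raw string avoids an invalid-escape warning, and expand=False returns a Series rather than a one-column DataFrame):

df['D'] = df.C.str.extract(r'this is ([A-Za-z]+)\s?.*?', expand=False)

You get: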

                            C       D
0              this is orange  orange
1  this is apple which is red   apple
2                this is pear    pear
3                this is plum    plum
4              this is orange  orange
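To see what the extraction does with a row that does not follow the "this is" pattern, here is a small self-contained sketch. The row 'no fruit here' is made up for illustration and is not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "C": ["this is orange", "this is apple which is red", "no fruit here"],
})

# One capture group plus expand=False gives back a Series.
# Rows where the pattern does not match come back as NaN.
df["D"] = df["C"].str.extract(r"this is ([A-Za-z]+)", expand=False)

print(df)
```

Rows without a match can then be filled or dropped, for example with fillna or dropna, depending on what you need.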

For the sample dataset you posted, simply splitting on spaces and taking the last element works:

df['D'] = df.C.str.split(' ').str[-1]
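Note that taking the last word would return "red" for a string like 'this is apple which is red'. If the fruit can appear anywhere in the text, the str.contains idea from the question can be sketched roughly as below. This assumes you have a known list of names to look for; the list `fruits` is hypothetical and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "C": ["this is orange", "this is apple which is red", "this is pear",
          "this is plum", "this is orange"],
    "D": [0, 0, 0, 0, 0],
})

# Hypothetical list of names to search for (an assumption, not from the post).
fruits = ["orange", "apple", "pear", "plum"]

# D starts out as integers, so switch to object dtype before storing strings.
df["D"] = df["D"].astype(object)

for fruit in fruits:
    # Boolean mask of the rows whose text contains this name as a plain substring.
    mask = df["C"].str.contains(fruit, regex=False)
    df.loc[mask, "D"] = fruit

print(df)
```

This is a plain substring match, so "apple" would also match "pineapple", and if a row contains several names the last one in the list wins.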

Source

2017-06-16 16:49:54 Vaishali

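If this completely changes the question I understand and will ask again, but what if the fruit is wrapped in parentheses with no spaces? It might look like (orange) instead. I only want to return the word orange. – John

You can use df.C.str.extract(r'this is \(?([A-Za-z]+)\s?.*?') to handle the possibility of parentheses around the fruit. – Vaishali

And thanks for accepting :) – Vaishali

Since you did not specify how the fruit should be extracted, I am assuming the text always starts with "this is"; in that case, the following should go a long way: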
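A quick way to check the parenthesis handling from the comments is a sketch like the one below. It uses a slightly relaxed variant of the pattern: the optional space before the parenthesis is my assumption about what "no spaces" means, and the sample strings are made up.

```python
import pandas as pd

s = pd.Series([
    "this is orange",
    "this is (orange)",
    "this is(orange)",
    "this is (apple) which is red",
])

# " ?" allows a missing space and "\(?" allows an optional opening parenthesis.
# The capture group stops at the first non-letter, so ")" is never included.
pattern = r"this is ?\(?([A-Za-z]+)"
fruit = s.str.extract(pattern, expand=False)

print(fruit.tolist())
```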

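import pandas as pd

d = {'C': ['this is orange',
           'this is apple',
           'this is pear',
           'this is plum',
           'this is orange'],
     'D': [0, 0, 0, 0, 0]}

dff = pd.DataFrame(d)

# Replace the whole match with the second capture group (the fruit name).
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default.
dff['D'] = dff.C.str.replace(r'(this is) ([A-Za-z]+)', r'\2', regex=True)
# or just strip the literal prefix
dff['D'] = dff.C.str.replace('this is ', '', regex=False)


#                 C       D
# 0  this is orange  orange
# 1   this is apple   apple
# 2    this is pear    pear
# 3    this is plum    plum
# 4  this is orange  orange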

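The first call uses .str.replace with a regular expression and keeps only the captured fruit name; the second simply replaces "this is " with an empty string.

I hope this helps.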
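For longer strings such as the one mentioned in the question, a backreference replacement leaves any trailing words in place. A rough sketch of the difference, assuming the text always starts with "this is" (the variable names `kept` and `trimmed` are just for illustration):

```python
import pandas as pd

s = pd.Series(["this is orange", "this is apple which is red"])

# Only the matched prefix and fruit are replaced; the rest of the string stays.
kept = s.str.replace(r"this is ([A-Za-z]+)", r"\1", regex=True)

# Anchoring and consuming the remainder with ".*" leaves just the fruit name.
trimmed = s.str.replace(r"^this is ([A-Za-z]+).*$", r"\1", regex=True)

print(kept.tolist())
print(trimmed.tolist())
```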

Source

2017-06-16 16:42:07 Abdou

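If the sentence always starts with this is followed by the fruit name, i.e. the third word is always the fruit name, you can also use apply with split(): the string in each row is split, and the third element is taken as the new value for column D: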

df['D'] = df['C'].apply(lambda val: val.split()[2])

Or, as described in the other answer, using just the split function:

df['D'] = df['C'].str.split().str[2]

Output:

                C       D
0  this is orange  orange
1   this is apple   apple
2    this is pear    pear
3    this is plum    plum
4  this is orange  orange
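The two variants behave differently when a row has fewer than three words. A small sketch with a made-up short row ('this is') to illustrate:

```python
import pandas as pd

s = pd.Series(["this is orange", "this is", "this is apple which is red"])

# The .str accessor returns NaN when the third element does not exist.
third = s.str.split().str[2]

# The apply/lambda version indexes a plain Python list, so it raises instead.
try:
    s.apply(lambda val: val.split()[2])
    raised = False
except IndexError:
    raised = True

print(third.tolist())
print(raised)
```

So the .str version is the more forgiving choice if some rows may be shorter than expected.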

Source

2017-06-16 16:53:13 0p3n5ourcE
