Remove non-matching words in Python

I have a text string and want to keep only specific words.

sample = "This is a test text. Test text should pass the test" 
approved_list = ["test", "text"]

Expected output:

"test text Test text test"

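I have read through many regex-based answers, but unfortunately none of them solves this specific problem.

Can the solution also be extended to a pandas Series?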


2017-07-01 Drj

You don't need pandas for this. If you do have a pd.Series:

import re
import pandas as pd
sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

Use the str string accessor:

sample.str.findall('|'.join(approved_list), re.IGNORECASE) 

0 [test, text, Test, text, test] 
1 [test, text, Test, text, test] 
2 [test, text, Test, text, test] 
3 [test, text, Test, text, test] 
4 [test, text, Test, text, test] 
dtype: object
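The question asks for a string per value rather than a list of matches. A minimal sketch of one way to get there, assuming the matches should be joined with single spaces; the names `pattern` and `kept` are illustrative and not part of the original answer:

```python
import re
import pandas as pd

sample = pd.Series(["This is a test text. Test text should pass the test"] * 5)
approved_list = ["test", "text"]

# Escape each word so regex metacharacters in the list are matched literally
pattern = "|".join(map(re.escape, approved_list))

# findall returns a list of matches per row; str.join collapses each list
# into one space-separated string
kept = sample.str.findall(pattern, flags=re.IGNORECASE).str.join(" ")
print(kept)
```

Every element of `kept` should then be the string "test text Test text test".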


2017-07-01 22:34:16 piRSquared

Use the regular expression module re:

import re
re.findall('|'.join(approved_list), sample, re.IGNORECASE)
['test', 'text', 'Test', 'text', 'test']
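To reproduce the exact string from the question, the list of matches can be joined back together. The sketch below also adds `\b` word boundaries and `re.escape`, which the answer does not use; that is an assumption that only whole words should be kept (so "contest" would not yield "test"). The helper name `keep_approved` is hypothetical.

```python
import re

sample = "This is a test text. Test text should pass the test"
approved_list = ["test", "text"]

# Escape each approved word and wrap the alternation in \b so that only
# whole words match (an assumption; drop the \b to match substrings too)
pattern = r"\b(?:" + "|".join(map(re.escape, approved_list)) + r")\b"

def keep_approved(text, pattern=pattern):
    """Return only the approved words from text, joined by single spaces."""
    return " ".join(re.findall(pattern, text, flags=re.IGNORECASE))

print(keep_approved(sample))  # test text Test text test
```

The original casing of each match is preserved because `re.IGNORECASE` affects matching only, not the returned text.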

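This is helpful. The reason I mentioned pandas is that 'approved_list' needs to be applied to every value of a 'pd.Series'. Do you have any suggestions? – Drj

@Drj I've updated my post. – piRSquared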
