2017-08-10 43 views
1

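I need a simple way to pull the ten words before and after a search term in a paragraph and extract them all into one sentence. How can I pull out multiple words around a specific word in Python?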

For example:

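paragraph = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes."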

Input

Output

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to

Answers

0

You could try using string methods. Have you tried writing any code so far?

4

Output:

most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to 
+0

Sure, this is simple, but you are going to run into performance problems with large amounts of text. – WombatPM
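
The code for this answer did not survive on the page, so the following is only a minimal sketch of one way to get the output shown above, assuming words are separated by whitespace. The function name `words_around` and its parameters are hypothetical, not taken from the original answer.

```python
def words_around(text, target, n=10):
    """Return target with up to n words on each side, or None if absent.

    Hypothetical helper: a plain split-and-slice approach.
    """
    words = text.split()  # split on any run of whitespace
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "wolf," or "wolf." still counts,
        # while "wolf-like" does not.
        if word.strip('.,;:!?()"\'').lower() == target.lower():
            start = max(0, i - n)  # avoid a negative slice index
            return " ".join(words[start:i + n + 1])
    return None
```

Calling `words_around(paragraph, "wolf")` on the paragraph from the question should return the sentence fragment shown in the output. It splits the whole text once and stops at the first match, so the cost grows linearly with the text length.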

2

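Here is a regular expression that can help you extract the text you need:

(?:[^ ]+ ){0,10}wolf(?![^ ])(?: [^ ]+){0,10}

A Python example should look something like this, though I can't test it right now: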

import re 

t = "The domestic dog (Canis lupus familiaris or Canis familiaris) is a member of genus Canis (canines) that forms part of the wolf-like canids, and is the most widely abundant carnivore. The dog and the extant gray wolf are sister taxa, with modern wolves not closely related to the wolves that were first domesticated, which implies that the direct ancestor of the dog is extinct. The dog was the first domesticated species and has been selectively bred over millennia for various behaviors, sensory capabilities, and physical attributes" 

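# Up to 10 space-separated tokens, then "wolf", then up to 10 more tokens.
# (?![^ ]) stops "wolf" from matching inside "wolf-like".
m = re.search(r"(?:[^ ]+ ){0,10}wolf(?![^ ])(?: [^ ]+){0,10}", t)

if m:
    print(m.group(0))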
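
The same idea can be wrapped in a reusable function. This is a sketch under the assumption that tokens are separated by single spaces; `context_regex` and its parameters are hypothetical names introduced for illustration.

```python
import re

def context_regex(text, target, n=10):
    """Return target plus up to n space-separated tokens on each side, or None."""
    before = r"(?:[^ ]+ ){0,%d}" % n   # up to n preceding tokens
    after = r"(?: [^ ]+){0,%d}" % n    # up to n following tokens
    # re.escape keeps characters such as "." or "(" in target literal;
    # the lookarounds require target to be a whole space-delimited token.
    word = r"(?<![^ ])" + re.escape(target) + r"(?![^ ])"
    m = re.search(before + word + after, text)
    return m.group(0) if m else None
```

Because the target must be a whole space-delimited token, a word with punctuation attached (for example `wolf,`) is not matched; loosening the lookahead is one option if that matters for your text.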