Position of a query word

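I'm currently using Python with NLTK to extract features from my data. One feature I want to extract is the position of a specific query word in a sentence. To do that, I tried to find the query word's position with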

String.find(word)

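but it gives me a number greater than the total number of words in the text.

Please suggest a way to find the position of a specific word within a sentence.

For example, in "Today is my birthday" the position of the word "birthday" is 4. How can I do this?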


2016-05-15 SmartF

string = 'Today is my birthday' 
string.find('my') #Out: 9 
string[9:] #Out: 'my birthday'

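find searches the string character by character, not word by word, so it returns a character offset. For a simple case you can do this (note that the result is zero-indexed):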

words = string.split() 
words.index('my') #Out: 2
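To make the difference concrete, here is a minimal sketch that puts the character offset from find next to the word index from split, and converts the latter to the 1-based position the question asks for. The helper name `word_position` is made up for illustration, and it assumes words are separated by whitespace only.

```python
def word_position(sentence, word):
    """Return the 1-based position of `word` among whitespace-separated words.

    Hypothetical helper for illustration; raises ValueError if the word is absent.
    """
    words = sentence.split()
    return words.index(word) + 1  # list.index is zero-based, so add 1


sentence = 'Today is my birthday'
char_offset = sentence.find('birthday')         # counts characters, including spaces
position = word_position(sentence, 'birthday')  # counts words
print(char_offset, position)
```

The first number grows with the length of the sentence in characters, which is why find can return a value larger than the word count.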

EDIT

If you need a more sophisticated definition of a word than "strings separated by whitespace", you can use regular expressions. Here is a simple example:

import re
word_re = re.compile(r'\w+')
# Build a real list: in Python 3, map() returns an iterator with no .index()
words = [match.group(0) for match in word_re.finditer(string)]
words.index('my') #Out: 2
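As a small illustration of why the regex version can matter, consider a sentence with punctuation. The sample sentence is made up, and `re.findall` is used here as a shorter way to collect the same `\w+` matches.

```python
import re

sentence = 'Today, is my birthday!'

by_space = sentence.split()              # punctuation stays attached to the words
by_regex = re.findall(r'\w+', sentence)  # only runs of word characters are kept

print(by_space)
print(by_regex)
print(by_regex.index('birthday'))
```

With plain split the last token is 'birthday!', so looking up 'birthday' in that list would fail, while the regex-based list finds it.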

EDIT2

try:
    words.index('earthquake')
except ValueError:
    # list.index raises ValueError when the word is not in the list
    print('handle missing word here')
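If you would rather get a sentinel value back than handle the exception at every call site, one option is to wrap the lookup. This is only a sketch; `find_word` is a hypothetical helper name, and returning None for a missing word is an assumption about what is convenient for your feature extraction.

```python
def find_word(words, word):
    """Return the zero-based index of `word` in `words`, or None if it is absent."""
    try:
        return words.index(word)
    except ValueError:
        return None


words = 'Today is my birthday'.split()
print(find_word(words, 'my'))
print(find_word(words, 'earthquake'))
```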


2016-05-15 09:30:12 tavo

When I applied split it gave me this error: **Traceback (most recent call last): File "C:\Users\user\workspace\test1\test1\final.py", line 36, in fdist2 = fdist1.split("earthquake") AttributeError: 'list' object has no attribute 'split'** – SmartF

What is fdist1? Use split on your original string, then use index on the result of that split. – tavo

Also, split is for splitting a string on whitespace. index is the method for finding a specific word in a list of words. – tavo

You can use re or NLTK to turn the text string into a list of words, and then search that list for the word:

import re
text = "Today is my birthday"
word = "birthday"
words1 = re.sub(r"[^\w]", " ", text).split() # using re

import nltk
words2 = nltk.word_tokenize(text) # using nltk

position = 1 # 1-based word position
for token in words1: # or: for token in words2:
    if token == word:
        print(position)
    position += 1
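If the word can occur more than once, the same idea can be written with enumerate to collect every 1-based position at once. This is a sketch using only the re-based tokenization; the sample sentence is made up for illustration.

```python
import re

text = "Today is my birthday and my party"
word = "my"

# Replace every non-word character with a space, then split on whitespace
words = re.sub(r"[^\w]", " ", text).split()

# 1-based positions of every occurrence of the word
positions = [i for i, token in enumerate(words, start=1) if token == word]
print(positions)
```

An empty list then means the word does not occur, so no exception handling is needed.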


2016-05-15 09:34:50

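Since you can't put a space between '+' and '=', you need to change the last line of your code from 'position + = 1' to 'position += 1' (the add-and-assign operator). Otherwise it causes a syntax error. – Tanu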

Thanks Tanu, I've corrected it –

Thanks khelili hamza, I get it now – SmartF
