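I wrote a function that searches a text for the context of a given word (w), where left and right are parameters that control how many words are recorded on each side. It uses a regular expression to find the word's contexts: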
import re
def get_context(text, w, left, right):
    text.insert(0, "*START*")
    text.append("*END*")
    all_contexts = []
    for i in range(len(text)):
        if re.match(w, text[i], 0):
            if i < left:
                context_left = text[:i]
            else:
                context_left = text[i-left:i]
            if len(text) < (i+right):
                context_right = text[i:]
            else:
                context_right = text[i:(i+right+1)]
            context = context_left + context_right
            all_contexts.append(context)
    return all_contexts
So, for example, given a text in the form of a list like this:
text = ['python', 'is', 'dynamically', 'typed', 'language', 'python', 'functions', 'really', 'care', 'about', 'what', 'you', 'pass', 'to', 'them', 'but', 'you', 'have', 'it', 'the', 'wrong', 'way', 'if', 'you', 'want', 'to', 'pass', 'one', 'thousand', 'arguments', 'to', 'your', 'function', 'then', 'you', 'can', 'explicit', 'define', 'every', 'parameter', 'in', 'your', 'function', 'definition', 'and', 'your', 'function', 'will', 'be', 'automatically', 'able', 'to', 'handle', 'all', 'the', 'arguments', 'you', 'pass', 'to', 'them', 'for', 'you']
the function works fine, for example:
get_context(text, "function",2,2)
[['language', 'python', 'functions', 'really', 'care'], ['to', 'your', 'function', 'then', 'you'], ['in', 'your', 'function', 'definition', 'and'], ['and', 'your', 'function', 'will', 'be']]
Now I want to build a dictionary holding the contexts of every word in the text, by doing the following:
d = {}
for w in set(text):
    d[w] = get_context(text,w,2,2)
But I am getting this error:
Traceback (most recent call last):
  File "<pyshell#32>", line 2, in <module>
    d[w] = get_context(text,w,2,2)
  File "<pyshell#20>", line 9, in get_context
    if re.match(w,text[i], 0):
  File "/usr/lib/python3.4/re.py", line 160, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python3.4/re.py", line 294, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.4/sre_compile.py", line 568, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.4/sre_parse.py", line 760, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.4/sre_parse.py", line 370, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.4/sre_parse.py", line 579, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat
I don't understand this error. Can anyone help me with this?
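For reference, the error can probably be reproduced without the function at all. This is a minimal sketch, assuming the cause is that get_context inserts "*START*" and "*END*" into text, so that set(text) later passes those strings to re.match as patterns. A pattern that begins with the quantifier * has nothing in front of it to repeat. The names word, failed, message, literal and matched are only illustrative.

```python
import re

# Assumption: "*START*" ends up in set(text) because get_context mutates text.
word = "*START*"

try:
    re.match(word, "*START*")
    failed = False
    message = ""
except re.error as exc:
    # The pattern starts with "*", a quantifier with nothing before it.
    failed = True
    message = str(exc)

# re.escape backslash-escapes regex metacharacters, so the word is matched literally.
literal = re.escape(word)
matched = re.match(literal, "*START*") is not None
```

If regex matching is wanted for the other words, wrapping w in re.escape before calling re.match is one way to keep tokens such as "*START*" from being read as patterns.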
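OK, that was the problem. I hadn't thought about the two tokens *START* and *END*. I had considered using == text[i], but I wanted to know why this didn't work. Thanks – Wunter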
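The == text[i] idea from the comment could look roughly like the sketch below. It is not the asker's final code, just one possible variant under two assumptions: exact token equality is acceptable, and the padding markers should go onto a copy so the caller's list is left alone. The short text list here is a made-up sample.

```python
def get_context(tokens, w, left, right):
    """Return a window of tokens around every exact occurrence of w."""
    # Pad a copy, so repeated calls never add markers to the caller's list.
    padded = ["*START*"] + list(tokens) + ["*END*"]
    all_contexts = []
    for i, token in enumerate(padded):
        if token == w:  # plain comparison, no regex metacharacters involved
            start = max(0, i - left)
            # Slicing past the end of a list is safe, so no length check is needed.
            all_contexts.append(padded[start:i + right + 1])
    return all_contexts


# Hypothetical sample text, shortened for illustration.
text = ["to", "your", "function", "then", "you"]
d = {w: get_context(text, w, 2, 2) for w in set(text)}
```

Note that this changes the matching behaviour: re.match(w, text[i]) matches any token that starts with w, so "function" also hit "functions", whereas == only matches the identical token. If the prefix behaviour is wanted, token.startswith(w) or re.match(re.escape(w), token) would be closer to the original.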