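Finding word contexts with regular expressions

I wrote a function that searches a text for the contexts of a given word (w), where left and right are parameters that control how many words are recorded on each side.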

import re

def get_context(text, w, left, right):
    text.insert(0, "*START*")
    text.append("*END*")

    all_contexts = []

    for i in range(len(text)):
        if re.match(w, text[i], 0):
            if i < left:
                context_left = text[:i]
            else:
                context_left = text[i-left:i]

            if len(text) < (i+right):
                context_right = text[i:]
            else:
                context_right = text[i:(i+right+1)]

            context = context_left + context_right
            all_contexts.append(context)
    return all_contexts

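So, for example, if you have a text in the form of a list like this:

text = ['Python', 'is', 'dynamically', 'typed', 'language',
     'Python', 'functions', 'really', 'care', 'about', 'what',
     'you', 'pass', 'to', 'them', 'but', 'you', 'got', 'it', 'the',
     'wrong', 'way', 'if', 'you', 'want', 'to', 'pass', 'one',
     'thousand', 'arguments', 'to', 'your', 'function', 'then',
     'you', 'can', 'explicitly', 'define', 'every', 'parameter',
     'in', 'your', 'function', 'definition', 'and', 'your',
     'function', 'will', 'be', 'automagically', 'able', 'to', 'handle',
     'all', 'the', 'arguments', 'you', 'pass', 'to', 'them', 'for', 'you']

the function works fine, for example: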

get_context(text, "function",2,2) 
[['language', 'python', 'functions', 'really', 'care'], ['to', 'your', 'function', 'then', 'you'], ['in', 'your', 'function', 'definition', 'and'], ['and', 'your', 'function', 'will', 'be']]

Now I want to build a dictionary of contexts for every word in the text, like this:

d = {} 
for w in set(text): 
    d[w] = get_context(text,w,2,2)

But I am getting this error:

Traceback (most recent call last): 
    File "<pyshell#32>", line 2, in <module> 
    d[w] = get_context(text,w,2,2) 
    File "<pyshell#20>", line 9, in get_context 
    if re.match(w,text[i], 0): 
    File "/usr/lib/python3.4/re.py", line 160, in match 
    return _compile(pattern, flags).match(string) 
    File "/usr/lib/python3.4/re.py", line 294, in _compile 
    p = sre_compile.compile(pattern, flags) 
    File "/usr/lib/python3.4/sre_compile.py", line 568, in compile 
    p = sre_parse.parse(p, flags) 
    File "/usr/lib/python3.4/sre_parse.py", line 760, in parse 
    p = _parse_sub(source, pattern, 0) 
    File "/usr/lib/python3.4/sre_parse.py", line 370, in _parse_sub 
    itemsappend(_parse(source, state)) 
    File "/usr/lib/python3.4/sre_parse.py", line 579, in _parse 
    raise error("nothing to repeat") 
sre_constants.error: nothing to repeat

I don't understand this error. Can anyone help me with this?

Source

2016-05-13 Wunter

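The problem is that "*START*" and "*END*" are interpreted as regular expressions: once they are in text, they end up in set(text) and get passed to re.match as the pattern, where the leading * has nothing to repeat. Also note that inserting "*START*" and "*END*" into text at the start of the function causes a problem, because the list grows on every call. You should do it only once.

Here is a complete working version of the code: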
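To make the two points above concrete, here is a minimal sketch. The helper name add_markers is hypothetical and only mimics what the first two lines of get_context do; re.escape is shown as one possible way to treat a word literally, not as something the original code used.

```python
import re

# 1) A leading "*" is a quantifier with nothing before it to repeat,
#    so using "*START*" as a pattern fails to compile.
try:
    re.match("*START*", "*START*")
    failed = False
    message = ""
except re.error as exc:
    failed = True
    message = str(exc)

# re.escape() backslash-escapes the special characters, so the same
# string is matched literally.
matched_literally = re.match(re.escape("*START*"), "*START*") is not None

# 2) Hypothetical helper that mutates its argument the way get_context does.
def add_markers(words):
    words.insert(0, "*START*")
    words.append("*END*")

words = ["a", "b"]
add_markers(words)
add_markers(words)  # a second call adds a second pair of markers
```

After the two calls, words holds two "*START*" entries and two "*END*" entries, which is why the markers should be added once, outside the function.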

import re

def get_context(text, w, left, right):
    all_contexts = []
    for i in range(len(text)):
        if re.match(w, text[i], 0):
            if i < left:
                context_left = text[:i]
            else:
                context_left = text[i-left:i]
            if len(text) < (i+right):
                context_right = text[i:]
            else:
                context_right = text[i:(i+right+1)]
            context = context_left + context_right
            all_contexts.append(context)
    return all_contexts

text = ['Python', 'is', 'dynamically', 'typed', 'language', 
     'Python', 'functions', 'really', 'care', 'about', 'what', 
     'you', 'pass', 'to', 'them', 'but', 'you', 'got', 'it', 'the', 
     'wrong', 'way', 'if', 'you', 'want', 'to', 'pass', 'one', 
     'thousand', 'arguments', 'to', 'your', 'function', 'then', 
     'you', 'can', 'explicitly', 'define', 'every', 'parameter', 
     'in', 'your', 'function', 'definition', 'and', 'your', 
     'function', 'will', 'be', 'automagically', 'able', 'to', 'handle', 
     'all', 'the', 'arguments', 'you', 'pass', 'to', 'them', 'for', 'you'] 

text.insert(0, "START") 
text.append("END") 

d = {} 
for w in set(text): 
    d[w] = get_context(text,w,2,2)

Perhaps you could replace re.match(w, text[i], 0) with w == text[i].

Source
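As a rough sketch of that suggestion (the name get_context_exact is made up here, and clamping the slice start with max is a simplification of the original if/else, not the answerer's code), an equality-based version could look like this:

```python
def get_context_exact(text, w, left, right):
    """Collect up to `left` words before and `right` words after each
    exact occurrence of w, without modifying the input list."""
    contexts = []
    for i, token in enumerate(text):
        if token == w:
            start = max(0, i - left)  # avoid negative slice starts near the beginning
            # Slicing past the end of a list is safe, so no check is needed on the right.
            contexts.append(text[start:i + right + 1])
    return contexts

words = ["a", "b", "function", "c", "d", "function"]
contexts = get_context_exact(words, "function", 2, 2)

# Words containing regex metacharacters are no longer a problem.
marker_contexts = get_context_exact(["*START*", "a"], "*START*", 2, 2)
```

Because nothing is compiled as a pattern, any string can be used as w, at the cost of losing pattern matching such as functions?.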

2016-05-13 19:10:08 malbarbo

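OK, that was the problem. I didn't think about the *START* and *END* markers. I did think of w == text[i], but I wanted to know why this wasn't working. Thanks – Wunter

At least one element in text contains characters that are special in regular expressions. If you just want to check whether the string begins with the word, use str.startswith, i.e.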

if text[i].startswith(w): # instead of re.match(w,text[i], 0):

But I don't understand why you are checking for that anyway, rather than for equality.

Source

2016-05-13 19:01:00 L3viathan

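I thought using 're.match' would add some flexibility, e.g. finding both function and functions by matching 'functions?'. Thanks for the suggestion – Wunter

The whole thing can be rewritten quite concisely as follows,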
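One thing worth keeping in mind with that approach: re.match only anchors the pattern at the start of the string, so 'functions?' also accepts longer words that merely begin with function. The short sketch below (the word list is made up for illustration) contrasts it with re.fullmatch, which requires the whole string to match.

```python
import re

pattern = re.compile(r"functions?")
candidates = ["function", "functions", "functional", "dysfunction"]

# match(): the pattern must match at the start, but the word may continue.
prefix_hits = [w for w in candidates if pattern.match(w)]

# fullmatch(): the pattern must cover the entire word.
whole_hits = [w for w in candidates if pattern.fullmatch(w)]
```

Here prefix_hits includes functional, while whole_hits contains only function and functions.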

text = 'Python is dynamically typed language Python functions really care about what you pass to them but you got it the wrong way if you want to pass one thousand arguments to your function then you can explicitly define every parameter in your function definition and your function will be automagically able to handle all the arguments you pass to them for you'

Keep it as a str and, assuming context = 'function',

pat = re.compile(r'(\w+\s\w+\s)functions?(?=(\s\w+\s\w+))') 
pat.findall(text) 
[('language Python ', ' really care'), 
('to your ', ' then you'), 
('in your ', ' definition and'), 
('and your ', ' will be')]

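Now, a small amount of customization of the regex will be needed to allow words such as functional or functioning, not only function or functions. But the important idea is to do away with indexing and be more functional.

Please let me know if this doesn't work for you when you apply it in bulk.

Source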

2016-05-13 19:22:44

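I think working with a list is easier if I want to vary the number of words on the left and right. I thought about using a regex, but I couldn't figure out a way to set the word count on both sides. Thanks for the suggestion – Wunter

@Wunter If you use a 'list', be aware that 'insert' and '+' are red flags. They are slow. 'append' is fine. –

Thanks for the advice. I'm just starting to learn programming. I'll keep it in mind :) – Wunter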
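For what it's worth, the word counts on both sides can be set in a regex with bounded quantifiers such as {0,2}. The following is only a sketch under assumptions: the function name regex_contexts and the named groups are invented here, words are taken to be \w+ runs separated by whitespace, and matches do not overlap, so two occurrences close together may get a shortened left context.

```python
import re

def regex_contexts(text, word_regex, left, right):
    """Return [left words..., matched word, right words...] per match.

    word_regex is used as a regular expression; pass it through
    re.escape() first if it should be matched literally.
    """
    pattern = re.compile(
        rf"(?P<left>(?:\w+\s+){{0,{left}}})"        # up to `left` preceding words
        rf"\b(?P<word>{word_regex})\b"              # the target word itself
        rf"(?=(?P<right>(?:\s+\w+){{0,{right}}}))"  # up to `right` following words, not consumed
    )
    return [
        m.group("left").split() + [m.group("word")] + m.group("right").split()
        for m in pattern.finditer(text)
    ]

sample = "in your function definition and your function will be"
result = regex_contexts(sample, r"functions?", 2, 2)
edge = regex_contexts("function is here", r"function", 2, 2)
```

The {0,n} bounds let the pattern degrade gracefully at the beginning and end of the text, which plays the role of the i < left and len(text) checks in the list-based version.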
