2016-05-13
import re

def multiwordReplace(text, wordDic):
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def translate(match):
        return wordDic[match.group(0)]
    return rc.sub(translate, text)

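This code was copied from another source, but I'm not sure how it replaces words in a paragraph of text, and I don't understand why the re module is used here. How does this word-replacement function work?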


You should read up on [regular expressions](https://docs.python.org/2/howto/regex.html).


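How should we handle questions like this? It isn't quite the same as the [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) question, but it's similar.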

Answers

1

Piece by piece...

import re

# Our dictionary
wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'} 

# Escape every key in the dictionary with re.escape(), which backslash-escapes
# regex metacharacters. Escaping is required so that special characters in
# dictionary words (such as '.' or '*') won't mess up the regex
map(re.escape, wordDic) 

# join all escaped key elements with pipe | to make a string 'hello|hi|hey' 
'|'.join(map(re.escape, wordDic)) 

# Compile the resulting string into a regular expression object.
# the pipe in the string will be interpreted as "OR", 
# so our regex will now try to find "hello" or "hi" or "hey" 
rc = re.compile('|'.join(map(re.escape, wordDic))) 
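To make each piece concrete, here is a minimal sketch that evaluates the intermediate values. It assumes Python 3.7+ (so dictionaries keep insertion order and map() returns a lazy iterator, wrapped in list() here to inspect it), and it adds a hypothetical key 'a.b' purely to show what escaping does:

```python
import re

# Same dictionary as above, plus a hypothetical key containing a regex metacharacter
wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz', 'a.b': 'dot'}

# map() is lazy in Python 3, so materialize it to look at the escaped keys
escaped = list(map(re.escape, wordDic))
print(escaped)  # ['hello', 'hi', 'hey', 'a\\.b']

# Join the escaped keys with | to build one alternation
pattern = '|'.join(escaped)
print(pattern)  # hello|hi|hey|a\.b

rc = re.compile(pattern)

# Because '.' was escaped, 'a.b' matches only the literal text "a.b", not "axb"
print(rc.findall('a.b axb'))  # ['a.b']
```

Without re.escape, the unescaped '.' would match any character, so "axb" would be matched as well.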

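So rc now matches any of the words in the dictionary, and rc.sub replaces those words in the given string. Each time the regex finds a match, the translate function simply returns the value stored under that key.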
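The callback mechanism can be watched in isolation. In this minimal sketch (Python 3, using the three-word dictionary from above; the seen list is only an illustration aid), translate records what it receives each time sub() calls it:

```python
import re

wordDic = {'hello': 'foo', 'hi': 'bar', 'hey': 'baz'}
rc = re.compile('|'.join(map(re.escape, wordDic)))

seen = []  # illustration only: records the text of every match passed to translate()

def translate(match):
    # match is a match object; group(0) is the whole matched text,
    # which is always exactly one of the dictionary keys
    seen.append(match.group(0))
    print('matched %r at %d-%d' % (match.group(0), match.start(), match.end()))
    return wordDic[match.group(0)]

# sub() scans the string left to right, calls translate() once per
# non-overlapping match, and splices each return value into the result
result = rc.sub(translate, 'hey you, hi there')
print(result)  # baz you, bar there
```

Because group(0) is always one of the escaped keys, the dictionary lookup inside translate finds a value for every match.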

1
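  1. re.compile() - compiles the expression string into a regular expression object. The string consists of the keys of wordDic joined with the separator |. Given wordDic {'hello':'hi', 'goodbye': 'bye'}, the string would be "hello|goodbye", which can be read as "hello OR goodbye".
  2. def translate(match): - defines a callback function that will handle each match.
  3. rc.sub(translate, text) - performs the string replacement. Whenever the regex matches part of the text, the callback looks up the matched text (which is always one of wordDic's keys) in wordDic and returns its translation, which replaces the match.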
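One consequence of the "OR" reading: Python tries the alternatives of | from left to right and accepts the first one that matches, so key order matters when one key is a prefix of another. This sketch uses hypothetical helpers build_pattern and replace_all with a made-up dictionary, and compares dictionary order against sorting the keys longest-first (a common workaround, not part of the original code):

```python
import re

def build_pattern(wordDic, longest_first=False):
    # Hypothetical helper: builds the same escaped alternation as multiwordReplace
    keys = list(wordDic)
    if longest_first:
        # Try longer keys first so a key that is a prefix of another
        # (e.g. 'he' vs 'hello') cannot shadow it
        keys.sort(key=len, reverse=True)
    return re.compile('|'.join(map(re.escape, keys)))

def replace_all(text, wordDic, longest_first=False):
    # Hypothetical helper: same replacement logic, with a lambda as the callback
    rc = build_pattern(wordDic, longest_first)
    return rc.sub(lambda m: wordDic[m.group(0)], text)

wordDic = {'he': 'X', 'hello': 'Y'}
print(replace_all('hello', wordDic))                      # Xllo
print(replace_all('hello', wordDic, longest_first=True))  # Y
```

With the pattern he|hello, the shorter key wins at position 0 and only "he" is replaced. Sorting longest-first makes the whole word "hello" match instead.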

Example:

wordDic = {'hello':'hi', 'goodbye': 'bye'} 
text = 'hello my friend, I just wanted to say goodbye' 
translated = multiwordReplace(text, wordDic) 
print(translated) 

The output is:

hi my friend, I just wanted to say bye 
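The pattern has no word boundaries, so a key is also replaced when it appears inside a longer word. This sketch uses the same wordDic and a made-up input, and compares the plain alternation with one wrapped in \b(?:...)\b. It assumes every key starts and ends with a word character, because \b only behaves as expected in that case:

```python
import re

wordDic = {'hello': 'hi', 'goodbye': 'bye'}
alternation = '|'.join(map(re.escape, wordDic))

plain = re.compile(alternation)
# \b anchors each match at word boundaries; the non-capturing group (?:...)
# makes \b apply to the whole alternation, not just the first and last keys
bounded = re.compile(r'\b(?:' + alternation + r')\b')

text = 'hello Othello'
plain_result = plain.sub(lambda m: wordDic[m.group(0)], text)
bounded_result = bounded.sub(lambda m: wordDic[m.group(0)], text)
print(plain_result)    # hi Othi
print(bounded_result)  # hi Othello
```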

EDIT

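The main advantage of re.compile(), though, is the performance gain when the same regex is used many times. Because multiwordReplace compiles the regex on every call, there is no gain here. If the same wordDic is used many times, you can generate a multiwordReplace-style function for that wordDic and do the compilation only once: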

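import re

def generateMwR(wordDic):
    # Compile the pattern once, when the replacer is created
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def f(text):
        def translate(match):
            # Look up the matched key and return its replacement
            return wordDic[match.group(0)]
        return rc.sub(translate, text)
    # f keeps a reference to rc, so every call reuses the compiled regex
    return f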

Usage looks like this:

wordDic = {'hello': 'hi', 'goodbye': 'bye'} 
text = 'hello my friend, I just wanted to say goodbye' 
f = generateMwR(wordDic) 
translated = f(text)
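Because the returned function holds on to the compiled pattern, it can be applied to many strings without recompiling. Here is a more compact variant of the same idea. The name make_replacer and the use of functools.partial are choices made for this sketch, not the answer's code; the callback is bound directly to rc.sub:

```python
import functools
import re

def make_replacer(wordDic):
    # Build and compile the alternation once, up front
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    # partial(rc.sub, repl) returns a callable that, given text,
    # calls rc.sub(repl, text) with the already-compiled pattern
    return functools.partial(rc.sub, lambda m: wordDic[m.group(0)])

f = make_replacer({'hello': 'hi', 'goodbye': 'bye'})
texts = ['hello my friend', 'say goodbye', 'nothing to replace']
results = [f(t) for t in texts]
print(results)  # ['hi my friend', 'say bye', 'nothing to replace']
```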