2013-04-28

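Parsing text to replace quotes and nested quotes

Using Python, I would like to "educate" the quotes of a plain-text input and turn them into ConTeXt syntax. Here is a (recursive) example: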

Original text:

Using python, I would like "educate" quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example: 

Output:

Using python, I would like \quotation{educate} quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example: 

I would like it to handle nested quotes as well:

Original text:

Original text: "Using python, I would like 'educate' quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example:" 

Output:

Original text: \quotation {Using python, I would like \quotation{educate} quotes of 
a plain text input and turn them into the Context syntax. 
Here is a (recursive) example:} 

Of course, I should also take care of edge cases, such as:

She said "It looks like we are back in the '90s" 

The ConTeXt specification for quotations is here:

http://wiki.contextgarden.net/Nested_quotations#Nested_quotations_in_MkIV

What is the most sensible approach here? Thank you very much!

Answers


This works for nested quotes, although it does not handle your edge case:

def quote(string):
    """Replace straight quotes with ConTeXt \\quotation{...}, tracking nesting."""
    text = ''
    stack = []  # quote characters that are currently open, innermost last
    for token in iter_tokes(string):
        if is_quote(token):
            if stack and stack[-1] == token:  # closing
                text += '}'
                stack.pop()
            else:  # opening
                text += '\\quotation{'
                stack.append(token)
        else:
            text += token
    return text

def iter_tokes(string):
    """Yield runs of ordinary text and single quote characters, in order."""
    i = find_quote(string)
    if i is None:
        yield string
    else:
        if i > 0:
            yield string[:i]
        yield string[i]
        yield from iter_tokes(string[i + 1:])

def find_quote(string):
    """Return the index of the first quote character, or None if there is none."""
    for i, char in enumerate(string):
        if is_quote(char):
            return i
    return None

def is_quote(char):
    return char in '\'"'

def main():
    with open('input.txt') as fh:
        quoted = quote(fh.read())
    print(quoted)

if __name__ == '__main__':
    main()
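The stack-based answer treats every `'` as a quote, so the '90s example from the question (and contractions such as "don't") opens quotations that are never closed. Below is a minimal sketch of one way to patch this, assuming a simple heuristic that is not part of the answer: a `'` counts as an apostrophe when it sits between two letters or digits, or when it starts a decade abbreviation such as '90s. The names `is_apostrophe` and `DECADE` are hypothetical, and the recursive tokenizer is replaced by a single loop over the characters.

```python
import re

# Hypothetical heuristic: a ' that starts a decade abbreviation ('90s, '80s)
# is an apostrophe, not a quote.
DECADE = re.compile(r"'\d0s\b")

def is_quote(char):
    return char in "'\""

def is_apostrophe(text, i):
    """Return True if the ' at text[i] looks like an apostrophe (assumed heuristic)."""
    if text[i] != "'":
        return False
    before = text[i - 1] if i > 0 else ""
    after = text[i + 1] if i + 1 < len(text) else ""
    if before.isalnum() and after.isalnum():
        return True  # inside a word: don't, it's
    return DECADE.match(text, i) is not None  # '90s and similar

def quote(text):
    """Convert straight quotes to ConTeXt \\quotation{...}, handling nesting."""
    out = []
    stack = []  # quote characters that are currently open, innermost last
    for i, char in enumerate(text):
        if not is_quote(char) or is_apostrophe(text, i):
            out.append(char)  # ordinary text or apostrophe: copy unchanged
        elif stack and stack[-1] == char:
            out.append("}")  # closes the innermost open quotation
            stack.pop()
        else:
            out.append("\\quotation{")  # opens a new, possibly nested, quotation
            stack.append(char)
    return "".join(out)

print(quote('She said "It looks like we are back in the \'90s"'))
```

The heuristic still misreads plural possessives such as "the students' books", where the apostrophe is followed by a space; those would need more context to resolve.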

Thanks, this is a very good template. I need to collect some edge cases to test it against real-world input and adjust it accordingly. – Alex 2013-04-28 16:01:12


If you are sure the original text has spaces in the right places, you can simply use a regular expression:

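import re

# Opening quote: ' or " at the start of the text or after whitespace
# (a ' followed by a decade such as 90s is an apostrophe, not a quote).
# Closing quote: " or ' followed by whitespace or the end of the text.
regexp = re.compile(r'(?P<opening>(?:^|(?<=\s))(?:\'(?!\d0s)|"))|["\'](?=\s|$)')

def repl(match):
    if match.group('opening'):
        return '\\quotation{'
    else:
        return '}'

result = regexp.sub(repl, s)  # s holds the input text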
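To check the regex against the question's examples, it can be wrapped in a small function. `educate` is a hypothetical name, and the pattern is this answer's regex with one assumed fix: an opening `"` is also recognised at the very start of the text (the original pattern allowed that only for `'`). As the answer says, it relies on quotes being separated from neighbouring words by whitespace.

```python
import re

# Group "opening": a ' or " at the start of the text or after whitespace;
# a ' followed by a decade such as 90s is left alone.
# Otherwise: a " or ' followed by whitespace or the end of the text closes.
QUOTE_RE = re.compile(r'(?P<opening>(?:^|(?<=\s))(?:\'(?!\d0s)|"))|["\'](?=\s|$)')

def repl(match):
    return "\\quotation{" if match.group("opening") else "}"

def educate(text):
    """Hypothetical wrapper: rewrite straight quotes as ConTeXt \\quotation{...}."""
    return QUOTE_RE.sub(repl, text)

print(educate('She said "It looks like we are back in the \'90s"'))
print(educate("I would like 'educate' quotes, don't you?"))
```

Because an apostrophe inside a word is never preceded by whitespace, contractions such as "don't" pass through unchanged; a trailing possessive like "students' books" is still mistaken for a closing quote.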

Thanks. I won't have any control over the input, so I would prefer a more general solution. – Alex 2013-04-28 15:57:06