Using pyparsing to parse a word escape-split over multiple lines

I'm trying to use pyparsing to parse words that can be split over multiple lines with a backslash-newline combination ("\\n"). Here's what I have done:

from pyparsing import * 

continued_ending = Literal('\\') + lineEnd 
word = Word(alphas) 
split_word = word + Suppress(continued_ending) 
multi_line_word = Forward() 
multi_line_word << (word | (split_word + multi_line_word)) 

print(multi_line_word.parseString(
'''super\\
cali\\
fragi\\
listic'''))

The output I get is ['super'], while the expected output is ['super', 'cali', 'fragi', 'listic']. Better yet would be all of them joined as one word (which I think I can do with multi_line_word.setParseAction(lambda t: ''.join(t))).

I tried looking at this code in Pyparsing helper, but it gives me an error: maximum recursion depth exceeded.

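EDIT 2009-11-15: I realized later on that pyparsing is a little generous with regard to white space, and that led me to some poor assumptions: what I thought I was parsing for was in fact a lot looser. That is to say, we want to see no white space between any of the parts of the word, the escape, and the EOL character.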
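To make the whitespace point concrete, here is a minimal sketch. It assumes pyparsing is installed and that the camelCase names used in this thread (parseString, leaveWhitespace) are available; the expression and variable names are made up for illustration.

```python
from pyparsing import Literal, ParseException, Word, alphas

# Hypothetical expression for illustration: a word followed by a backslash.
word_then_escape = Word(alphas) + Literal('\\')

# By default pyparsing skips whitespace before each element,
# so the space between the word and the backslash is accepted.
lenient_tokens = word_then_escape.parseString('shiny \\').asList()
print(lenient_tokens)

# leaveWhitespace() switches that skipping off for the whole expression,
# so the same input is now rejected.
strict = (Word(alphas) + Literal('\\')).leaveWhitespace()
try:
    strict.parseString('shiny \\')
    gap_rejected = False
except ParseException:
    gap_rejected = True
print(gap_rejected)
```

This is only meant to show the default behaviour that the stricter tests below are written against.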

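I realized that the little example string above is insufficient as a test case, so I wrote the following unit tests. Code that passes these tests should match what I intuitively think of as an escape-split word, and only an escape-split word. It will not match a basic word that is not escape-split. We can, and I believe should, use a different grammatical construct for that. Keeping the two separate keeps everything tidy.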

import unittest 
import pyparsing 

# Assumes you named your module 'multiline.py' 
import multiline 

class MultiLineTests(unittest.TestCase): 

    def test_continued_ending(self): 

     case = '\\\n' 
     expected = ['\\', '\n'] 
     result = multiline.continued_ending.parseString(case).asList() 
     self.assertEqual(result, expected) 


    def test_continued_ending_space_between_parse_error(self): 

     case = '\\ \n' 
     self.assertRaises(
      pyparsing.ParseException, 
      multiline.continued_ending.parseString, 
      case 
     ) 


    def test_split_word(self): 

     cases = ('shiny\\', 'shiny\\\n', ' shiny\\') 
     expected = ['shiny'] 
     for case in cases: 
      result = multiline.split_word.parseString(case).asList() 
      self.assertEqual(result, expected) 


    def test_split_word_no_escape_parse_error(self): 

     case = 'shiny' 
     self.assertRaises(
      pyparsing.ParseException, 
      multiline.split_word.parseString, 
      case 
     ) 


    def test_split_word_space_parse_error(self): 

     cases = ('shiny \\', 'shiny\r\\', 'shiny\t\\', 'shiny\\ ') 
     for case in cases: 
      self.assertRaises(
       pyparsing.ParseException, 
       multiline.split_word.parseString, 
       case 
      ) 


    def test_multi_line_word(self): 

     cases = (
       'shiny\\', 
       'shi\\\nny', 
       'sh\\\ni\\\nny\\\n', 
       ' shi\\\nny\\', 
       'shi\\\nny ', 
       'shi\\\nny captain' 
     ) 
     expected = ['shiny'] 
     for case in cases: 
      result = multiline.multi_line_word.parseString(case).asList() 
      self.assertEqual(result, expected) 


    def test_multi_line_word_spaces_parse_error(self): 

     cases = (
       'shi \\\nny', 
       'shi\\ \nny', 
       'sh\\\n iny', 
       'shi\\\n\tny', 
     ) 
     for case in cases: 
      self.assertRaises(
       pyparsing.ParseException, 
       multiline.multi_line_word.parseString, 
       case 
      ) 


if __name__ == '__main__': 
    unittest.main()

Source

2009-11-14 gotgenes

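After poking around a bit more, I came upon this help thread, which had this notable bit:

"I often see inefficient grammars when someone implements a pyparsing grammar directly from a BNF definition. BNF does not have a concept of 'one or more' or 'zero or more' or 'optional'..."

With that, I got the idea to change these two lines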

multi_line_word = Forward() 
multi_line_word << (word | (split_word + multi_line_word))

to

multi_line_word = ZeroOrMore(split_word) + word

This gave the output I was looking for: ['super', 'cali', 'fragi', 'listic'].

Next, I added a parse action that joins these tokens together:

multi_line_word.setParseAction(lambda t: ''.join(t))

This gives a final output of ['supercalifragilistic'].
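Put together, the two steps above can be sketched as a self-contained script. This assumes pyparsing is installed; the names pieces, joined and text are hypothetical, and the sample string is the one from the question written with explicit \n escapes.

```python
from pyparsing import Literal, Suppress, Word, ZeroOrMore, alphas, lineEnd

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)

# Iterative form: any number of escape-terminated pieces, then a final word.
pieces = ZeroOrMore(split_word) + word

# Same grammar, with a parse action that joins the pieces into one token.
joined = (ZeroOrMore(split_word) + word).setParseAction(lambda t: ''.join(t))

text = 'super\\\ncali\\\nfragi\\\nlistic'  # hypothetical sample input
piece_tokens = pieces.parseString(text).asList()
joined_tokens = joined.parseString(text).asList()
print(piece_tokens)
print(joined_tokens)
```

The two expressions are built separately so that attaching the parse action to one does not change the other.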

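The take-home message I learned is that one doesn't simply walk into Mordor.

Just kidding.

The take-home message is that one can't simply implement a one-to-one translation of BNF with pyparsing. Some tricks using the iterative types (ZeroOrMore, OneOrMore, Optional) should be called into use.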

EDIT 2009-11-25: To handle my more strenuous test cases, I modified the code to the following:

no_space = NotAny(White(' \t\r')) 
# make sure that the EOL immediately follows the escape backslash 
continued_ending = Literal('\\') + no_space + lineEnd 
word = Word(alphas) 
# make sure that the escape backslash immediately follows the word 
split_word = word + NotAny(White()) + Suppress(continued_ending) 
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word) 
multi_line_word.setParseAction(lambda t: ''.join(t))

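This has the benefit of making sure that no space comes between any of the elements (with the exception of the newlines after the escaping backslashes).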
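As a quick check, the stricter grammar can be run against a few of the cases from the unit tests. This is a sketch that assumes pyparsing is installed; the helper try_parse and the lists accepted and rejected are hypothetical names, and only a subset of the test cases is exercised.

```python
from pyparsing import (Literal, NotAny, OneOrMore, Optional, ParseException,
                       Suppress, White, Word, alphas, lineEnd)

no_space = NotAny(White(' \t\r'))
continued_ending = Literal('\\') + no_space + lineEnd
word = Word(alphas)
split_word = word + NotAny(White()) + Suppress(continued_ending)
multi_line_word = OneOrMore(split_word + NotAny(White())) + Optional(word)
multi_line_word.setParseAction(lambda t: ''.join(t))


def try_parse(text):
    """Hypothetical helper: return the token list, or None on a parse error."""
    try:
        return multi_line_word.parseString(text).asList()
    except ParseException:
        return None


# Escape-split words with no stray whitespace should be joined.
accepted = [try_parse(s) for s in ('shi\\\nny', 'sh\\\ni\\\nny\\\n')]
# Whitespace around the escape, or no escape at all, should be rejected.
rejected = [try_parse(s) for s in ('shi \\\nny', 'shi\\ \nny', 'shiny')]
print(accepted)
print(rejected)
```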

Source

2009-11-15 04:10:18 gotgenes

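Using 'Combine' also enforces no intervening whitespace. – PaulMcG 2009-11-16 06:24:46

Interesting. Tried 'multi_line_word = Combine(Combine(OneOrMore(split_word)) + Optional(word))' but it breaks on the 'sh\\\n iny' case: it doesn't raise an exception, but instead returns ['sh']. Am I missing something? – gotgenes 2009-11-16 20:04:49

Well, your word is not just letters spanning a '\'-newline: there is that space before the letter 'i', which counts as a word break, so Combine stops after the 'sh'. You *can* modify Combine with an adjacent=False constructor argument, but beware, you may end up sucking up the entire file as a single word! Or you can redefine continued_ending to include any whitespace after the lineEnd, if you also want to collapse any leading whitespace. – PaulMcG 2009-11-17 01:56:25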
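The effect of the adjacent argument can be sketched as follows. This is not the exact expression from the comment above; it is a simplified grammar (a word followed by zero or more escape-plus-word pieces), and make_pieces is a hypothetical helper used so the two Combine objects share nothing.

```python
from pyparsing import Combine, Literal, Suppress, Word, ZeroOrMore, alphas, lineEnd


def make_pieces():
    """Hypothetical helper: a word followed by zero or more escape + word pieces."""
    continued_ending = Literal('\\') + lineEnd
    word = Word(alphas)
    return word + ZeroOrMore(Suppress(continued_ending) + word)


adjacent_only = Combine(make_pieces())                 # adjacent=True is the default
gaps_allowed = Combine(make_pieces(), adjacent=False)  # whitespace skipping stays on

text = 'sh\\\n iny'  # note the space before 'iny'
adjacent_tokens = adjacent_only.parseString(text).asList()
gap_tokens = gaps_allowed.parseString(text).asList()
print(adjacent_tokens)
print(gap_tokens)
```

With the default, Combine stops after 'sh' rather than raising, because parseString does not require the whole input to be consumed; passing parseAll=True is one way to turn the leftover text into an error.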

You were pretty close with your code. Any of these mods would work:

# '|' means MatchFirst, so you had a left-recursive expression 
# reversing the order of the alternatives makes this work 
multi_line_word << ((split_word + multi_line_word) | word) 

# '^' means Or/MatchLongest, but beware using this inside a Forward 
multi_line_word << (word^(split_word + multi_line_word)) 

# an unusual use of delimitedList, but it works 
multi_line_word = delimitedList(word, continued_ending) 

# in place of your parse action, you can wrap in a Combine 
multi_line_word = Combine(delimitedList(word, continued_ending))

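As you found in your pyparsing googling, BNF->pyparsing translations should be done with a special view to using pyparsing features in place of BNF's, um, shortcomings. I was actually in the middle of composing a longer answer, going into more of the BNF translation issues, but you have already found this material (on the wiki, I assume).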
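For comparison, here is a sketch that runs the original ordering next to some of the alternatives listed above. It assumes pyparsing is installed and that delimitedList is still available under that name (newer releases also offer a DelimitedList class); the variable names and the sample string are hypothetical, and '<<=' is used in place of '<<' to attach the Forward body.

```python
from pyparsing import (Combine, Forward, Literal, Suppress, Word, alphas,
                       delimitedList, lineEnd)

continued_ending = Literal('\\') + lineEnd
word = Word(alphas)
split_word = word + Suppress(continued_ending)
text = 'super\\\ncali\\\nfragi\\\nlistic'  # hypothetical sample input

# Original ordering: '|' takes the first alternative that matches,
# so the bare word wins and parsing stops after 'super'.
original = Forward()
original <<= word | (split_word + original)

# Reordered: try the longer, recursive alternative first.
reordered = Forward()
reordered <<= (split_word + reordered) | word

# Words separated by the escape sequence, then the same wrapped in Combine.
as_list = delimitedList(word, continued_ending)
combined = Combine(delimitedList(word, continued_ending))

original_tokens = original.parseString(text).asList()
reordered_tokens = reordered.parseString(text).asList()
list_tokens = as_list.parseString(text).asList()
combined_tokens = combined.parseString(text).asList()
print(original_tokens, reordered_tokens, list_tokens, combined_tokens)
```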

Source

2009-11-15 16:51:08 PaulMcG
