2016-07-05

Conditional lookahead in attoparsec

Suppose there is a data structure that represents text with comments embedded in it.

data TWC 
    = T Text TWC -- text 
    | C Text TWC -- comment 
    | E -- end 
    deriving Show 

so that a string such as

"Text, {-comment-}, and something else" 

can be encoded as

T "Text, " (C "comment" (T ", and something else" E)) 

The parsers for the comment chunk and for E are fairly simple:

twcP :: Parser TWC 
twcP = eP <|> cP <|> tP 

cP :: Parser TWC 
cP = do 
    _ <- string "{-" 
    c <- manyTill anyChar (string "-}") 
    rest <- cP <|> tP <|> eP 
    return (C (pack c) rest) 

eP :: Parser TWC 
eP = do 
    endOfInput 
    return E 

Implementing the parser for the text chunk in the same trivial way

tP :: Parser TWC 
tP = do 
    t <- many1 anyChar 
    rest <- cP <|> eP 
    return (T (pack t) rest) 

makes it consume the comment section as text:

> parseOnly twcP "text{-comment-}" 
Right (T "text{-comment-}" E) 
it ∷ Either String TWC 

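So, because it is greedy, tP consumes the comment section as part of the text. The question is how to express the logic "parse until the end of input or until a comment section". In other words, how do I implement a conditional lookahead parser?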

Answer

You are right: the problematic code is the first line of tP, which greedily parses text without stopping at comments:

tP = do 
    t <- many1 anyChar 

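Before fixing that, I first want to refactor your code a little: introduce helpers, use applicative style, and isolate the problematic code in a text helper: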

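{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (Alternative, (<|>))
import Data.Attoparsec.Text
import Data.Text (Text, pack)

-- Like manyTill, but pack the result to Text. 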
textTill :: Alternative f => f Char -> f b -> f Text 
textTill p end = pack <$> manyTill p end 

-- Parse one comment string 
comment :: Parser Text 
comment = string "{-" *> textTill anyChar (string "-}") 

-- Parse one non-comment text string (problematic implementation) 
text :: Parser Text 
text = pack <$> many1 anyChar 

-- TWC parsers: 

twcP :: Parser TWC 
twcP = eP <|> cP <|> tP 

cP :: Parser TWC 
cP = C <$> comment <*> twcP 

eP :: Parser TWC 
eP = E <$ endOfInput 

tP :: Parser TWC 
tP = T <$> text <*> twcP 

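To implement lookahead, we can use the lookAhead combinator, which applies a parser without consuming any input. This lets us make text parse until it reaches either a comment (without consuming it) or endOfInput: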

-- Parse one non-comment text string (working implementation) 
-- Needs: import Data.Functor (void) 
text :: Parser Text 
text = textTill anyChar (void (lookAhead comment) <|> endOfInput) 
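
To see what "without consuming input" means in isolation, here is a minimal sketch that compares a parser which consumes the "{-" marker with one that only peeks at it. The names consuming, peeking and demo are hypothetical and exist only for this illustration; takeText simply returns whatever input is left.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, lookAhead, parseOnly, string, takeText)
import Data.Text (Text)

-- Hypothetical helper: match "{-", consuming it, then return the remaining input.
consuming :: Parser Text
consuming = string "{-" *> takeText

-- Hypothetical helper: check that "{-" comes next without consuming it,
-- then return the remaining input (which still starts with "{-").
peeking :: Parser Text
peeking = lookAhead (string "{-") *> takeText

-- Run both parsers on the same input.
demo :: (Either String Text, Either String Text)
demo = (parseOnly consuming "{-c-}", parseOnly peeking "{-c-}")
-- demo == (Right "c-}", Right "{-c-}")

main :: IO ()
main = print demo
```

Both parsers succeed on the same input, but only the second one leaves the marker in place for whatever parser runs next, which is what text relies on so that cP can still parse the comment.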

With this implementation of text, twcP behaves as expected:

ghci> parseOnly twcP "text{-comment-} post" 
Right (T "text" (C "comment" (T " post" E)))
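
For convenience, the pieces above can be assembled into one file, roughly as sketched below. The import list, the OverloadedStrings pragma, the extra Eq instance and the main function are additions made for this sketch and do not come from the snippets above; everything else follows the refactored code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (Alternative, (<|>))
import Data.Attoparsec.Text
  (Parser, anyChar, endOfInput, lookAhead, manyTill, parseOnly, string)
import Data.Functor (void)
import Data.Text (Text, pack)

data TWC
    = T Text TWC -- text
    | C Text TWC -- comment
    | E          -- end
    deriving (Show, Eq) -- Eq is an addition, handy for comparing results

-- Like manyTill, but pack the result to Text.
textTill :: Alternative f => f Char -> f b -> f Text
textTill p end = pack <$> manyTill p end

-- Parse one comment and return its body without the delimiters.
comment :: Parser Text
comment = string "{-" *> textTill anyChar (string "-}")

-- Parse text up to (but not including) the next complete comment,
-- or up to the end of input.
text :: Parser Text
text = textTill anyChar (void (lookAhead comment) <|> endOfInput)

twcP :: Parser TWC
twcP = eP <|> cP <|> tP

cP :: Parser TWC
cP = C <$> comment <*> twcP

eP :: Parser TWC
eP = E <$ endOfInput

tP :: Parser TWC
tP = T <$> text <*> twcP

main :: IO ()
main = do
    print (parseOnly twcP "text{-comment-} post")
    -- Right (T "text" (C "comment" (T " post" E)))
    print (parseOnly twcP "Text, {-comment-}, and something else")
    -- Right (T "Text, " (C "comment" (T ", and something else" E)))
```

One consequence of this design is that lookAhead comment parses each comment twice: once to decide where the text chunk ends and once more in cP. For short inputs that is unlikely to matter.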