2010-11-11 46 views
2

我最近一直在使用FParsec,並且發現缺乏通用解析器對我來說是一個主要的停止點。我的這個小庫的目標是簡單以及對通用輸入的支持。你能想到任何可以改善這一點的補充,或者是特別糟糕的嗎?這是解析器組合器庫的合理基礎嗎?

 
open LazyList 

type State<'a, 'b> (input:LazyList<'a>, data:'b) = 
    member this.Input = input 
    member this.Data = data 

type Result<'a, 'b, 'c> = 
| Success of 'c * State<'a, 'b> 
| Failure of string * State<'a, 'b> 

type Parser<'a,'b, 'c> = State<'a, 'b> -> Result<'a, 'b, 'c> 

let (>>=) left right state = 
    match left state with 
    | Success (result, state) -> (right result) state 
    | Failure (message, _) -> Result<'a, 'b, 'd>.Failure (message, state) 

let (<|>) left right state = 
    match left state with 
    | Success (_, _) as result -> result 
    | Failure (_, _) -> right state 

let (|>>) parser transform state = 
    match parser state with 
    | Success (result, state) -> Success (transform result, state) 
    | Failure (message, _) -> Failure (message, state) 

let (<?>) parser errorMessage state = 
    match parser state with 
    | Success (_, _) as result -> result 
    | Failure (_, _) -> Failure (errorMessage, state)      

type ParseMonad() = 
    member this.Bind (f, g) = f >>= g 
    member this.Return x s = Success(x, s) 
    member this.Zero() s = Failure("", s)       
    member this.Delay (f:unit -> Parser<_,_,_>) = f() 

let parse = ParseMonad() 

回溯

令人驚訝的是並沒有採取太多的代碼來實現你的描述。這是有點草率,但似乎工作得很好。

let (>>=) left right state = 
    seq { 
     for res in left state do 
      match res with 
      | Success(v, s) -> 
       let v = 
        right v s 
        |> List.tryFind (
         fun res -> 
          match res with 
          | Success (_, _) -> true 
          | _ -> false 
        ) 
       match v with 
       | Some v -> yield v 
       | None ->() 
    } |> Seq.toList 

let (<|>) left right state = 
    left state @ right state 

回溯第2部分

開關周圍的代碼,以使用惰性列表和尾部調用優化遞歸。

let (>>=) left right state = 
    let rec readRight lst = 
     match lst with 
     | Cons (x, xs) -> 
      match x with 
      | Success (r, s) as q -> LazyList.ofList [q]      
      | Failure (m, s) -> readRight xs 
     | Nil -> LazyList.empty<Result<'a, 'b, 'd>> 
    let rec readLeft lst = 
     match lst with 
     | Cons (x, xs) -> 
      match x with 
      | Success (r, s) -> 
       match readRight (right r s) with 
       | Cons (x, xs) -> 
        match x with 
        | Success (r, s) as q -> LazyList.ofList [q]      
        | Failure (m, s) -> readRight xs 
       | Nil -> readLeft xs 
      | Failure (m, s) -> readLeft xs 
     | Nil -> LazyList.empty<Result<'a, 'b, 'd>> 
    readLeft (left state) 

let (<|>) (left:Parser<'a, 'b, 'c>) (right:Parser<'a, 'b, 'c>) state = 
    LazyList.delayed (fun() -> left state) 
    |> LazyList.append 
    <| LazyList.delayed (fun() -> right state) 
+0

說不上來,如果有什麼有用的東西,但也可能http://lorgonblog.wordpress.com/2008/02/看16/monadic-parser-combinators-in-f/ – Brian 2010-11-11 07:08:47

+0

@Brian - 說實話,post是我開始解析器組合體學習體驗的地方。 :) – ChaosPandion 2010-11-11 14:40:15

回答

2

我認爲,你需要做出一個重要的設計決策是你是否想支持您的解析器或不回溯(我不記得多少有關分析理論,但是這可能指定類型您的解析器可以處理的語言)。

回溯。在您的實現中,解析器可能會失敗(Failure大小寫)或只產生一個結果(Success大小寫)。另一種選擇是生成零個或多個結果(例如,表示結果爲seq<'c>)。對不起,如果這是你已經考慮:-),但無論如何...

區別是你的解析器總是遵循第一個可能的選項。例如,如果您編寫如下內容:

let! s1 = (str "ab" <|> str "a") 
let! s2 = str "bcd" 

使用您的實現時,輸入「abcd」會失敗。它將選擇<|>運算符的第一個分支,它將處理前兩個字符,並且序列中的下一個分析器將失敗。基於序列的實現將能夠回溯並跟隨<|>中的第二條路徑並解析輸入。

合併。對我而言,另一個想法是您還可以將Combine成員添加到解析器計算構建器。這有點微妙(因爲您需要了解計算表達式的翻譯方式),但它有時可能有用。如果添加:

member x.Combine(a, b) = a <|> b 
member x.ReturnFrom(p) = p 

然後就可以很好地編寫遞歸解析器:

let rec many p acc = 
    parser { let! r = p     // Parse 'p' at least once 
      return! many p (r::acc)  // Try parsing 'p' multiple times 
      return r::acc |> List.rev } // If fails, return the result 
+0

我只是在想如何使用組合我的優勢。至於回溯,說實話我還沒有考慮過。我一定會加入它並進行實驗。感謝您的建議,一如既往的富有洞察力。 – ChaosPandion 2010-11-11 04:20:39

+0

我剛剛檢查了FParsec實現,他們故意沒有包含'Combine'或'ReturnFrom'方法。 – ChaosPandion 2010-11-11 04:24:18

+0

你是2這個2。我已經實現了基本的回溯,並且隨着對它的更好理解,我會對其進行改進。再次感謝。 – ChaosPandion 2010-11-11 06:55:09

相關問題