Haskell HXT: extracting a list of values

I'm trying to work my way through XPath and arrows at the same time with HXT, and I'm completely stuck on how to think about this problem. I have the following HTML:

<div> 
<div class="c1">a</div> 
<div class="c2">b</div> 
<div class="c3">123</div> 
<div class="c4">234</div> 
</div>

I have already extracted this into an HXT XmlTree. What I want to do is define a function (I think):

getValues :: [String] -> IOSArrow XmlTree [(String, String)]

which, if used as getValues ["c1", "c2", "c3", "c4"], would give me:

[("c1", "a"), ("c2", "b"), ("c3", "123"), ("c4", "234")]

Any help?

Source

2010-10-09 Muchin

Here's one way to do it (my types are a bit more general, and I don't use XPath):

{-# LANGUAGE Arrows #-} 
module Main where 

import qualified Data.Map as M 
import Text.XML.HXT.Core -- HXT 9+; older releases exposed this API as Text.XML.HXT.Arrow

classes :: (ArrowXml a) => a XmlTree (M.Map String String) 
classes = listA (divs >>> divs >>> pairs) >>> arr M.fromList 
    where 
    divs = getChildren >>> hasName "div" 
    pairs = proc div -> do 
     cls <- getAttrValue "class" -< div 
     val <- deep getText   -< div 
     returnA -< (cls, val) 

getValues :: (ArrowXml a) => [String] -> a XmlTree [(String, Maybe String)] 
getValues cs = classes >>> arr (zip cs . lookupValues cs) 
    where lookupValues cs m = map (flip M.lookup m) cs 

main = do 
    let xml = "<div><div class='c1'>a</div><div class='c2'>b</div>\ 
      \<div class='c3'>123</div><div class='c4'>234</div></div>" 

    print =<< runX (readString [] xml >>> getValues ["c1", "c2", "c3", "c4"])

I would probably run the arrow to get the map and then do the lookups outside it, but this way works too.
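The alternative mentioned above (get the map first, look things up afterwards) could be sketched roughly like this. The helper name lookupAll is made up for illustration, and the map is built by hand as a stand-in for what running the classes arrow with runX would produce:

```haskell
import qualified Data.Map as M

-- Hypothetical helper: pair each requested class with its value, if present.
lookupAll :: [String] -> M.Map String String -> [(String, Maybe String)]
lookupAll cs m = [ (c, M.lookup c m) | c <- cs ]

main :: IO ()
main = do
    -- Stand-in for the map the `classes` arrow would yield for the sample HTML.
    let m = M.fromList [("c1", "a"), ("c2", "b"), ("c3", "123"), ("c4", "234")]
    -- "c5" is not in the map, so it comes back as Nothing.
    print (lookupAll ["c1", "c2", "c5"] m)
```

Keeping the lookup as a plain function means it can be tested without any XML at all; the arrow is only responsible for building the map.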

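To answer your question about listA: divs >>> divs >>> pairs has type a XmlTree (String, String), i.e. it is a list arrow, a non-deterministic computation that takes an XML tree and returns pairs of strings, possibly several of them.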

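arr M.fromList has type a [(String, String)] (M.Map String String). This means we can't simply compose it with divs >>> divs >>> pairs, because the types don't match: one side produces individual pairs, the other expects a whole list.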

listA solves this problem: it collapses divs >>> divs >>> pairs into a deterministic version of type a XmlTree [(String, String)], which is exactly what we need.
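To see the difference without parsing any XML, here is a small sketch using HXT's pure list arrow LA. The arrow pairs below is only a stand-in built with constL, not the real divs >>> divs >>> pairs, and it assumes Text.XML.HXT.Core from HXT 9 or later:

```haskell
import qualified Data.Map as M
import Text.XML.HXT.Core

-- Stand-in for `divs >>> divs >>> pairs`: a list arrow yielding two results.
pairs :: LA () (String, String)
pairs = constL [("c1", "a"), ("c2", "b")]

main :: IO ()
main = do
    -- Without listA: one result per pair.
    print (runLA pairs ())
    -- With listA: a single result that holds the whole list.
    print (runLA (listA pairs) ())
    -- Now the types line up, so the list can be turned into a Map.
    print (runLA (listA pairs >>> arr M.fromList) ())
```

The first print shows two separate results, while the second shows a one-element list whose only element is the list of both pairs. That single list is what arr M.fromList needs as its input.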

Source

2010-10-09 19:21:39

What does listA do? – Muchin 2010-10-10 00:37:55

Here is a way to do this using HandsomeSoup:

-- For the join function. 
import Data.String.Utils 
import Text.HandsomeSoup 
import Text.XML.HXT.Core 

-- Of each element, get class attribute and text. 
getItem = (this ! "class" &&& (this /> getText)) 
getItems selectors = css (join "," selectors) >>> getItem 

main = do 
    let selectors = [".c1", ".c2", ".c3", ".c4"] 
    items <- runX (readDocument [] "data.html" >>> getItems selectors) 
    print items

data.html is the HTML file being read.

Source

2012-11-04 00:48:59
