2011-08-26 61 views
4

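HXT: Surprising behavior when reading and writing HTML to String in pure code

I want to read HTML from a String, process it, and return the changed document as a String using HXT. Since this operation does not require IO, I would rather execute the arrow with runLA than with runX.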

The code looks like this (the processing step is omitted for brevity):

runLA (hread >>> writeDocumentToString [withOutputHTML, withIndent yes]) html 

However, the surrounding html tag is missing from the result:

["\n <head>\n <title>Bogus</title>\n </head>\n <body>\n  Some trivial bogus text.\n </body>\n",""] 

When I use runX instead, like this:

runX (readString [] html >>> writeDocumentToString [withOutputHTML, withIndent yes]) 

I get the expected result:

["<html>\n <head>\n <title>Bogus</title>\n </head>\n <body>\n  Some trivial bogus text.\n </body>\n</html>\n"] 

Why is that, and how can I fix it?

Answer

5

If you look at the XmlTrees for both, you'll see that readString adds a top-level "/" element. For the non-IO runLA version:

> putStr . formatTree show . head $ runLA xread html 
---XTag "html" [] 
    | 
    +---XText "\n " 
    | 
    +---XTag "head" [] 
    ... 

And with runX:

> putStr . formatTree show . head =<< runX (readString [] html) 
---XTag "/" [NTree (XAttr "transfer-Status") [NTree (XText "200")... 
    | 
    +---XTag "html" [] 
     | 
     +---XText "\n " 
     | 
     +---XTag "head" [] 
     ... 

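writeDocumentToString uses getChildren to strip off this root element. When the input has no such root, as with the runLA version, the html element itself is treated as the root and stripped instead.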
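To make the effect concrete, here is a toy model of that behavior. It is not HXT code: the `Node` type, `render`, and `writeDoc` are hypothetical names invented for this sketch, and `writeDoc` only imitates the "emit the children of whatever node comes in" step described above.

```haskell
-- Toy model only; none of these names come from HXT.
data Node = Elem String [Node] | Text String

-- Serialize a node together with its own tags.
render :: Node -> String
render (Text s)         = s
render (Elem name kids) =
  "<" ++ name ++ ">" ++ concatMap render kids ++ "</" ++ name ++ ">"

-- Imitates a writer that assumes its input is a document root:
-- it drops the top node and serializes only the children.
writeDoc :: Node -> String
writeDoc (Elem _ kids) = concatMap render kids
writeDoc (Text _)      = ""  -- a text node has no children in this model

sample :: Node
sample = Elem "html" [Elem "body" [Text "Some trivial bogus text."]]

main :: IO ()
main = do
  -- No artificial root: the html element is taken for the root and lost.
  putStrLn (writeDoc sample)
  -- With an artificial "/" root: the html element survives.
  putStrLn (writeDoc (Elem "/" [sample]))
```

The first line printed lacks the html tags, the second keeps them, which mirrors the difference between the two results in the question.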

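A simple way around this is to use something like selem to wrap the output of xread in a similar root element, so that it looks like the kind of input writeDocumentToString expects: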

> runLA (selem "/" [xread] >>> writeDocumentToString [withOutputHTML, withIndent yes]) html 
["<html>\n <head>\n <title>Bogus</title>\n </head>\n <body>\n  Some trivial bogus text.\n </body>\n</html>\n"] 

This produces the desired output.
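For reference, the same fix as a small standalone program. This is a sketch that assumes the hxt package is installed; `roundTrip` and `sampleHtml` are hypothetical names, and the input string is made up because the original input was not shown. Any processing arrows would go between the wrapped reader and the writer.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper: parse, (optionally process), and serialize
-- without IO. The selem "/" wrapper supplies the root element that
-- writeDocumentToString strips off.
roundTrip :: String -> [String]
roundTrip =
  runLA ( selem "/" [xread]
          >>> writeDocumentToString [withOutputHTML, withIndent yes] )

-- Assumed sample input, not the asker's original document.
sampleHtml :: String
sampleHtml =
  "<html><head><title>Bogus</title></head>"
    ++ "<body>Some trivial bogus text.</body></html>"

main :: IO ()
main = mapM_ putStr (roundTrip sampleHtml)
```

Because runLA returns a plain list, `roundTrip` can be called from pure code; only the printing in `main` needs IO.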