2012-10-03
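"XML content does not seem to be XML": error in xmlTreeParse in R

I am working through the topicmodels tutorial for R. Around page 12, they strip out HTML tags and Greek letters: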

R> library("XML") 
R> remove_HTML_markup <- function(s) { 
+ doc <- htmlTreeParse(s, asText = TRUE, trim = FALSE) 
+ xmlValue(xmlRoot(doc)) 
+ } 
R> remove_HTML_markup(JSS_papers[1,"description"]) 
Error: XML content does not seem to be XML, nor to identify a file name ... 

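JSS_papers stores the metadata for a collection of papers downloaded from the journal. The entries under the description tag are the article abstracts. This one does not contain any tags: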

JSS_papers[1,"description"] = "The fit of a variogram model to spatially-distributed 
    data is often difficult to assess. A graphical diagnostic written in S-plus is 
    introduced that allows the user to determine both the general quality of the fit of a 
    variogram model, and to find specific pairs of locations that do not have measurements 
    that are consonant with the fitted variogram. It can help identify nonstationarity,  
    outliers, and poor variogram fit in general. Simulated data sets and a set of soil  
    nitrogen concentration data are examined using this graphical diagnostic." 

It works for me. Can you post your `sessionInfo()`? – nograpes

Answer

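I recently had the same problem. The variable I had assigned the URL to contained a typo. Double-check your variable s to see whether anything is wrong with it.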
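
One way to double-check `s` is to inspect it before it reaches `htmlTreeParse`. The sketch below uses a hypothetical helper, `check_parse_input` (it is not part of the XML package), written in base R only. The conditions it tests (wrong type, wrong length, empty string) are assumptions about what might go wrong with the input, not a confirmed diagnosis of the error in the question.

```r
# Hypothetical helper (not part of the XML package): lists obvious problems
# with a value before it is passed to htmlTreeParse(s, asText = TRUE).
check_parse_input <- function(s) {
  problems <- character(0)
  if (!is.character(s)) {
    # e.g. a list or a factor instead of a plain string
    problems <- c(problems, paste("not a character vector, class is", class(s)[1]))
  } else if (length(s) != 1) {
    problems <- c(problems, paste("length is", length(s), "rather than 1"))
  } else if (is.na(s) || !nzchar(trimws(s))) {
    problems <- c(problems, "string is NA or empty")
  }
  problems
}

# Illustrative values: a plain abstract and the same text wrapped in a list.
ok  <- "The fit of a variogram model is often difficult to assess."
bad <- list(ok)

check_parse_input(ok)   # no problems reported
check_parse_input(bad)  # reports that the value is not a character vector
```

If the helper reports nothing for `JSS_papers[1, "description"]`, the input itself is probably fine, and the `sessionInfo()` output requested in the comment would be the next thing to look at.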