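My situation is very similar to this one (Load XML to Dataframe in R with parent node attributes): I am trying to convert XML to a data frame while handling non-existent nodes, but I cannot handle the cases where the nodes "sp" and "l" are missing. (I don't care about the node "m".) Suppose my XML looks like this: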
<text>
<body>
<div1 type="scene1" n="1">
<sp who="fau">
<l c="30" a="Settle thy studies"/>
<m x="40" b="To sound the depth of that thou wilt profess"/>
</sp>
<sp who="eang">
<m x="105" b="Go forward, Faustus, in that famous art"/>
</sp>
</div1>
<div1 type="scene2" n="2">
<sp who="fau">
<l c="31" a="Settle thy"/>
<m x="50" b="To sound the depth of"/>
</sp>
<sp who="fau">
<l c="32" a="Settle"/>
<m x="60" b="To sound the"/>
</sp>
<sp who="fau">
<l c="33" a="Settle thy studies, Faustus"/>
<m x="40" b="To sound the depth of that thou wilt"/>
</sp>
</div1>
<div1 type="scene3" n="3">
</div1>
<div1 type="scene4" n="4">
</div1>
<div1 type="scene5" n="5">
</div1>
</body>
</text>
This is what I would like to get:
n type lc la
1 scene1 30 Settle thy studies
2 scene2 31 Settle thy
2 scene2 32 Settle
2 scene2 33 Settle thy studies, Faustus
3 scene3 NA NA
4 scene4 NA NA
5 scene5 NA NA
I have tried this:
library(XML)

doc = xmlTreeParse("play.xml", useInternal = TRUE)
bodyToDF <- function(x){
  n <- xmlGetAttr(x, "n")
  type <- xmlGetAttr(x, "type")
  sp <- xpathApply(x, 'sp', function(sp) {
    if(is.null(sp)) {
      lc <- NA
      la <- NA
    }
    lc <- xpathSApply(sp, 'l', function(l) { xmlGetAttr(l,"c")})
    la = xpathSApply(sp, 'l', function(l) { xmlValue(l,"a")})
    data.frame(n, type, lc, la)
  })
  do.call(rbind, sp)
}
res <- xpathApply(doc, '//div1', bodyToDF)
But it doesn't work:
Error in data.frame(n, type, lc, la) :
arguments imply differing number of rows: 1, 0
I also tried this:
div1 = sapply(c("n","type"), function(x) xpathSApply(doc, "//div1", xmlGetAttr, x), simplify=FALSE)
l = sapply(c("c","a"), function(x) xpathSApply(doc, "//l", xmlGetAttr, x), simplify=FALSE)
df <- data.frame(div1,l)
But I can't seem to get the nodes and the data frame rows to match up correctly:
Error in data.frame(div1, l) :
arguments imply differing number of rows: 5, 4
Any ideas? Thanks.
Flick's solution might help: http://stackoverflow.com/questions/25346430/dealing-with-empty-xml-nodes-in-r
@Hack-R Thanks for the pointer, but that doesn't seem to work either:
do.call(rbind, lapply(xmlChildren(xmlRoot(doc)), function(x) {
  data_frame(
    n = xmlGetNodeAttr(x, "./div1", "n", NA),
    type = xmlGetNodeAttr(x, "./div1", "type", NA),
    lc = xmlGetNodeAttr(x, "./sp/l", "c", NA),
    la = xmlGetNodeAttr(x, "./sp/l", "a", NA)
  )
}))
       n type   lc la
body.1 1 scene1 NA NA
body.2 2 scene2 NA NA
body.3 3 scene3 NA NA
body.4 4 scene4 NA NA
body.5 5 scene5 NA NA
– cmvdi01
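One possible approach, offered as a sketch and not as a definitive answer: both errors above come from combining vectors of different lengths, because a `div1` without any `l` descendant contributes zero values. The sketch below visits each `div1` on its own, collects its `sp/l` nodes, and falls back to a single `NA` row when there are none. The helper name `div_to_df` is made up for illustration, and the sample XML is inlined (with plain double quotes) instead of being read from `play.xml` so that the snippet is self-contained.

```r
library(XML)

# Sample document from the question, inlined so the sketch runs on its own.
xml_text <- '
<text>
<body>
<div1 type="scene1" n="1">
<sp who="fau">
<l c="30" a="Settle thy studies"/>
<m x="40" b="To sound the depth of that thou wilt profess"/>
</sp>
<sp who="eang">
<m x="105" b="Go forward, Faustus, in that famous art"/>
</sp>
</div1>
<div1 type="scene2" n="2">
<sp who="fau">
<l c="31" a="Settle thy"/>
<m x="50" b="To sound the depth of"/>
</sp>
<sp who="fau">
<l c="32" a="Settle"/>
<m x="60" b="To sound the"/>
</sp>
<sp who="fau">
<l c="33" a="Settle thy studies, Faustus"/>
<m x="40" b="To sound the depth of that thou wilt"/>
</sp>
</div1>
<div1 type="scene3" n="3">
</div1>
<div1 type="scene4" n="4">
</div1>
<div1 type="scene5" n="5">
</div1>
</body>
</text>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: one row per sp/l node inside a div1,
# or a single row of NAs when the div1 has no such node.
div_to_df <- function(div) {
  n <- xmlGetAttr(div, "n")
  type <- xmlGetAttr(div, "type")
  l_nodes <- getNodeSet(div, "./sp/l")
  if (length(l_nodes) == 0) {
    return(data.frame(n = n, type = type,
                      lc = NA_character_, la = NA_character_,
                      stringsAsFactors = FALSE))
  }
  data.frame(
    n = n, type = type,
    lc = vapply(l_nodes, xmlGetAttr, character(1),
                name = "c", default = NA_character_),
    la = vapply(l_nodes, xmlGetAttr, character(1),
                name = "a", default = NA_character_),
    stringsAsFactors = FALSE
  )
}

res <- do.call(rbind, lapply(getNodeSet(doc, "//div1"), div_to_df))
rownames(res) <- NULL
print(res)
```

Note that `la` is read with `xmlGetAttr(l, "a")` here. The attempt in the question uses `xmlValue(l, "a")`, which returns the node's text content and not the `a` attribute. All columns come back as character; wrap `n` and `lc` in `as.integer()` if numeric columns are needed.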