2016-09-19

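XML to data frame in R, handling non-existing nodes

I have a situation very similar to this one (Load XML to Dataframe in R with parent node attributes): I am trying to convert XML to a data frame, but I can't handle the non-existing nodes "sp" and "l". (I don't care about node "m".) Suppose my XML looks like this: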

<text> 
<body> 
<div1 type="scene1” n="1"> 
<sp who="fau"> 
    <l c="30" a="Settle thy studies"/> 
    <m x="40" b="To sound the depth of that thou wilt profess"/> 
</sp> 
<sp who="eang"> 
     <m x="105" b="Go forward, Faustus, in that famous art"/> 
</sp> 
</div1> 
<div1 type="scene2” n="2"> 
<sp who="fau"> 
    <l c="31" a="Settle thy"/> 
    <m x="50" b="To sound the depth of"/> 
</sp> 
<sp who="fau"> 
    <l c="32" a="Settle"/> 
    <m x="60" b="To sound the"/> 
</sp> 
<sp who="fau"> 
    <l c="33" a="Settle thy studies, Faustus"/> 
    <m x="40" b="To sound the depth of that thou wilt"/> 
</sp> 
</div1> 
<div1 type="scene3” n="3"> 
</div1> 
<div1 type="scene4” n="4"> 
</div1> 
<div1 type="scene5” n="5"> 
</div1> 
</body> 
</text> 

This is what I would like to get:

n type  lc  la 
1 scene1 30  Settle thy studies 
2 scene2 31  Settle thy 
2 scene2 32  Settle 
2 scene2 33  Settle thy studies, Faustus 
3 scene3 NA  NA  
4 scene4 NA  NA 
5 scene5 NA  NA 

I have tried this:

doc = xmlTreeParse("play.xml", useInternal = TRUE) 

bodyToDF <- function(x){ 
n <- xmlGetAttr(x, "n") 
type <- xmlGetAttr(x, "type") 
sp <- xpathApply(x, 'sp', function(sp) { 
if(is.null(sp)) { 
    lc <- NA 
    la <- NA 
} 
lc <- xpathSApply(sp, 'l', function(l) { xmlGetAttr(l,"c")}) 
la = xpathSApply(sp, 'l', function(l) { xmlValue(l,"a")}) 
data.frame(n, type, lc, la) 
}) 
do.call(rbind, sp) 
} 


res <- xpathApply(doc, '//div1', bodyToDF) 

but it doesn't work:

Error in data.frame(n, type, lc, la) : 
arguments imply differing number of rows: 1, 0 

I also tried this:

div1 = sapply(c("n","type"), function(x) xpathSApply(doc, "//div1", xmlGetAttr, x), simplify=FALSE) 

l = sapply(c("c","a"), function(x) xpathSApply(doc, "//l", xmlGetAttr, x), simplify=FALSE) 

df <- data.frame(div1,l) 

but I can't seem to get the nodes and the data frame rows to match up correctly:

Error in data.frame(div1, l) : 
arguments imply differing number of rows: 5, 4 

Any ideas? Thanks.


Flick's solution might help: http://stackoverflow.com/questions/25346430/dealing-with-empty-xml-nodes-in-r


@Hack-R Thanks for the pointer, but it doesn't seem to work either: 'do.call(rbind, lapply(xmlChildren(xmlRoot(doc)), function(x) { data_frame(n = xmlGetNodeAttr(x, "./div1", "n", NA), type = xmlGetNodeAttr(x, "./div1", "type", NA), lc = xmlGetNodeAttr(x, "./sp/l", "c", NA), la = xmlGetNodeAttr(x, "./sp/l", "a", NA)) }))' 'n type lc la body.1 1 scene1 NA NA body.2 2 scene2 NA NA body.3 3 scene3 NA NA body.4 4 scene4 NA NA body.5 5 scene5 NA NA' – cmvdi01

Answer


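The XML text you pasted has a problem (some of the double quotes are not plain double quotes), so here is a clean version of it for anyone else who wants to try: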

txt <- '<text> 
    <body> 
     <div1 type="scene1" n="1"> 
      <sp who="fau"> 
       <l c="30" a="Settle thy studies"/> 
       <m x="40" b="To sound the depth of that thou wilt profess"/> 
      </sp> 
      <sp who="eang"> 
       <m x="105" b="Go forward, Faustus, in that famous art"/> 
      </sp> 
     </div1> 
     <div1 type="scene2" n="2"> 
      <sp who="fau"> 
       <l c="31" a="Settle thy"/> 
       <m x="50" b="To sound the depth of"/> 
      </sp> 
      <sp who="fau"> 
       <l c="32" a="Settle"/> 
       <m x="60" b="To sound the"/> 
      </sp> 
      <sp who="fau"> 
       <l c="33" a="Settle thy studies, Faustus"/> 
       <m x="40" b="To sound the depth of that thou wilt"/> 
      </sp> 
     </div1> 
     <div1 type="scene3" n="3"></div1> 
     <div1 type="scene4" n="4"></div1> 
     <div1 type="scene5" n="5"></div1> 
    </body> 
</text>' 
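
If you would rather repair the pasted text programmatically than retype it, something along these lines might work. It is only a sketch in base R: I am assuming the offending characters are the curly quotes U+201C and U+201D, and `raw_xml` and `fixed_xml` are names made up for this example.

```r
# A one-element stand-in for the pasted XML, with a curly closing quote
# (U+201D) after scene1. Assumption: this is the kind of character that
# broke the original paste.
raw_xml <- '<div1 type="scene1\u201d n="1"></div1>'

# Replace curly double quotes (U+201C, U+201D) with plain ASCII ones.
fixed_xml <- gsub("[\u201c\u201d]", '"', raw_xml)

print(fixed_xml)
```

After that the string should be acceptable to an XML parser, provided the quotes were the only problem.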

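The following can be converted back to XML-package syntax if that is truly necessary, but the idea is similar to the use case in the other answer: you need to examine each "scene" node and handle missing values where they occur: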

library(xml2) 
library(purrr) 
library(dplyr) 

doc <- read_xml(txt) 

# One row per <l> node; a scene without any <l> gets a single all-NA row.
xml_find_all(doc, ".//*[contains(@type, 'scene')]") %>% 
    map_df(function(x) { 

        scene <- xml_attr(x, "type") 
        num <- xml_attr(x, "n") 

        lines <- xml_find_all(x, ".//l") 

        if (length(lines) == 0) { 
            # tibble() replaces the deprecated data_frame()
            tibble(n = num, scene = scene, lc = NA_character_, la = NA_character_) 
        } else { 
            map_df(lines, function(y) { 
                # xml_attr() already returns NA when the attribute is missing
                tibble(n = num, scene = scene, 
                       lc = xml_attr(y, "c"), la = xml_attr(y, "a")) 
            }) 
        } 

    }) 

And that gives you the output you wanted:

## # A tibble: 7 × 4 
##       n  scene    lc                          la 
##   <chr>  <chr> <chr>                       <chr> 
## 1     1 scene1    30          Settle thy studies 
## 2     2 scene2    31                  Settle thy 
## 3     2 scene2    32                      Settle 
## 4     2 scene2    33 Settle thy studies, Faustus 
## 5     3 scene3  <NA>                        <NA> 
## 6     4 scene4  <NA>                        <NA> 
## 7     5 scene5  <NA>                        <NA>
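
For anyone who wants to stay with the XML package, here is a rough sketch of the same idea. As far as I can tell, the attempt in the question fails because the callback passed to xpathApply() only runs for "sp" nodes that exist (so is.null(sp) is never true), and for an "sp" without an "l" child xpathSApply() returns an empty result, which is what leaves data.frame() with columns of length 1 and 0. The helper name `scene_to_df` and the shortened sample `xml_txt` are made up for this example; they are not from the question.

```r
library(XML)

# Shortened sample (assumption: same structure as the document above).
xml_txt <- '<text><body>
  <div1 type="scene1" n="1">
    <sp who="fau"><l c="30" a="Settle thy studies"/><m x="40" b="To sound"/></sp>
    <sp who="eang"><m x="105" b="Go forward"/></sp>
  </div1>
  <div1 type="scene2" n="2">
    <sp who="fau"><l c="31" a="Settle thy"/></sp>
    <sp who="fau"><l c="32" a="Settle"/></sp>
  </div1>
  <div1 type="scene3" n="3"></div1>
</body></text>'

doc_xml <- xmlParse(xml_txt, asText = TRUE)

# Hypothetical helper: build the rows for one <div1> node.
scene_to_df <- function(div) {
  n    <- xmlGetAttr(div, "n")
  type <- xmlGetAttr(div, "type")
  # Relative XPath: every <l> below this scene, whichever <sp> holds it.
  lines <- getNodeSet(div, ".//l")

  if (length(lines) == 0) {
    # No <l> at all: emit one placeholder row instead of zero-length columns.
    return(data.frame(n = n, type = type,
                      lc = NA_character_, la = NA_character_,
                      stringsAsFactors = FALSE))
  }

  data.frame(
    n = n, type = type,
    lc = vapply(lines, xmlGetAttr, character(1),
                name = "c", default = NA_character_),
    la = vapply(lines, xmlGetAttr, character(1),
                name = "a", default = NA_character_),
    stringsAsFactors = FALSE
  )
}

res <- do.call(rbind, lapply(getNodeSet(doc_xml, "//div1"), scene_to_df))
print(res)
```

The point is the same as in the xml2 version: decide per scene whether there is anything to extract, and return a single NA row when there is not, so that every piece passed to rbind() has matching column lengths.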