2017-09-02

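XML to JSON conversion in R

The R packages don't seem to work properly when converting XML to JSON. I tried RJSONIO, rjson and jsonlite together with the 'XML' package. I first parse the XML and convert it to a list with XML::xmlToList(), then convert that list to JSON with toJSON() from each of these three packages.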

My XML file:

<?xml version="1.0" encoding="utf-8"?> 
<votes> 
    <row Id="1" PostId="1" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" /> 
    <row Id="2" PostId="1" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" /> 
    <row Id="3" PostId="3" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" /> 
</votes> 

My source code:

library(XML) 
library(RJSONIO) 
library(rjson) 
library(jsonlite) 

xml_parse <- xmlTreeParse("~/Downloads/test.xml", useInternalNodes=TRUE) 
xml_root <- xmlRoot(xml_parse) 
xml_list <- xmlToList(xml_root, simplify = TRUE) 

#jsonlite package 
xml_jsonlite <- jsonlite::toJSON(xml_list) 
write(xml_jsonlite, "test_jsonlite.json") 

#RJSONIO package 
xml_rjsonio <- RJSONIO::toJSON(xml_list) 
write(xml_rjsonio, "test_rjsonio.json") 

#rjson package
xml_rjson <- rjson::toJSON(xml_list)
write(xml_rjson, "test_rjson.json")

JSON file converted by RJSONIO:

{ 
"row": { 
    "Id": "98", 
    "PostId": "10", 
    "VoteTypeId": "2", 
    "CreationDate": "2014-05-14T00:00:00.000" 
}, 
"row": { 
    "Id": "99", 
    "PostId": "7", 
    "VoteTypeId": "5", 
    "UserId": "111", 
    "CreationDate": "2014-05-14T00:00:00.000" 
} 
} 

This is clearly wrong, because the field name "row" is duplicated.

JSON file converted by jsonlite:

{"row":["1","1","2","2014-05-13T00:00:00.000"], 
"row.1":["2","1","2","2014-05-13T00:00:00.000"], 
"row.2":["3","3","2","2014-05-13T00:00:00.000"]} 

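This is odd, because there should be a single field named "row" holding an array of sub-documents, rather than incrementally numbered "row" arrays. The values do not even have field names.

JSON file converted by rjson: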

{ 
"row": { 
"Id": "1", 
"PostId": "1", 
"VoteTypeId": "2", 
"CreationDate": "2014-05-13T00:00:00.000" 
}, 
"row": { 
"Id": "2", 
"PostId": "1", 
"VoteTypeId": "2", 
"CreationDate": "2014-05-13T00:00:00.000" 
} 
} 
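
On the missing field names in the jsonlite output: a likely cause is that jsonlite serializes atomic vectors as JSON arrays, which drops their names. Here is a minimal sketch, assuming each row comes back from xmlToList() as a named character vector. `row_vec` is a made-up stand-in, not the data from the question.

```r
library(jsonlite)

# Hypothetical stand-in for one <row>: a named character vector.
row_vec <- c(Id = "1", PostId = "1", VoteTypeId = "2")

# Atomic vectors become JSON arrays, so the names are dropped.
as_array <- toJSON(row_vec)

# Converting to a list first keeps the names as JSON object keys.
as_object <- toJSON(as.list(row_vec), auto_unbox = TRUE)

cat(as_array, "\n")
# ["1","1","2"]
cat(as_object, "\n")
# {"Id":"1","PostId":"1","VoteTypeId":"2"}
```

So wrapping each row in as.list() before calling jsonlite::toJSON() should restore the field names, although it does not by itself solve the repeated "row" key.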

The ideal JSON file would look like this:

{
    "votes" : {
        "row" : [
            {
                "Id" : "1",
                "PostId" : "1",
                "VoteTypeId" : "2",
                "CreationDate" : "2014-05-13T00:00:00.000"
            },
            {
                "Id" : "2",
                "PostId" : "1",
                "VoteTypeId" : "2",
                "CreationDate" : "2014-05-13T00:00:00.000"
            }
        ]
    }
}

I'm looking for a solution. Any help is appreciated.

Show the code you used to get the incorrect JSON string, and the data as it comes out of the XML conversion, so that we can help you get it right. – sconfluentus

Answer

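xml2 and jsonlite get you most of the way there, but you haven't shown us that you actually tried a solution in R code, so here is a partial solution, posted so that it can help others: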

library(xml2) 
library(jsonlite) 

read_xml('<?xml version="1.0" encoding="utf-8"?> 
<votes> 
    <row Id="1" PostId="1" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" /> 
    <row Id="2" PostId="1" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" /> 
    <row Id="3" PostId="3" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" /> 
</votes>') -> doc 

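# Convert the document to a nested R list. Depending on the xml2 version, the
# result may be wrapped in the root element; if so, use x <- x$votes first.
x <- xml2::as_list(doc) 

# Each <row> has no children, so its attributes carry all of the data.
xl <- lapply(x, attributes) 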

toJSON(xl, pretty = TRUE, auto_unbox = TRUE) 
## { 
## "row": { 
##  "Id": "1", 
##  "PostId": "1", 
##  "VoteTypeId": "2", 
##  "CreationDate": "2014-05-13T00:00:00.000" 
## }, 
## "row.1": { 
##  "Id": "2", 
##  "PostId": "1", 
##  "VoteTypeId": "2", 
##  "CreationDate": "2014-05-13T00:00:00.000" 
## }, 
## "row.2": { 
##  "Id": "3", 
##  "PostId": "3", 
##  "VoteTypeId": "2", 
##  "CreationDate": "2014-05-13T00:00:00.000" 
## } 
## } 

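Per your comment:

The structure you want is not how the data is actually laid out. That means that if you want that output, you can't use the canned, vanilla utilities.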

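library(purrr) # provides map_chr() and the %>% pipe

# Serialize each <row> separately, then splice the pieces into a hand-built wrapper.
xml_find_all(doc, "//votes/row") %>% 
    map_chr(~{ 
    toJSON(as.list(xml_attrs(.x)), auto_unbox = TRUE, pretty = TRUE) 
    }) %>% 
    paste0(collapse=",\n") %>% 
    gsub("[\n]", "\n ", .) %>% 
    sprintf('{ "votes" : {\n "row" : [\n %s]\n }\n}', .) %>% 
    cat() 

## { "votes" : { 
## "row" : [ 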
##  { 
##  "Id": "1", 
##  "PostId": "1", 
##  "VoteTypeId": "2", 
##  "CreationDate": "2014-05-13T00:00:00.000" 
##  }, 
##  { 
##  "Id": "2", 
##  "PostId": "1", 
##  "VoteTypeId": "2", 
##  "CreationDate": "2014-05-13T00:00:00.000" 
##  }, 
##  { 
##  "Id": "3", 
##  "PostId": "3", 
##  "VoteTypeId": "2", 
##  "CreationDate": "2014-05-13T00:00:00.000" 
##  }] 
## } 
## } 
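
As an alternative to pasting strings together, the same shape can probably be obtained by building a nested R list and letting jsonlite serialize it. This is only a sketch under assumptions: the `row_list` and `nested` names are made up for illustration, and the XML is a shortened copy of the sample from the question. It relies on jsonlite turning unnamed lists into JSON arrays and named lists into JSON objects.

```r
library(xml2)
library(jsonlite)

doc <- read_xml('<votes>
  <row Id="1" PostId="1" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" />
  <row Id="2" PostId="1" VoteTypeId="2" CreationDate="2014-05-13T00:00:00.000" />
</votes>')

rows <- xml_find_all(doc, "//votes/row")

# One unnamed entry per <row>; each entry is a named list of its attributes.
row_list <- lapply(rows, function(node) as.list(xml_attrs(node)))

# Unnamed list -> JSON array, named list -> JSON object.
nested <- list(votes = list(row = row_list))

json <- toJSON(nested, auto_unbox = TRUE, pretty = TRUE)
cat(json, "\n")
```

Because each row is converted on its own, rows with extra attributes (such as the UserId seen in the RJSONIO output) should simply produce objects with an extra key.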
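
Thanks for the answer, @hrbrmstr. I had already tried jsonlite, as I said in my post, and the output is not ideal. As you can see here, the ideal output would be a single field named "row" containing an array of sub-documents, rather than multiple incrementally numbered "row" documents. –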