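How to parse XML in R with a varying number of child nodes and multiple nodes with the same name

I have the following XML, which I want to parse in R: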
<?xml version="1.0" encoding="UTF-8"?><CONSOLIDATED_LIST xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://www.un.org/sc/resources/sc-sanctions.xsd" dateGenerated="2016-12-21T19:09:23.456-05:00">
<INDIVIDUALS>
<INDIVIDUAL>
<DATAID>6908434</DATAID>
<VERSIONNUM>1</VERSIONNUM>
<FIRST_NAME>ABD</FIRST_NAME>
<SECOND_NAME>AL-KHALIQ</SECOND_NAME>
<THIRD_NAME> AL-HOUTHI </THIRD_NAME>
<UN_LIST_TYPE>Yemen</UN_LIST_TYPE>
<REFERENCE_NUMBER>YEi.001</REFERENCE_NUMBER>
<LISTED_ON>2014-11-07</LISTED_ON>
<NAME_ORIGINAL_SCRIPT>عبدالخالق الحوثي</NAME_ORIGINAL_SCRIPT>
<COMMENTS1>Gender [Male].</COMMENTS1>
<DESIGNATION>
<VALUE>Huthi military commander</VALUE>
</DESIGNATION>
<NATIONALITY>
<VALUE>Yemen</VALUE>
</NATIONALITY>
<LIST_TYPE>
<VALUE>UN List</VALUE>
</LIST_TYPE>
<LAST_DAY_UPDATED>
<VALUE>2014-11-20</VALUE>
<VALUE>2016-08-26</VALUE>
</LAST_DAY_UPDATED>
<INDIVIDUAL_ALIAS>
<QUALITY>Good</QUALITY>
<ALIAS_NAME>Abd-al-Khaliq al-Huthi</ALIAS_NAME>
</INDIVIDUAL_ALIAS>
<INDIVIDUAL_ALIAS>
<QUALITY>Good</QUALITY>
<ALIAS_NAME>Abd-al-Khaliq Badr-al-Din al Huthi</ALIAS_NAME>
</INDIVIDUAL_ALIAS>
<INDIVIDUAL_ALIAS>
<QUALITY>Good</QUALITY>
<ALIAS_NAME>‘Abd al-Khaliq Badr al-Din al-Huthi</ALIAS_NAME>
</INDIVIDUAL_ALIAS>
<INDIVIDUAL_ALIAS>
<QUALITY>Good</QUALITY>
<ALIAS_NAME>Abd al-Khaliq al-Huthi </ALIAS_NAME>
</INDIVIDUAL_ALIAS>
<INDIVIDUAL_ALIAS>
<QUALITY>Low</QUALITY>
<ALIAS_NAME>Abu-Yunus</ALIAS_NAME>
</INDIVIDUAL_ALIAS>
<INDIVIDUAL_ADDRESS>
<COUNTRY/>
</INDIVIDUAL_ADDRESS>
<INDIVIDUAL_DATE_OF_BIRTH>
<TYPE_OF_DATE>EXACT</TYPE_OF_DATE>
<YEAR>1984</YEAR>
</INDIVIDUAL_DATE_OF_BIRTH>
<INDIVIDUAL_PLACE_OF_BIRTH/>
<INDIVIDUAL_DOCUMENT/>
<SORT_KEY/>
<SORT_KEY_LAST_MOD/>
</INDIVIDUAL>
</CONSOLIDATED_LIST>
The desired output of parsing the XML above is:
DATAID  | FIRST_NAME | SECOND_NAME | THIRD_NAME | FOURTH_NAME | ALIAS_NAME                          | QUALITY
--------+------------+-------------+------------+-------------+-------------------------------------+--------
6908434 | ABD        | AL-KHALIQ   | AL-HOUTHI  | NA          | Abd-al-Khaliq al-Huthi              | Good
6908434 | ABD        | AL-KHALIQ   | AL-HOUTHI  | NA          | Abd-al-Khaliq Badr-al-Din al Huthi  | Good
6908434 | ABD        | AL-KHALIQ   | AL-HOUTHI  | NA          | ‘Abd al-Khaliq Badr al-Din al-Huthi | Good
6908434 | ABD        | AL-KHALIQ   | AL-HOUTHI  | NA          | Abd al-Khaliq al-Huthi              | Good
6908434 | ABD        | AL-KHALIQ   | AL-HOUTHI  | NA          | Abu-Yunus                           | Low
One complication is that some entries have no THIRD_NAME or FOURTH_NAME element. Any help is appreciated, thanks.
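A minimal sketch of NA-safe lookups for the optional name fields, assuming the xml2 package (a swap from the XML package's `xmlValue()`); `child_text` is a hypothetical helper name, and the inline XML is an abridged copy of the sample above. In xml2, `xml_find_first()` returns a "missing" node rather than failing when an element is absent, and `xml_text()` maps that missing node to `NA`:

```r
library(xml2)

# Hypothetical helper: trimmed text of the first child element named `tag`.
# xml_find_first() yields a "missing" node when no such child exists, and
# xml_text() turns that into NA_character_ instead of raising an error.
child_text <- function(node, tag) {
  xml_text(xml_find_first(node, paste0("./", tag)), trim = TRUE)
}

# An abridged INDIVIDUAL with no FOURTH_NAME element
doc <- read_xml(
  "<INDIVIDUAL><FIRST_NAME>ABD</FIRST_NAME><THIRD_NAME> AL-HOUTHI </THIRD_NAME></INDIVIDUAL>"
)
ind <- xml_find_first(doc, "//INDIVIDUAL")

child_text(ind, "THIRD_NAME")   # "AL-HOUTHI" (surrounding spaces trimmed)
child_text(ind, "FOURTH_NAME")  # NA
```

`trim = TRUE` also removes the stray spaces around `AL-HOUTHI` in the source data, which matches the desired output.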
I have tried the following code:
library(XML)

# `individuals` is a list of <INDIVIDUAL> nodes from the parsed document.
# individual["X"][[1]] is an error when no X child exists (e.g. FOURTH_NAME).
result <- do.call(rbind, lapply(individuals, function(individual) {
  DATAID      <- xmlValue(individual["DATAID"][[1]])
  FIRST_NAME  <- xmlValue(individual["FIRST_NAME"][[1]])
  SECOND_NAME <- xmlValue(individual["SECOND_NAME"][[1]])
  THIRD_NAME  <- xmlValue(individual["THIRD_NAME"][[1]])
  FOURTH_NAME <- xmlValue(individual["FOURTH_NAME"][[1]])
  c(DATAID = DATAID, FIRST_NAME = FIRST_NAME)
}))
result <- data.frame(result)
but it fails whenever THIRD_NAME or FOURTH_NAME is missing, and I am also not sure how to get the ALIAS_NAME values.
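One possible approach, sketched under the assumption that the xml2 package is available (`child_text` and `parse_individuals` are hypothetical names, and the inline XML is an abridged copy of the sample above): build one data frame per INDIVIDUAL with one row per INDIVIDUAL_ALIAS, then bind them. Because the per-person fields are length 1, `data.frame()` recycles them across however many aliases that person has, which gives the repeated DATAID and name columns of the desired output.

```r
library(xml2)

# Trimmed text of the first child named `tag`, or NA if the child is absent.
child_text <- function(node, tag) {
  xml_text(xml_find_first(node, paste0("./", tag)), trim = TRUE)
}

# One row per INDIVIDUAL_ALIAS; the per-person fields are recycled across
# that person's aliases. A person with no aliases still gets one row.
parse_individuals <- function(doc) {
  rows <- lapply(xml_find_all(doc, "//INDIVIDUAL"), function(ind) {
    aliases <- xml_find_all(ind, "./INDIVIDUAL_ALIAS")
    if (length(aliases) == 0) {
      alias_name <- NA_character_
      quality    <- NA_character_
    } else {
      # xml_find_first() on a node set returns one result per alias,
      # so ALIAS_NAME and QUALITY stay paired even if one is missing.
      alias_name <- xml_text(xml_find_first(aliases, "./ALIAS_NAME"), trim = TRUE)
      quality    <- xml_text(xml_find_first(aliases, "./QUALITY"), trim = TRUE)
    }
    data.frame(
      DATAID      = child_text(ind, "DATAID"),
      FIRST_NAME  = child_text(ind, "FIRST_NAME"),
      SECOND_NAME = child_text(ind, "SECOND_NAME"),
      THIRD_NAME  = child_text(ind, "THIRD_NAME"),
      FOURTH_NAME = child_text(ind, "FOURTH_NAME"),
      ALIAS_NAME  = alias_name,
      QUALITY     = quality,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Abridged sample: one INDIVIDUAL with two aliases and no FOURTH_NAME
sample_xml <- '<CONSOLIDATED_LIST><INDIVIDUALS><INDIVIDUAL>
  <DATAID>6908434</DATAID><FIRST_NAME>ABD</FIRST_NAME>
  <SECOND_NAME>AL-KHALIQ</SECOND_NAME><THIRD_NAME> AL-HOUTHI </THIRD_NAME>
  <INDIVIDUAL_ALIAS><QUALITY>Good</QUALITY><ALIAS_NAME>Abd-al-Khaliq al-Huthi</ALIAS_NAME></INDIVIDUAL_ALIAS>
  <INDIVIDUAL_ALIAS><QUALITY>Low</QUALITY><ALIAS_NAME>Abu-Yunus</ALIAS_NAME></INDIVIDUAL_ALIAS>
</INDIVIDUAL></INDIVIDUALS></CONSOLIDATED_LIST>'

result <- parse_individuals(read_xml(sample_xml))
print(result)
```

For the real file, `parse_individuals(read_xml(path))` should behave the same way, where `path` (hypothetical) is wherever the consolidated list is saved. The root element declares only the `xsi` prefix and no default namespace, so the unprefixed XPath expressions match the elements directly. An individual with no INDIVIDUAL_ALIAS children still produces a single row, with NA in ALIAS_NAME and QUALITY.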
Thanks, Parfait (both approaches above worked after the conversion); this is a great help.