2016-05-31

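XML node notation (XPath in R to data frame)

I'm relatively new to R and am trying to read an XML file with XPath and convert it to a data frame in R. I have already found a solution that converts the file into a list I can work with, but I need my program to run relatively fast. I checked the w3schools.com tutorial on XPath (http://www.w3schools.com/xsl/xpath_nodes.asp), but it doesn't explain the notation I found in my XML file. I want to create a data frame containing the different customers and their attributes; the beginning of the file is not needed for my calculations.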

Below is an excerpt of the file:

$config 
<config> 
    <competition id="0" name="0" pomId="1.3.1-SNAPSHOT" timeslotLength="60" bootstrapTimeslotCount="336" bootstrapDiscardedTimeslots="24" timeslotsOpen="24" deactivateTimeslotsAhead="1" minimumOrderQuantity="0.01" timezoneOffset="-6" latitude="45" simulationRate="720" simulationModulo="3600000"> 
<description/> 
<simulationBaseTime> 
    <iMillis>1255132800000</iMillis> 
</simulationBaseTime> 
<broker>default broker</broker> 
<customer id="4097" name="HighIncome-2_8" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 
<customer id="4100" name="HighIncome-2_9" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
<customer id="4103" name="HighIncome-2_10" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
<customer id="4106" name="HighIncome-2_11" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 

How do I refer to each customer and its attributes?

Answer


In XML, there are two kinds of structures that hold values:

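  1. Elements (also called nodes or tags), enclosed in angle brackets, whose value is held between the opening and closing tags: <element>value</element>
  2. Attributes, written inside an element's opening tag with their value assigned by the equals sign (attribute="value"); in XPath they are referenced with an @ prefix, e.g. @name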
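A small sketch contrasting the two, assuming the same XML package used in the answer's code and a cut-down copy of the question's document (the short XML string here is illustrative, not the full file): an element's value is read with xmlValue, while an attribute is either read from its element with xmlGetAttr or selected directly with @ in the XPath expression.

```r
library(XML)

# Cut-down copy of the question's XML: one element value, one customer with attributes
doc <- xmlParse('<config>
  <simulationBaseTime><iMillis>1255132800000</iMillis></simulationBaseTime>
  <customer id="4097" name="HighIncome-2_8" powerType="ELECTRIC_VEHICLE"/>
</config>', asText = TRUE)

# Element: the value sits between the opening and closing tags
millis <- xpathSApply(doc, "//iMillis", xmlValue)

# Attribute, option 1: select the element, then read one attribute from it
custName <- xpathSApply(doc, "//customer", xmlGetAttr, "name")

# Attribute, option 2: address the attribute itself with @ in the XPath path
custId <- xpathSApply(doc, "//customer/@id")
```

All three results come back as character strings; converting them to numbers is a separate step.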

For your particular XML, customer is the element, and id, name, population, powerType, customerClass, controllableKW, upRegulationKW, downRegulationKW, storageCapacity, multiContracting, and canNegotiate are its attributes.
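To refer to each customer together with its attributes, one option is to collect the customer nodes and call xmlAttrs on each one, which returns that node's attributes as a named character vector. This is a sketch assuming the XML package; the two customers are copied from the question with most attributes dropped for brevity.

```r
library(XML)

# Two customers copied (with fewer attributes) from the question's XML
doc <- xmlParse('<config>
  <customer id="4097" name="HighIncome-2_8" storageCapacity="85.0"/>
  <customer id="4100" name="HighIncome-2_9" storageCapacity="60.0"/>
</config>', asText = TRUE)

# Each customer element becomes one node in the node set
customers <- getNodeSet(doc, "//customer")

# xmlAttrs() returns a node's attributes as a named character vector,
# so an attribute is looked up by name on its own customer
for (cust in customers) {
  attrs <- xmlAttrs(cust)
  cat(attrs[["name"]], "has storageCapacity", attrs[["storageCapacity"]], "\n")
}
```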

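With the R XML package, to extract a set of values with xpathSApply(), which evaluates XPath 1.0 expressions, set the fun argument to xmlValue for element values and to xmlAttrs for attribute values. From there you can manipulate the resulting list or matrix into a data frame. For your needs, you can simply extract the data into a matrix (one column per customer) and transpose it into the final data frame. This relies on every customer carrying the same attributes in the same order; if the attribute sets differ, the columns no longer line up. A double forward slash (//) in the XPath expression matches the named node anywhere in the document, here customer.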
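To illustrate the double-slash point, here is a minimal sketch (XML package assumed, toy document mirroring the question's nesting) comparing //customer with a fully spelled-out absolute path, and showing that an absolute path which skips a level finds nothing.

```r
library(XML)

# Same nesting as the question: config > competition > customer
doc <- xmlParse('<config>
  <competition id="0">
    <broker>default broker</broker>
    <customer id="4097"/>
    <customer id="4100"/>
  </competition>
</config>', asText = TRUE)

# "//customer" matches customer elements at any depth
anywhere <- xpathSApply(doc, "//customer", xmlGetAttr, "id")

# an absolute path has to name every step from the root
absolute <- xpathSApply(doc, "/config/competition/customer", xmlGetAttr, "id")

# skipping the competition level matches no nodes at all
missed <- getNodeSet(doc, "/config/customer")
```

The // form is convenient when the nesting is not important to you; the absolute form is stricter about where the nodes must sit.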

library(XML) 
xmlstr <- '<config> 
      <competition id="0" name="0" pomId="1.3.1-SNAPSHOT" timeslotLength="60" bootstrapTimeslotCount="336" bootstrapDiscardedTimeslots="24" timeslotsOpen="24" deactivateTimeslotsAhead="1" minimumOrderQuantity="0.01" timezoneOffset="-6" latitude="45" simulationRate="720" simulationModulo="3600000"> 
       <description/> 
       <simulationBaseTime> 
        <iMillis>1255132800000</iMillis> 
       </simulationBaseTime> 
       <broker>default broker</broker> 
       <customer id="4097" name="HighIncome-2_8" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 
       <customer id="4100" name="HighIncome-2_9" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
       <customer id="4103" name="HighIncome-2_10" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="60.0" multiContracting="false" canNegotiate="false"/> 
       <customer id="4106" name="HighIncome-2_11" population="1" powerType="ELECTRIC_VEHICLE" customerClass="SMALL" controllableKW="-3.3" upRegulationKW="-3.3" downRegulationKW="3.3" storageCapacity="85.0" multiContracting="false" canNegotiate="false"/> 
      </competition> 
      </config>'  
xml <- xmlParse(xmlstr, asText = TRUE)  # parse the string as XML text, not as a file name

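# MATRIX OF CUSTOMER ATTRIBS: one row per attribute, one column per customer
customerAttribs <- xpathSApply(doc=xml, path="//customer", xmlAttrs) 

# TRANSPOSE TO DATA FRAME: one row per customer (values kept as character strings)
df <- data.frame(t(customerAttribs), stringsAsFactors = FALSE) 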

#  id   name population  powerType customerClass controllableKW \ 
# 1 4097 HighIncome-2_8   1 ELECTRIC_VEHICLE   SMALL   -3.3 
# 2 4100 HighIncome-2_9   1 ELECTRIC_VEHICLE   SMALL   -3.3 
# 3 4103 HighIncome-2_10   1 ELECTRIC_VEHICLE   SMALL   -3.3 
# 4 4106 HighIncome-2_11   1 ELECTRIC_VEHICLE   SMALL   -3.3 
# upRegulationKW downRegulationKW storageCapacity multiContracting canNegotiate 
# 1   -3.3    3.3   85.0   false  false 
# 2   -3.3    3.3   60.0   false  false 
# 3   -3.3    3.3   60.0   false  false 
# 4   -3.3    3.3   85.0   false  false
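Every column of df is still character data, because XML attribute values are strings. One possible follow-up, not part of the original answer, is base R's type.convert applied column by column so that numbers become numeric and "false"/"true" become logical. The sketch below assumes the XML package and uses two customers trimmed from the question.

```r
library(XML)

# Two customers trimmed from the question's XML
doc <- xmlParse('<config><competition>
  <customer id="4097" name="HighIncome-2_8" storageCapacity="85.0" canNegotiate="false"/>
  <customer id="4100" name="HighIncome-2_9" storageCapacity="60.0" canNegotiate="false"/>
</competition></config>', asText = TRUE)

# Same two steps as the answer: attribute matrix, then transpose to a data frame
customerAttribs <- xpathSApply(doc, "//customer", xmlAttrs)
df <- data.frame(t(customerAttribs), stringsAsFactors = FALSE)

# Let R choose integer/double/logical/character for each column
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Assigning into df[] keeps the data frame structure while replacing each column with its converted version; as.is = TRUE keeps text columns such as name as character rather than factor.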