xml
  • r
  • xml-parsing
  • 2013-04-04

    What is the right XPath expression in R when using the XML package and xpathSApply? Say I parse a website with the code below:

    library(XML) 
    url.df_1 = htmlTreeParse("http://www.appannie.com/app/android/com.king.candycrushsaga/", useInternalNodes = TRUE) 
    

    If I then run the following code on the parsed page,

    xpathSApply(url.df_1, "//div[@class='app_content_section']/h3", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]])) 
    

    I get the following:

    [1] "Description"      "What's new"      
    [3] "Permissions"      "More Apps by King.com All Apps »" 
    [5] "Customers Also Viewed"   "Customers Also Installed"  
    

    Now, I am only interested in the "Customers Also Installed" section. But when I run the code below,

    xpathSApply(url.df_1, "//div[@class='app_content_section']/ul/li/a", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]])) 
    

    it returns every app listed under "More Apps by King.com", "Customers Also Viewed", and "Customers Also Installed".

    So I tried

    xpathSApply(url.df_1, "//div[h3='Customers Also Installed']", function(x) c(xmlValue(x), xmlAttrs(x)[["href"]])) 
    

    But this did not work. So I tried

    xpathSApply(url.df_1, "//div[contains(.,'Customers Also Installed')]",xmlValue) 
    

    But this does not work either. (The output should look like the following.)

    [,1]             
    [1,] "Christmas Candy Free\n Daniel Development\n " 
    [2,] "/app/android/xmas.candy.free/"      
    [,2]           
    [1,] "Jewel Candy Maker\n Nutty Apps\n "  
    [2,] "/app/android/com.candy.maker.jewel.nuttyapps/" 
    [,3]          
    [1,] "Pogz 2\n Terry Paton\n "   
    [2,] "/app/android/com.terrypaton.unity.pogz2/" 
    

    Any guidance would be greatly appreciated!


    +1! Good question. It is reproducible, and you show what you have tried so far. – agstudy 2013-04-04 08:47:40

    Answers


    Here is one option (you were really close):

    xpathSApply(url.df_1,"//div[contains(.,'Customers Also Installed')]/*/li/a",xmlGetAttr,'href') 
    
    [1] "/app/android/xmas.candy.free/"     
    [2] "/app/android/com.candy.maker.jewel.nuttyapps/" 
    [3] "/app/android/com.terrypaton.unity.pogz2/" 
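
The approach can also be tried offline. The following is a minimal sketch, not the answerer's code: it assumes a small hypothetical HTML fragment in place of the live page (whose markup may differ), and the names `html`, `doc`, `hrefs`, and `installed` are made up for illustration. It shows the XPath from the answer, plus a variant that selects the section by its exact `h3` text and returns both the link text and the `href`, which is the two-row shape the question asks for.

```r
library(XML)

# Hypothetical stand-in for the real page: two sections with the same layout.
html <- '
<html><body>
<div class="app_content_section">
  <h3>Customers Also Viewed</h3>
  <ul><li><a href="/app/android/viewed.one/">Viewed One</a></li></ul>
</div>
<div class="app_content_section">
  <h3>Customers Also Installed</h3>
  <ul>
    <li><a href="/app/android/xmas.candy.free/">Christmas Candy Free</a></li>
    <li><a href="/app/android/com.terrypaton.unity.pogz2/">Pogz 2</a></li>
  </ul>
</div>
</body></html>'

# Parse the string into an internal document that XPath can query.
doc <- htmlParse(html, asText = TRUE)

# XPath from the answer: links below any div whose text contains the heading.
hrefs <- xpathSApply(
  doc,
  "//div[contains(., 'Customers Also Installed')]/*/li/a",
  xmlGetAttr, "href"
)

# Variant: match the div by its h3 child, then return link text and href
# for each link. With two links this simplifies to a 2 x 2 character matrix.
installed <- xpathSApply(
  doc,
  "//div[h3='Customers Also Installed']/ul/li/a",
  function(x) c(xmlValue(x), xmlGetAttr(x, "href"))
)
```

The predicate `h3='Customers Also Installed'` needs an exact match on the heading text, so on a real page with surrounding whitespace something like `normalize-space(h3)='Customers Also Installed'` may be needed instead. The attempts in the question stop at the `div` itself, which has no `href`; stepping down to `/ul/li/a` is what reaches the links.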
    

    Thank you very much! Now I know how it works! – user1486507 2013-04-04 23:10:48
