2015-02-05 88 views

Scraping HTML using R - xpathSApply

I want to extract information from the HTML code below, using its classes or xpathSApply.

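I want to capture the different pieces of information as a table, for example:

  • a column for Effectiveness with the value 5, and a column populated with the full comment

    • that is, the full comment,

    not the truncated one.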


<div class="userPost"> 
<div class="postHeading clearfix"> 
    <div class="conditionInfo"> 
       Condition: Condition in which Stomach Acid is Pushed Into the Esophagus</div> 
    <div class="date">8/12/2014 12:27:53 PM</div> 
</div> 
<p class="reviewerInfo">Reviewer: Believer, 35-44 Female on Treatment for 2 to less than 5 years (Patient) </p> 
<div id="ctnStars"> 
    <div class="catRatings firstEl clearfix"> 
    <p class="category">Effectiveness</p> 
    <p class="inlineRating starRating"><span class="current-rating" style="width: 100%"> 
     Current Rating: 5</span></p> 
    </div> 
    <div class="catRatings clearfix"> 
    <p class="category">Ease of Use</p> 
    <p class="inlineRating starRating"><span class="current-rating" style="width: 100%"> 
     Current Rating: 5</span></p> 
    </div> 
    <div class="catRatings lastEl clearfix"> 
    <p class="category">Satisfaction</p> 
    <p class="inlineRating starRating"><span class="current-rating" style="width: 100%"> 
     Current Rating: 5</span></p> 
    </div> 
</div> 
<p id="comTrunc1" class="comment"><strong>Comment: </strong><br>Most excellent! I tried several different rx&#39;s to help with my acid problem and none were as effective as Nexium. After being on it for 3 months I stopped because that was how long my doc thought it would take to heal me. I stopped taking it and boom, the pain was back. Got back on Nexium and am staying on it. Such relief was unexpected.</p> 
<p id="comFull1" class="comment" style="display:none"><strong>Comment:</strong><br>Most excellent! I tried several different rx&#39;s to help with my acid problem and none were as effective as Nexium. After being on it for 3 months I stopped because that was how long my doc thought it would take to heal me. I stopped taking it and boom, the pain was back. Got back on Nexium and am staying on it. Such relief was unexpected.<br><a onclick="toggle('comTrunc1'); toggle('comFull1');return false;" href="#">Hide Full Comment</a></p> 
<div class="actionLinks clearfix"> 
    <p class="helpful">4 
         people 

       found this review helpful.<br> 
       Was this review helpful? <span id="513102_Vote"><a href="#" onclick="return FoundHelpFul('8cbc5bf1-4f86-48e4-ac0f-5b3085949a2a', 513102, true)">Yes</a> | <a href="#" onclick="return FoundHelpFul('8cbc5bf1-4f86-48e4-ac0f-5b3085949a2a', 513102, false)">No</a></span></p><a class="reportAbuse" href="#" onclick="showPopWin('ReportAbuse.aspx?reviewid=513102&amp;userid=8cbc5bf1-4f86-48e4-ac0f-5b3085949a2a',400,160,null, false); return false;">Report This Post</a></div> 

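Hi, welcome to Stack Overflow. So: what code have you tried? Did it break? If so, what was the error message? At S/O we would rather help you along with your own attempt than write it for you, so show us what you have :) – 2015-02-16 03:22:04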

Answer


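I'm not quite sure what you are after, but here is a start. If this isn't the direction you want, please edit your question after trying something along these lines (and include your code). Assuming `url` holds the URL of the page containing the HTML you posted, try something like this: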

library(XML) # the package name is case-sensitive
doc <- htmlParse(url) # parses the page at url into an internal document that XPath can query

data <- xpathSApply(doc, "//div[@id='ctnStars']//p[@class='category']", xmlValue, trim = TRUE) # text of each category node, e.g. "Effectiveness"
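
To get from there to the table described in the question, one option is to loop over each `userPost` block and query it with relative XPath expressions. The sketch below is only an illustration under some assumptions: `html` is a shortened stand-in for the posted markup (not the real page), `extract_post` is a hypothetical helper name, and the full comment is assumed to live in the `<p>` whose id starts with `comFull`, as it does in the sample.

```r
library(XML)

# Shortened stand-in for the markup in the question (assumption: the real
# page repeats one such "userPost" block per review).
html <- '<div class="userPost">
<div id="ctnStars">
  <div class="catRatings firstEl clearfix">
    <p class="category">Effectiveness</p>
    <p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
     Current Rating: 5</span></p>
  </div>
  <div class="catRatings clearfix">
    <p class="category">Ease of Use</p>
    <p class="inlineRating starRating"><span class="current-rating" style="width: 100%">
     Current Rating: 5</span></p>
  </div>
</div>
<p id="comTrunc1" class="comment"><strong>Comment: </strong><br>Most excellent! I tried several</p>
<p id="comFull1" class="comment" style="display:none"><strong>Comment:</strong><br>Most excellent! I tried several different prescriptions.<br><a href="#">Hide Full Comment</a></p>
</div>'

doc <- htmlParse(html, asText = TRUE)

# Hypothetical helper: turns one userPost node into a one-row data frame.
extract_post <- function(post) {
  # Category labels and their ratings appear in the same order.
  categories <- xpathSApply(post, ".//p[@class='category']", xmlValue, trim = TRUE)
  ratings <- xpathSApply(post, ".//span[@class='current-rating']", xmlValue, trim = TRUE)
  # Keep only the digits of "Current Rating: 5" and name them by category.
  ratings <- setNames(as.integer(gsub("\\D", "", ratings)), categories)

  # Direct text children of the full-comment paragraph; this skips the
  # "Comment:" label and the "Hide Full Comment" link, which sit in child tags.
  parts <- xpathSApply(post, ".//p[starts-with(@id, 'comFull')]/text()",
                       xmlValue, trim = TRUE)

  data.frame(
    effectiveness = ratings[["Effectiveness"]],
    comment = paste(parts[nzchar(parts)], collapse = " "),
    stringsAsFactors = FALSE
  )
}

posts <- getNodeSet(doc, "//div[@class='userPost']")
reviews <- do.call(rbind, lapply(posts, extract_post))
print(reviews)
```

Selecting on the `comFull` id prefix rather than on `class="comment"` is what keeps the truncated copy out of the table, since both paragraphs share the same class.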