Using rvest to scrape a website - selecting html nodes?

I have a question about my latest rvest scrape.

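I want to scrape this page (and do the same for some other stocks): http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1

I need the Market Cap, which is the first box in the second row of the table. The list should contain approximately 50-100 stocks.

I am using rvest.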

library(rvest) 

html = read_html("http://www.finviz.com/quote.ashx?t=A") 

cast = html_nodes(html, "table-dark-row")

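The problem is that I can't get past html_nodes. Any ideas on how to find the right node to pass to html_nodes?

I am using Firebug/Firefox to inspect the web page.

Source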

2016-11-08 Allessandro PT

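Not sure if this is what you want, because I couldn't find the list of approx. 50-100 stocks.

But for what it's worth, using SelectorGadget I came up with this node: .table-dark-row:nth-child(2) .snapshot-td2:nth-child(2). It selects the Market Cap (the first box in the second row on this page: http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1).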

> library(rvest) 
> 
> html = read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1") 
> 
> cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)") 
> cast 
{xml_nodeset (1)} 
[1] <td width="8%" class="snapshot-td2" align="left">\n <b>11.58B</b>\n</td> 
>

If this is not exactly what you want, just use SelectorGadget to find the node you need.

Hope this helps.

Edit:

Here is the complete solution:
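
To see how the selector behaves without hitting the site, here is a minimal sketch on a made-up HTML fragment. The fragment only imitates the snapshot table (its rows, cells, and the "Index" value are assumptions, not copied from the real page). It also shows why a selector without a leading dot, such as "table-dark-row", returns nothing: it is read as an element name rather than a class.

```r
library(rvest)

# Hypothetical fragment imitating the snapshot table; the real page has more rows and cells.
page <- read_html('
<table>
  <tr class="table-dark-row">
    <td class="snapshot-td2-cp">Index</td>
    <td class="snapshot-td2"><b>S&amp;P 500</b></td>
  </tr>
  <tr class="table-dark-row">
    <td class="snapshot-td2-cp">Market Cap</td>
    <td class="snapshot-td2"><b>11.58B</b></td>
  </tr>
</table>')

# Without a leading dot the selector looks for a <table-dark-row> element, so nothing matches.
no_dot <- html_nodes(page, "table-dark-row")
length(no_dot)

# With a leading dot the selector matches on the class attribute.
rows <- html_nodes(page, ".table-dark-row")
length(rows)

# Row that is the second child of its parent, then the snapshot-td2 cell
# that is the second child of that row.
cell <- html_nodes(page, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
html_text(cell, trim = TRUE)
```

Note that :nth-child(2) counts position among all sibling elements, not only among those with the matching class, so the selector depends on the page layout staying the same.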

library(rvest) 

html = read_html("http://www.finviz.com/quote.ashx?t=AA&ty=c&p=d&b=1") 

cast = html_nodes(html, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)") 

html_text(cast) %>% 
    gsub(pattern = "B", replacement = "") %>% 
    as.numeric()
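
Since the question mentions doing the same for other stocks, here is a rough sketch of how the solution above might be wrapped in a function and applied to several tickers. The helper names quote_url and market_cap_text and the ticker list are made up for illustration, the URL pattern is simply the one from the question, and the selector may no longer match if the page layout has changed.

```r
library(rvest)

# Build the quote URL for one ticker, following the URL pattern in the question.
quote_url <- function(ticker) {
  paste0("http://www.finviz.com/quote.ashx?t=", ticker, "&ty=c&p=d&b=1")
}

# Download one quote page and return the Market Cap cell as text.
# Returns NA if the selector matches nothing (for example if the layout has changed).
market_cap_text <- function(ticker) {
  page <- read_html(quote_url(ticker))
  cell <- html_nodes(page, ".table-dark-row:nth-child(2) .snapshot-td2:nth-child(2)")
  if (length(cell) == 0) return(NA_character_)
  html_text(cell, trim = TRUE)[1]
}

tickers <- c("AA", "A")   # hypothetical list of symbols
urls <- vapply(tickers, quote_url, character(1))

# Not run here because it needs network access:
# caps <- vapply(tickers, market_cap_text, character(1))
```

When looping over 50-100 symbols it is probably wise to pause between requests (for example with Sys.sleep) and to check the site's terms of use first.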

Source

2016-11-09 08:56:36 elikesprogramming

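This one seems legit. I need to figure out how to extract the number from the string.

Use the function 'html_text()' from the same 'rvest' package. 'html_text(cast)' gives you "12.76B"; then, to convert it to a number, you need to get rid of the B (I don't know what it means). I edited the answer. Check the complete solution there. – elikesprogramming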
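
Stripping the "B" as described gives 12.76 rather than the full amount. As a sketch of one way to keep the scale, the helper below assumes the suffixes K, M, B and T stand for thousand, million, billion and trillion, which is the usual convention for market-cap figures but is not stated on this page. The name parse_abbreviated is made up for illustration, and only base R is used.

```r
# Convert strings such as "11.58B" to plain numbers.
# Assumption: K, M, B and T mean thousand, million, billion and trillion.
parse_abbreviated <- function(x) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12)
  x <- trimws(x)
  last_char <- substring(x, nchar(x))            # final character of each string
  has_suffix <- last_char %in% names(multipliers)
  # Drop the suffix where there is one, otherwise keep the string as it is.
  digits <- ifelse(has_suffix, substring(x, 1, nchar(x) - 1), x)
  scale <- ifelse(has_suffix, multipliers[last_char], 1)
  as.numeric(digits) * unname(scale)
}

parse_abbreviated(c("11.58B", "350.2M", "42"))
```

Strings that are not numeric after removing the suffix come back as NA with a warning from as.numeric, so it is worth checking the result for missing values.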
