Getting web page elements with Jsoup

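I'm trying to use Jsoup to get stock data from a website called Morningstar. I've looked at other forums and haven't been able to find what's wrong.

I'm attempting some more advanced data scraping, but I can't seem to get the price. I either get null back or nothing at all.

I know there are other languages and APIs, but I want to use Jsoup because it seems quite capable.

Here is what I have so far: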

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Scrape {
    public static void main(String[] args) {
        String URL = "http://www.morningstar.com/stocks/xnas/aapl/quote.html";
        Document d = new Document(URL);
        try {
            d = Jsoup.connect(URL).get();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Element stuff = d.select("#idPrice gr_text_bigprice").first();
        System.out.println("Price of AAPL: " + stuff);
    }
}

Any help would be greatly appreciated.

2016-06-07 BillytheKid

Are you sure the data isn't generated dynamically by JavaScript?

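Since the content is created dynamically with JavaScript, you can use a tool that emulates a browser, such as HtmlUnit: https://sourceforge.net/projects/htmlunit/

The price and related information are embedded in an iframe, so we first grab the iframe link (which is also built dynamically) and then parse the iframe itself.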

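import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Silence HtmlUnit's very verbose logging.
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

// Emulate Chrome; JavaScript must stay enabled because the page builds its content with it.
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setTimeout(1000);

// Load the quote page and hand the rendered DOM to Jsoup.
HtmlPage page = webClient.getPage("http://www.morningstar.com/stocks/xnas/aapl/quote.html");
Document doc = Jsoup.parse(page.asXml());

String title = doc.select(".r_title").select("h1").text();

// The iframe src is protocol-relative, so prepend the scheme.
String iFramePath = "http:" + doc.select("#quote_quicktake").select("iframe").attr("src");

// Load the iframe document, which holds the actual price.
page = webClient.getPage(iFramePath);
doc = Jsoup.parse(page.asXml());

System.out.println(title + " | Last Price [$]: " + doc.select("#last-price-value").text());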

Prints:

Apple Inc | Last Price [$]: 98.63
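
One fragile spot in the HtmlUnit snippet is the "http:" + src concatenation, which only works as long as the iframe's src is protocol-relative (starts with //). A minimal sketch, using nothing beyond the JDK, of resolving the attribute against the page URL instead. The class name, method name and example.com values are made up for illustration, and URI.create will reject src values containing illegal characters such as spaces:

```java
import java.net.URI;

public class IframeUrlResolver {

    // Resolves an iframe's src attribute against the URL of the page that contains it.
    // Handles protocol-relative ("//host/..."), root-relative ("/path") and absolute values.
    public static String resolveSrc(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.morningstar.com/stocks/xnas/aapl/quote.html";

        // Hypothetical src values; the real attribute on the page may look different.
        System.out.println(resolveSrc(page, "//quotes.example.com/frame.html?t=AAPL"));
        System.out.println(resolveSrc(page, "/frames/quote.html"));
        System.out.println(resolveSrc(page, "https://quotes.example.com/frame.html"));
    }
}
```

If the Jsoup document was parsed with a base URI, calling absUrl("src") on the iframe element should give the same result without the extra helper.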

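The JavaScript engine in HtmlUnit is quite slow (the HtmlUnit code above takes about 18 seconds on my machine), so it may be worth looking into other JavaScript engines or headless browsers (PhantomJS etc.; see this list of options: https://github.com/dhamaniasad/HeadlessBrowsers) to improve performance, but HtmlUnit gets the job done. You can also try filtering out irrelevant scripts, images, etc. with a custom WebConnectionWrapper: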

http://htmlunit.10904.n7.nabble.com/load-parse-speedup-tp22735p22738.html
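
The filtering decision itself can be kept separate from HtmlUnit. Below is a rough sketch of such a check in plain Java; the class name, the extension list and the host fragments are all assumptions chosen for illustration, not something taken from the linked thread. Blocking .css or third-party hosts is only safe if the scripts that build the price do not depend on them, so the lists would need tuning against the real page.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ResourceFilter {

    // Hypothetical block lists; adjust after checking what the target page really needs.
    private static final List<String> BLOCKED_EXTENSIONS =
            Arrays.asList(".png", ".jpg", ".jpeg", ".gif", ".ico", ".css", ".woff");
    private static final List<String> BLOCKED_HOST_PARTS =
            Arrays.asList("doubleclick", "google-analytics");

    // Returns true when a request for this URL can be skipped.
    public static boolean shouldBlock(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);

        for (String part : BLOCKED_HOST_PARTS) {
            if (host.contains(part)) {
                return true;
            }
        }
        // getPath() excludes the query string, so "logo.png?v=2" still matches ".png".
        for (String ext : BLOCKED_EXTENSIONS) {
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldBlock("http://www.morningstar.com/img/logo.png?v=2"));
        System.out.println(shouldBlock("http://www.morningstar.com/stocks/xnas/aapl/quote.html"));
    }
}
```

In a WebConnectionWrapper subclass, the overridden getResponse method would presumably call shouldBlock on the request URL and return an empty response for blocked requests, delegating everything else to the wrapped connection.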

2016-06-07 10:17:41
