How to retrieve a URL from a link tag using Jsoup

<article itemprop="articleBody"> 
    <p channel="wp.com" class="interstitial-link"> 
    <i> 
     [<a href="www.URL.com" shape="rect">Link Text</a>] 
    </i> 
    </p> 
</article>

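How can I retrieve the URL and the link text from this HTML document with Jsoup? I would like the output to look like this:

"Link Text[URL]"

Edit: I only want to retrieve the links inside

<article itemprop="articleBody"> ... </article>

not the links on the whole page. Also, I want all of the links, not just one.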

Source

2016-08-03 Ahmed Ahmed

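Have you tried the selector syntax described at https://jsoup.org/cookbook/extracting-data/selector-syntax? – Pshemo

Yes, that is what I am having trouble with, specifically the CSS selectors.

Can you post your attempt? Most of us visit Stack Overflow to help others fix their code, not to write it for them from scratch, so by posting [what you have tried](http://mattgemmell.com/what-have-you-tried/) you increase your chances of getting a decent answer that explains the problems you ran into while building your solution. – Pshemo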

// imports needed by this fragment
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// connect to the URL and retrieve the page source as a Document
Document doc = Jsoup.connect(url).get();

// find the link element in the article
// (first() returns null if nothing matches the selector)
Element link = doc
        .select("article[itemprop=articleBody] p.interstitial-link i a")
        .first();

// extract the link text
String linkText = link.ownText();

// extract the full URL of the href
// use this over link.attr("href") to avoid a relative URL
String linkURL = link.absUrl("href");

// display
System.out.println(
        String.format(
                "%s[%s]",
                linkText,
                linkURL));

Learn more about CSS Selectors
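To illustrate the comment about `absUrl("href")` versus `attr("href")`, here is a minimal sketch. It parses HTML strings against a made-up base URI (`http://example.com/articles/`) instead of connecting to a real page, and the class and method names are hypothetical. Note that an href without a scheme, such as `www.example.com`, is resolved as a relative path, so it may not come out the way you expect.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class HrefVsAbsUrl {

    // Returns { raw href attribute, href resolved against baseUri } for the first link.
    static String[] rawAndAbsolute(String html, String baseUri) {
        Element link = Jsoup.parse(html, baseUri).select("a[href]").first();
        if (link == null) {
            throw new IllegalArgumentException("no <a href> found in the given HTML");
        }
        return new String[] { link.attr("href"), link.absUrl("href") };
    }

    public static void main(String[] args) {
        // assumed base URI, for illustration only
        String base = "http://example.com/articles/";
        String[] samples = {
            "<a href=\"/docs/page.html\">root-relative</a>",
            "<a href=\"www.example.com\">no scheme</a>",
            "<a href=\"http://www.example.com/page\">absolute</a>"
        };
        for (String html : samples) {
            String[] r = rawAndAbsolute(html, base);
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

When the document comes from `Jsoup.connect(url).get()`, the base URI is the page's own URL, so no explicit base needs to be passed.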

You can also loop over every link in the article like this:

for (Element link : doc.select("article[itemprop=articleBody] a")) {
    String linkText = link.ownText();
    String linkURL = link.absUrl("href");
    System.out.println(
            String.format(
                    "%s[%s]",
                    linkText,
                    linkURL));
}

Output

Link Text[http://www.URL.com]
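For anyone who wants to try the loop without a live page, here is a self-contained sketch under a few assumptions: the HTML is an inline string modelled on the question's snippet (with a second link and a link outside the article added for illustration), the base URI `http://example.com/` is made up, and the class and method names are hypothetical. It collects the entries into a list instead of printing them directly, and uses `a[href]` so that anchors without an href are skipped.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleLinks {

    // Returns one "text[url]" entry per <a href> inside the article body.
    static List<String> extract(String html, String baseUri) {
        // baseUri lets absUrl() resolve relative hrefs
        Document doc = Jsoup.parse(html, baseUri);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("article[itemprop=articleBody] a[href]")) {
            result.add(link.text() + "[" + link.absUrl("href") + "]");
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
                "<article itemprop=\"articleBody\">"
                + "<p class=\"interstitial-link\"><i>"
                + "[<a href=\"http://www.example.com/page\">Link Text</a>]"
                + "</i></p>"
                + "<p><a href=\"/other\">Second Link</a></p>"
                + "</article>"
                + "<a href=\"http://outside.example/\">Outside the article</a>";

        // Only the two links inside the article are printed.
        for (String entry : extract(html, "http://example.com/")) {
            System.out.println(entry);
        }
    }
}
```

The selector scopes the search to the article, so the link placed after `</article>` is not included.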

Source

2016-08-03 16:54:46

Not sure why your first solution gave a null pointer error. However, your second solution works perfectly. Thank you very much.
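One possible cause of the null pointer error (an assumption, since the real page is not shown): `first()` returns `null` when the selector matches nothing, for example if the actual link is not wrapped in `p.interstitial-link i`. Below is a minimal sketch of a guarded version. The class and method names are hypothetical, and it parses an inline string with a made-up base URI rather than connecting to a page.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstLinkSafe {

    // Returns "text[url]" for the first matching link, or empty if none matches.
    static Optional<String> firstLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc
                .select("article[itemprop=articleBody] p.interstitial-link i a")
                .first(); // null when the selector matches nothing
        return Optional.ofNullable(link)
                .map(a -> a.ownText() + "[" + a.absUrl("href") + "]");
    }

    public static void main(String[] args) {
        // Same shape as the question's snippet, but without the <i> wrapper,
        // so the strict selector finds nothing.
        String html =
                "<article itemprop=\"articleBody\">"
                + "<p class=\"interstitial-link\">"
                + "[<a href=\"http://www.example.com/page\">Link Text</a>]"
                + "</p></article>";
        System.out.println(firstLink(html, "http://example.com/").orElse("no matching link"));
    }
}
```

The looser selector from the second solution, `article[itemprop=articleBody] a`, does not depend on the wrapper elements, which may be why it worked.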
