Extracting text with Jsoup

I want to use Jsoup to extract text from the following page: http://fantasynews.cbssports.com/fantasyfootball/players/updates/187741

I need a separate string for each of these items:

News headline
News
Analysis

Right now I can get the text of the whole table:

Document doc = Jsoup.connect("http://fantasynews.cbssports.com/fantasyfootball/players/updates/" + playerId).timeout(30000).get();
// Takes the first element whose id contains "newsPage1", i.e. the whole news table
Element title = doc.select("[id*=newsPage1]").first();

But with this approach, all the articles run together in one block of text.

Can anyone give me some pointers?

Thanks, Josh


2013-05-07 Josh

You need a more specific CSS selector. Something like this might work:

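import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PlayerNews {
    public static void main(String[] args) {
        // Groups: 1 = headline, 2 = text after "News:", 3 = text after "Analysis:".
        // \p{Zs} matches any Unicode space separator (regular or non-breaking space).
        Pattern pat = Pattern.compile("(.*)News\\:\\p{Zs}(.*)Analysis\\:\\p{Zs}(.*)", Pattern.UNICODE_CASE);
        Document doc;
        try {
            doc = Jsoup.connect("http://fantasynews.cbssports.com/fantasyfootball/players/updates/187741").userAgent("Mozilla").get();
        } catch (IOException e1) {
            e1.printStackTrace();
            return;
        }

        // Assumes each headline is an <h3> inside the table whose parent cell holds the entire entry.
        Elements titles = doc.select("table h3");
        for (Element title : titles) {
            Element td = title.parent();
            String innerTxt = td.text();
            Matcher mat = pat.matcher(innerTxt);
            if (mat.find()) {
                System.out.println("title = " + mat.group(1));
                System.out.println("news = " + mat.group(2));
                System.out.println("analysis = " + mat.group(3));
            }
        }
    }
}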
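To check the regular expression without hitting the live site, the matching step can be pulled into a small helper and run against a sample string. This is a sketch assuming each cell's `text()` has the shape `<headline> News: <news> Analysis: <analysis>`, as the answer's pattern implies; the class name `EntrySplitter`, the method `splitEntry`, and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntrySplitter {
    // Essentially the answer's pattern: headline, then "News:", then "Analysis:".
    // \p{Zs} accepts any Unicode space separator after each colon,
    // including a non-breaking space (U+00A0).
    private static final Pattern ENTRY =
            Pattern.compile("(.*)News:\\p{Zs}(.*)Analysis:\\p{Zs}(.*)");

    /** Returns {headline, news, analysis}, or null if the text does not match. */
    static String[] splitEntry(String innerTxt) {
        Matcher mat = ENTRY.matcher(innerTxt);
        if (!mat.find()) {
            return null;
        }
        return new String[] {
            mat.group(1).trim(), mat.group(2).trim(), mat.group(3).trim()
        };
    }

    public static void main(String[] args) {
        // Hypothetical cell text standing in for td.text() on the real page.
        String sample = "Player A sidelined News:\u00a0Player A missed practice. "
                + "Analysis:\u00a0Monitor his status.";
        String[] parts = splitEntry(sample);
        System.out.println("title = " + parts[0]);
        System.out.println("news = " + parts[1]);
        System.out.println("analysis = " + parts[2]);
    }
}
```

Keeping the parsing in a pure `String` function means it can be tested on saved text, while the Jsoup code only has to locate each entry's cell.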

I'd suggest reading up on CSS selectors and the Jsoup documentation.


2013-05-07 09:59:02 luksch

Could you help with the missing code for the find function? I couldn't find the details at the link you posted. – Josh 2013-05-07 15:40:55

I changed my example. Maybe that one works... – luksch 2013-05-07 18:07:56

Great answer! Thanks. Is there an easy way to also get the date of each entry with this solution? Also, would you be available for freelance Jsoup work? – Josh 2013-05-08 10:14:02
