Scraping information from a web page

How can I get the information (http://linkWeb.com, the title, and http://link.pdf) from this HTML page?

<div class="title-download"> 
    <div id="01divTitle" class="title"> 
     <h3> 
      <a id="01Title" onmousedown="" href="http://linkWeb.com">Titles</a> 
      <span id="01LbCitation" class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span></h3> 
    </div> 
    <div id="01downloadDiv" class="download"> 
     <a id="01_downloadIcon" title="http://link.pdf" onmousedown="" target=""><img id="ctl01_icon" class="small-icon" /></a> 
    </div> 
</div>

I tried, but it only returns the title. I hadn't used simple_html_dom before. Please help me. Thanks :)

<?php 

include 'simple_html_dom.php'; 
set_time_limit(0); 

$url ='http://libra.msra.cn/Search?query=data%20mining&s=0'; 
$html = file_get_html($url) or die ('invalid url'); 
foreach($html->find('div[class=title-download]') as $webLink){
    echo $webLink->plaintext.'<br>';  // all text inside the block
    echo $webLink->href.'<br>';       // a <div> has no href, so this is empty
}

foreach($html->find('div[class=download]') as $Link2){
    echo $Link2->href.'<br>';         // also empty: the <div> has no href
}

?>


2012-07-21 bruine

In your second foreach you are looking for http://link.pdf, but it is stored in the "title" attribute, not in "href"... – zigomir 2012-07-21 02:06:11

@zigomir Oh, yes! Thanks for the correction! :) – bruine 2012-07-22 01:24:30

To scrape the title and URL, use this code:

foreach($html->find('span[class=citation]') as $link){
    // The title <a> is the element right before the citation <span> in the <h3>
    $link = $link->prev_sibling();
    echo $link->plaintext.'<br>';
    echo $link->href.'<br>';
}
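The same "previous sibling of the citation span" idea can be tried without simple_html_dom. The sketch below assumes PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`, which wrap libxml) and uses the question's sample HTML inline instead of a live URL; `preceding-sibling::a[1]` picks the nearest `<a>` before each citation span.

```php
<?php
// Sketch: find the title link as the <a> immediately preceding span.citation,
// using PHP's DOM extension (libxml) instead of simple_html_dom.
$html = <<<'HTML'
<div class="title-download">
  <div id="01divTitle" class="title">
    <h3>
      <a id="01Title" onmousedown="" href="http://linkWeb.com">Titles</a>
      <span id="01LbCitation" class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span></h3>
  </div>
  <div id="01downloadDiv" class="download">
    <a id="01_downloadIcon" title="http://link.pdf" onmousedown="" target=""><img id="ctl01_icon" class="small-icon" /></a>
  </div>
</div>
HTML;

libxml_use_internal_errors(true); // keep libxml parse warnings off the page
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$titles = [];
// preceding-sibling is a reverse axis, so [1] is the nearest <a> before the span
foreach ($xpath->query('//span[@class="citation"]/preceding-sibling::a[1]') as $link) {
    $titles[] = [
        'text' => trim($link->textContent),
        'href' => $link->getAttribute('href'),
    ];
}

foreach ($titles as $t) {
    echo $t['text'], '<br>', $t['href'], '<br>';
}
```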

And to scrape the download URL from the download class, use @zigomir's code :)

foreach($html->find('.download a') as $link){
    echo $link->title.'<br>';  // the PDF URL is in the title attribute
}
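To see why reading `href` gave nothing while `title` works, here is a small sketch, again assuming PHP's built-in DOM extension rather than simple_html_dom and using only the download part of the question's HTML. `DOMElement::getAttribute()` returns an empty string for a missing attribute.

```php
<?php
// Sketch: the PDF URL lives in the title attribute of the <a> inside
// div.download; that <a> has no href at all.
$html = '<div id="01downloadDiv" class="download">'
      . '<a id="01_downloadIcon" title="http://link.pdf" onmousedown="" target="">'
      . '<img id="ctl01_icon" class="small-icon" /></a></div>';

libxml_use_internal_errors(true); // keep libxml parse warnings off the page
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$pdfs  = [];
$hrefs = [];
foreach ($xpath->query('//div[@class="download"]/a') as $a) {
    $pdfs[]  = $a->getAttribute('title'); // the PDF URL
    $hrefs[] = $a->getAttribute('href');  // empty string: attribute is missing
}

echo implode('<br>', $pdfs), '<br>';
```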


2012-07-22 01:20:43 bruine

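I think you need to select the elements inside the div with class title-download. According to the documentation, it supports jQuery-style selectors (http://simplehtmldom.sourceforge.net/).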

Try it like this:

$html = file_get_html($url) or die ('invalid url'); 
// '.title a' matches every <a> inside div.title, including the citation link
foreach($html->find('.title a') as $webLink){
    echo $webLink->plaintext.'<br>';
    echo $webLink->href.'<br>';
}

foreach($html->find('.download a') as $link){ 
    echo $link->title.'<br>';  
}
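`.title a` is a descendant selector, so in the sample HTML it also matches the citation `<a>` nested in the `<span>`. The sketch below illustrates this with XPath via PHP's built-in DOM extension (an assumption; it is not simple_html_dom): the descendant query returns both links, while restricting to `<a>` elements that are direct children of the `<h3>` returns only the title link.

```php
<?php
$html = <<<'HTML'
<div class="title"><h3>
  <a id="01Title" href="http://linkWeb.com">Titles</a>
  <span class="citation">(<a id="01Citation" href="http://citation.com">Citations</a>)</span></h3>
</div>
HTML;

libxml_use_internal_errors(true); // keep libxml parse warnings off the page
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Descendant match, like the CSS selector ".title a": also hits the citation link.
$all = [];
foreach ($xpath->query('//div[@class="title"]//a') as $a) {
    $all[] = $a->getAttribute('href');
}

// Only <a> elements that are direct children of the <h3>: just the title link.
$titlesOnly = [];
foreach ($xpath->query('//div[@class="title"]/h3/a') as $a) {
    $titlesOnly[] = $a->getAttribute('href');
}

echo implode(', ', $all), "\n";        // http://linkWeb.com, http://citation.com
echo implode(', ', $titlesOnly), "\n"; // http://linkWeb.com
```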


2012-07-21 02:04:19 zigomir

The problem is that every result on that HTML page has a different id. For example, the second result has id="02Title" and id="02_downloadIcon". – bruine 2012-07-21 02:10:29

Then you should select by class: '.title a'. I've edited my answer as well. – zigomir 2012-07-21 14:20:42

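Oh yes, thanks for that too! However, it also picks up the citations, and I only need the title and URL. I've found a way to get just the title and URL; see my answer. Thanks for sharing! It helped me understand how to access HTML elements :) – bruine 2012-07-22 01:16:36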

Use libxml to parse the HTML, and use XPath to select the elements or attributes you want.
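A minimal sketch of that libxml + XPath approach in PHP, assuming the built-in `DOMDocument`/`DOMXPath` classes. It selects by class (since the ids change per result), then runs relative queries against each `title-download` block so that each title, URL, and PDF stay grouped together. `scrapeResults` is a hypothetical helper name, and the second result (`http://linkWeb2.com`, `http://link2.pdf`) is made-up sample data added only to show differing ids.

```php
<?php
$html = <<<'HTML'
<div class="title-download">
  <div id="01divTitle" class="title"><h3>
    <a id="01Title" href="http://linkWeb.com">Titles</a>
    <span class="citation">(<a href="http://citation.com">Citations</a>)</span></h3></div>
  <div id="01downloadDiv" class="download"><a id="01_downloadIcon" title="http://link.pdf"></a></div>
</div>
<div class="title-download">
  <div id="02divTitle" class="title"><h3>
    <a id="02Title" href="http://linkWeb2.com">Second title</a></h3></div>
  <div id="02downloadDiv" class="download"><a id="02_downloadIcon" title="http://link2.pdf"></a></div>
</div>
HTML;

// Hypothetical helper: one record (title, url, pdf) per result block.
function scrapeResults(string $html): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $results = [];
    foreach ($xpath->query('//div[@class="title-download"]') as $block) {
        // Queries starting with "." are evaluated relative to $block.
        $title = $xpath->query('.//div[@class="title"]/h3/a', $block)->item(0);
        $pdf   = $xpath->query('.//div[@class="download"]/a', $block)->item(0);
        $results[] = [
            'title' => $title ? trim($title->textContent) : null,
            'url'   => $title ? $title->getAttribute('href') : null,
            'pdf'   => $pdf ? $pdf->getAttribute('title') : null,
        ];
    }
    libxml_clear_errors();
    return $results;
}

$results = scrapeResults($html);
foreach ($results as $r) {
    echo $r['title'], ' | ', $r['url'], ' | ', $r['pdf'], "\n";
}
```

For a live page, the inline string would be replaced by the fetched HTML (for example from `file_get_contents($url)`); the queries assume the page keeps the class names shown in the question.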


2012-07-21 14:33:18 svth
