2014-10-07 75 views
-1

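Scrape Google first-page results with PHP

I can scrape the title and URL from Google search results with PHP. How do I get the description as well?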

$url = 'http://www.google.com/search?hl=en&safe=active&tbo=d&site=&source=hp&q=Beautiful+Bangladesh&oq=Beautiful+Bangladesh'; 
$html = file_get_html($url); 

$linkObjs = $html->find('h3.r a'); 
foreach ($linkObjs as $linkObj) { 
    $title = trim($linkObj->plaintext); 
    $link = trim($linkObj->href); 

    // If this is not a direct link but a URL is embedded in it, extract that URL.
    if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
        $link = $matches[1];
    } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
        continue;
    }

    echo '<p>Title: ' . $title . '<br />'; 
    echo 'Link: ' . $link . '</p>'; 
} 

The code above currently gives the following output:

Title: Natural Beauties - Bangladesh Photo Gallery 
Link: http://www.photo.com.bd/Beauties/ 

I want the following output:

Title: Natural Beauties - Bangladesh Photo Gallery 
Link: http://www.photo.com.bd/Beauties/ 
description : photo.com.bd is a website for creative photographers from Bangladesh, mainly for amateur ... Natural-Beauty-of-Bangladesh_Flower · fishing on ... BEAUTY-4. 
+0

What HTML are you parsing? What have you tried in order to parse it? In what way did that attempt not work as expected? – David 2014-10-07 14:20:28

+0

simple_html_dom.php – ebrahim 2014-10-07 14:22:20

+0

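A file name doesn't answer any of those questions, nor does it make the question any clearer. At the moment you appear to be asking someone to do your work for you, and that isn't what Stack Overflow is for. If you are looking for someone to add functionality to your code, you should hire a developer. If *you* are trying to add functionality to your code and are stuck, we'd be happy to help. But you need to describe what you have tried and the problem you ran into. – David 2014-10-07 14:25:06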

Answers

4
include("simple_html_dom.php"); 

$in = "Beautiful Bangladesh"; 
$in = urlencode($in); // URL-encode the query; spaces become +
$url = 'http://www.google.com/search?hl=en&tbo=d&site=&source=hp&q=' . $in . '&oq=' . $in;

print $url."<br>"; 

$html = file_get_html($url); 

$i=0; 
$linkObjs = $html->find('h3.r a'); 
foreach ($linkObjs as $linkObj) { 
    $title = trim($linkObj->plaintext); 
    $link = trim($linkObj->href); 

    // If this is not a direct link but a URL is embedded in it, extract that URL.
    if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
        $link = $matches[1];
    } else if (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
        continue;
    }

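    // The description is not a child element of the H3, so we keep a counter and look up the matching span.st.
    $descrObj = $html->find('span.st', $i);
    $descr = $descrObj ? trim($descrObj->plaintext) : ''; // plain text only; empty if no description was found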
    $i++; 
    echo '<p>Title: ' . $title . '<br />'; 
    echo 'Link: ' . $link . '<br />'; 
    echo 'Description: ' . $descr . '</p>'; 
}
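
The counter above relies on every kept `h3.r` link having exactly one `span.st` in the same order. A variation that may be more robust is to loop over each result container and read the title, link, and description from inside it, so they cannot drift apart. The sketch below is only an illustration under assumptions: it uses PHP's built-in `DOMDocument`/`DOMXPath` instead of simple_html_dom, the `div.g` container and the sample markup are guesses modeled on the selectors above (Google's real markup differs and changes over time), and `extractTargetUrl` and `extractResults` are hypothetical helper names.

```php
<?php
// Hypothetical sample markup modeled on the h3.r / span.st selectors above.
// It is NOT real Google output; it only makes the sketch runnable offline.
$sampleHtml = <<<'HTML'
<div id="ires">
  <div class="g">
    <h3 class="r"><a href="/url?q=http://www.photo.com.bd/Beauties/&amp;sa=U&amp;ei=abc">Natural Beauties - Bangladesh Photo Gallery</a></h3>
    <span class="st">photo.com.bd is a website for creative photographers from Bangladesh.</span>
  </div>
  <div class="g">
    <h3 class="r"><a href="/images?q=Beautiful+Bangladesh">Images for Beautiful Bangladesh</a></h3>
  </div>
  <div class="g">
    <h3 class="r"><a href="http://example.com/page">Example page</a></h3>
    <span class="st">An example description.</span>
  </div>
</div>
HTML;

/**
 * Return the http(s) target of a result link, or null if there is none.
 * Handles direct links and redirect-style links such as "/url?q=<target>&sa=...".
 */
function extractTargetUrl(string $href): ?string
{
    if (preg_match('/^https?:/', $href)) {
        return $href; // already a direct link
    }
    $queryStart = strpos($href, '?');
    if ($queryStart === false) {
        return null; // no query string to inspect
    }
    parse_str(substr($href, $queryStart + 1), $params); // also URL-decodes values
    $target = $params['q'] ?? '';
    return is_string($target) && preg_match('/^https?:/', $target) ? $target : null;
}

/**
 * Collect title, link, and description from each assumed result container.
 */
function extractResults(string $html): array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    // Match <div> elements whose class list contains "g" (assumed container class).
    $blocks = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " g ")]');
    foreach ($blocks as $block) {
        $link = $xpath->query('.//h3[@class="r"]/a', $block)->item(0);
        if ($link === null) {
            continue; // container without a title link
        }
        $url = extractTargetUrl($link->getAttribute('href'));
        if ($url === null) {
            continue; // skip entries that do not point to an http(s) page
        }
        $descr = $xpath->query('.//span[@class="st"]', $block)->item(0);
        $results[] = [
            'title'       => trim($link->textContent),
            'link'        => $url,
            'description' => $descr !== null ? trim($descr->textContent) : '',
        ];
    }
    return $results;
}

foreach (extractResults($sampleHtml) as $r) {
    echo "Title: {$r['title']}\nLink: {$r['link']}\nDescription: {$r['description']}\n\n";
}
```

To try this against a live page you would pass the downloaded HTML string to `extractResults()` and adjust the XPath expressions to whatever markup the page actually contains.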