2017-10-18

How to get the href attribute from an HTML page

I have HTML code like this:

<tbody> 
<tr class=""> 
    <td align="right" csk="1">1</td> 
    <td align="left" ><img src="http://static.spref.com/olympics/images/flags/AFG.png" alt="AFG" title="Afghanistan" height=15 width=22>&nbsp;<a href="/olympics/countries/AFG/">Afghanistan</a></td> 
    <td align="right" >1936</td> 
    <td align="right" >2016</td> 
    <td align="right" >103</td> 
    <td align="right" >7</td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" >2</td> 
    <td align="right" >2</td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
</tr> 

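I want to put all the href attributes into an array. I tried the following PHP code, but when I print the $olympiad array I get something unexpected: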

<?php 

include_once ('/share/Multimedia/simple_html_dom.php'); 

$url = 'https://www.sports-reference.com/olympics/countries/'; 
$tagname_tbody = 'tbody'; 
$tagname_tr = 'td align="left"'; 


    $olympiad = array(); 
    $html = file_get_html($url,true); 

    foreach($html->find($tagname_tr) as $tag) { 
     $olympiad[] = trim($tag->innertext); 

    } 

This is what I get:

Array 
(
    [0] => 1 
    [1] => <img src="http://static.spref.com/olympics/images/flags/AFG.png" alt="AFG" title="Afghanistan" height=15 width=22>&nbsp;<a href="/olympics/countries/AFG/">Afghanistan</a> 
    [2] => 1936 
    [3] => 2016 
    [4] => 103 
    [5] => 7 
    [6] => 
    [7] => 
    [8] => 2 
    [9] => 2 
    [10] => 

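Why does it behave like this? I would also like to get the text of the link (here, Afghanistan), possibly in a separate array. I'm not a PHP expert, so I'm asking for your help.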


I think that instead of taking $tag->innertext, you should find the tag you need and use its innertext. – Krish

Answers


You can load the HTML document like this. It's only an example, so adapt it to your needs:

<?php
// Fetch the page as a plain string; DOMDocument does not need simple_html_dom.
$url = 'https://www.sports-reference.com/olympics/countries/';
$html = file_get_contents($url);

$doc = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml's parse warnings out of the output.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// getElementsByTagName() returns a DOMNodeList of every element with that tag name.
// example 1: all elements
$elements = $doc->getElementsByTagName('*');
// example 2: the <html> element
$elements = $doc->getElementsByTagName('html');
// example 3: the <body> element
$elements = $doc->getElementsByTagName('body');
// example 4: all <table> elements
$elements = $doc->getElementsByTagName('table');
// example 5: all <div> elements
$elements = $doc->getElementsByTagName('div');

I hope it helps.
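The examples above only return lists of elements. As a minimal sketch of how this approach might be adapted to the question (an assumption, not part of the original answer), the code below uses DOMXPath, which ships with PHP's DOM extension, to select every a that is a direct child of a td with align="left", and stores the href values and the link texts in two arrays. The sample markup is a shortened copy of the question's row wrapped in a table element, and $hrefs and $texts are illustrative names. For the live page you would pass file_get_contents($url) to loadHTML() instead, assuming the page keeps the same structure.

```php
<?php
// Minimal sketch: PHP's built-in DOM extension (DOMDocument + DOMXPath)
// instead of simple_html_dom. $source is a shortened copy of the question's
// markup; for the live page, use file_get_contents($url) instead.
$source = <<<'HTML'
<table><tbody>
<tr class="">
    <td align="right" csk="1">1</td>
    <td align="left"><img src="http://static.spref.com/olympics/images/flags/AFG.png" alt="AFG" title="Afghanistan" height=15 width=22>&nbsp;<a href="/olympics/countries/AFG/">Afghanistan</a></td>
    <td align="right">1936</td>
</tr>
</tbody></table>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; keep libxml warnings out of the output.
libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$hrefs = array(); // href attribute of each link (hypothetical name)
$texts = array(); // visible link text, e.g. "Afghanistan" (hypothetical name)

// Every <a> that is a direct child of a <td align="left">.
foreach ($xpath->query('//td[@align="left"]/a') as $link) {
    $hrefs[] = $link->getAttribute('href');
    $texts[] = trim($link->textContent);
}

print_r($hrefs);
print_r($texts);
```

For the sample row this prints two one-element arrays, containing /olympics/countries/AFG/ and Afghanistan. The XPath query is used because, on the full page, getElementsByTagName('a') alone would likely also return links outside the country column.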


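If you want to find all the href attributes, I think you can append an a to $tagname_tr = 'td align="left"'; so that the selector picks the links inside that cell.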

Then you can loop over the results and read each element's href and innertext.

As an example, here the values are stored in two arrays, and the HTML is loaded from a string:

<?php
include_once ('/share/Multimedia/simple_html_dom.php');

$source = <<<SOURCE 
<tbody> 
<tr class=""> 
    <td align="right" csk="1">1</td> 
    <td align="left" ><img src="http://static.spref.com/olympics/images/flags/AFG.png" alt="AFG" title="Afghanistan" height=15 width=22>&nbsp;<a href="/olympics/countries/AFG/">Afghanistan</a></td> 
    <td align="right" >1936</td> 
    <td align="right" >2016</td> 
    <td align="right" >103</td> 
    <td align="right" >7</td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" >2</td> 
    <td align="right" >2</td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
    <td align="right" ></td> 
</tr> 
SOURCE; 

$url = 'https://www.sports-reference.com/olympics/countries/'; 
$tagname_tbody = 'tbody'; 
$tagname_tr = 'td align="left" a'; 

$olympiad = array(); 
$elementText = array(); 
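//$html = file_get_html($url,true); // use this line to load the live page instead
$html = str_get_html($source); // parse the sample HTML string

// Loop over every matching <a> element and keep its href and its link text.
foreach($html->find($tagname_tr) as $tag) {
    $olympiad[] = $tag->href;
    $elementText[] = $tag->innertext;
}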

echo "<pre>"; 
print_r($olympiad); 
print_r($elementText); 

This results in:

Array 
(
    [0] => /olympics/countries/AFG/ 
) 
Array 
(
    [0] => Afghanistan 
)