2012-06-20 32 views
0

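Changing an attribute with DOM in PHP does not save correctly

I have a strange bug involving the DOM. I'm trying to iterate over every href in the document and replace it with an absolute path where necessary. After I call setAttribute(), getAttribute() returns the changed value. However, if I call saveHTML(), or query the tags again with getElementsByTagName and getAttribute, the value has been truncated from http://example.com/path.php?ccc to http://example.com.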

Here is my code:

<?php 
//include 'url_to_absolute.php'; 


function url_to_absolute($url, $href) { 
    return trim($url . $href); 
} 

$url = 'http://example.com'; 
//$url = $_GET["url"]; 
$ch = curl_init(); 
curl_setopt($ch,CURLOPT_URL, $url); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
$contents = curl_exec($ch); 
curl_close($ch); // curl_close() needs the handle; without it the call fails

$dom = new DOMDocument(); 
$dom->loadHTML($contents); 


//change the urls to absolute 
$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor) 
{ 
    $href = $anchor->getAttribute('href'); 
    $abs = url_to_absolute($url, $href); 
    $anchor->removeAttribute('href'); 
    $anchor->setAttribute('href', $abs); 

    //changed 
    $newhref = $anchor->getAttribute('href'); 
    echo "newhref = " . $newhref; //shows http://example.com/.... (good) 
} 


$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor) 
{ 
    echo "new2 = " . $anchor->getAttribute('href'); //returns http://example.com only 
} 

//print output 
echo @$dom->saveHTML(); 
?> 
+1

Could you post your url_to_absolute() function, please? Also, the $url variable doesn't exist in your code snippet. – pp19dd

+0

I got my url_to_absolute function from here: http://nadeausoftware.com/articles/2008/05/php_tip_how_convert_relative_url_absolute_url#Code –

+0

My $url variable is set earlier: $url = $_GET['url'] –

Answers

0

Try these cURL options together with curl_init($url):

<?php 
//include 'url_to_absolute.php'; 
function url_to_absolute($url, $href){ 
    return trim($url . $href); 
} 

$url = 'http://example.com'; 
//$url = $_GET["url"]; 
$ch = curl_init($url); 
curl_setopt($ch,CURLOPT_RETURNTRANSFER, TRUE); 
curl_setopt($ch,CURLOPT_FOLLOWLOCATION, TRUE); 
$contents = curl_exec($ch); 
curl_close($ch); // pass the handle returned by curl_init()

$dom = new DOMDocument(); 
$dom->loadHTML($contents); 
//$dom->saveHTMLFile('dom_doc_test.html'); 


//change the urls to absolute 
$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor) 
{ 
    $href = $anchor->getAttribute('href'); 
    $abs = url_to_absolute($url, $href); 
    $anchor->removeAttribute('href'); 
    $anchor->setAttribute('href', $abs); 

    //changed 
    $newhref = $anchor->getAttribute('href') . '<br />'; 
    echo "newhref = " . $newhref; //shows http://example.com/.... (good) 
} 

$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor) 
{ 
    echo "new2 = " . $anchor->getAttribute('href') . '<br />'; //returns http://example.com only 
} 

//print output 
echo @$dom->saveHTML(); 
?> 
0

The error is probably in your url_to_absolute function. With my simple url_to_absolute, the code is:

function url_to_absolute($url, $href){ 
    return trim($url . $href); 
} 

$url = 'http://example.com'; 

$dom = new DOMDocument(); 
$dom->loadHTML('<html><body><a href="/path.html?q=hello&a=bye"></a><a href="/path2.html?before=34&after=44"></a></body></html>'); 

$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor){ 
    $href = $anchor->getAttribute('href'); 
    echo "href = " . $href . '<br />'; 
} 

echo '<br />'; 

$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor){ 
    $href = $anchor->getAttribute('href'); 
    $abs = url_to_absolute($url, $href); 
    $anchor->removeAttribute('href'); 
    $anchor->setAttribute('href', $abs); 

    $newhref = $anchor->getAttribute('href'); 
    echo "newhref = " . $newhref . '<br />'; 
} 

echo '<br />'; 

$anchors = $dom->getElementsByTagName('a'); 
foreach($anchors as $anchor){ 
    echo "new2 = " . $anchor->getAttribute('href') . '<br />'; 
} 

The result is:

href = /path.html?q=hello&a=bye 
href = /path2.html?before=34&after=44 

newhref = http://example.com/path.html?q=hello&a=bye 
newhref = http://example.com/path2.html?before=34&after=44 

new2 = http://example.com/path.html?q=hello&a=bye 
new2 = http://example.com/path2.html?before=34&after=44 
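
The simple url_to_absolute above concatenates unconditionally, so an href that is already absolute would get the base prepended a second time. The following is only a minimal sketch of skipping such hrefs. The helper name `to_absolute_if_relative` is hypothetical and is not the nadeausoftware.com function mentioned in the comments. It handles only root-relative paths like those in the test HTML, not `../` segments or protocol-relative `//host` links.

```php
<?php
// Hypothetical helper (not from the thread): prefix $base only when $href
// carries no scheme of its own. Assumes simple root-relative paths such as
// "/path.html"; it does not resolve "../" or protocol-relative URLs.
function to_absolute_if_relative(string $base, string $href): string
{
    // parse_url() yields the scheme string, null when the component is
    // absent, or false for seriously malformed input.
    $scheme = parse_url($href, PHP_URL_SCHEME);
    if ($scheme !== null && $scheme !== false) {
        return $href; // already absolute, leave it untouched
    }
    return rtrim($base, '/') . '/' . ltrim($href, '/');
}

$base = 'http://example.com';
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="/path.html?q=hello">a</a><a href="http://example.org/x">b</a></body></html>');

// Rewrite each href in place; no nodes are added or removed during iteration.
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $anchor->setAttribute('href', to_absolute_if_relative($base, $anchor->getAttribute('href')));
}

// Query the document again to confirm the values persisted.
$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $hrefs[] = $anchor->getAttribute('href');
}
print_r($hrefs);
```

Calling setAttribute() on an existing attribute overwrites its value, so this sketch does not call removeAttribute() first.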
+0

I switched my code over to your simple url_to_absolute function. I still have the same problem. I've edited my code snippet to show what my code looks like now –
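
Since the simplified function behaves as expected on a literal HTML string, one way to narrow the problem down might be to inspect what the parser made of the fetched markup and how each anchor serializes before and after the change. The sketch below assumes nothing about the actual cause. The `$contents` string is a stand-in for the page fetched with cURL, and the output labels are made up for illustration.

```php
<?php
// Collect libxml parse warnings instead of printing them, so they can be
// inspected after loadHTML().
libxml_use_internal_errors(true);

// Stand-in for the fetched page; in the thread this would be the cURL result.
$contents = '<html><body><a href="/path.php?ccc">link</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($contents);

// Each entry is a LibXMLError object with line and message properties.
foreach (libxml_get_errors() as $error) {
    echo 'line ' . $error->line . ': ' . trim($error->message) . "\n";
}
libxml_clear_errors();

// Serialize each anchor on its own before and after changing the attribute.
foreach ($dom->getElementsByTagName('a') as $anchor) {
    echo 'before: ' . $dom->saveHTML($anchor) . "\n";
    $anchor->setAttribute('href', 'http://example.com' . $anchor->getAttribute('href'));
    echo 'after:  ' . $dom->saveHTML($anchor) . "\n";
}

$html = $dom->saveHTML();
```

If the "after" line for a given anchor already shows a shortened URL, the value handed to setAttribute() is the place to look. If it is correct there but wrong in the final output, the difference lies in a later step.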