2011-01-11
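Regex to avoid HTML tags (PHP)

I've already seen quite a few questions like this here, but none of them is exactly what I want... Let's say I have the following text: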

Line 1 - This is a TEST phrase. 
Line 2 - This is a <img src="TEST" /> image. 
Line 3 - This is a <a href="somelink/TEST">TEST</a> link. 

OK, simple, right? I tried the following code:

$linkPin = '#(\b)TEST(\b)(?![^<]*>)#i'; 
$linkRpl = '$1<a href="newurl">TEST</a>$2'; 

$html = preg_replace($linkPin, $linkRpl, $html); 
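To see which occurrences of TEST the pattern above actually hits, the sketch below runs it through preg_match_all() on the three sample lines. The negative lookahead (?![^<]*>) only rejects a match when a '>' appears before the next '<'. That is true for a word inside a tag's attributes, but false for link text such as TEST</a>, because the next character there is '<'. The printing loop is only for illustration.

```php
<?php
// Reproduces the question's pattern to show which "TEST" occurrences it matches.
// (?![^<]*>) rejects a match only if a '>' comes before the next '<',
// which holds inside attributes but not for text between tags.
$html = "Line 1 - This is a TEST phrase.\n"
      . "Line 2 - This is a <img src=\"TEST\" /> image.\n"
      . "Line 3 - This is a <a href=\"somelink/TEST\">TEST</a> link.";
$linkPin = '#(\b)TEST(\b)(?![^<]*>)#i';

preg_match_all($linkPin, $html, $matches, PREG_OFFSET_CAPTURE);
foreach ($matches[0] as [$text, $offset]) {
    // Print each match with a few characters of context after it.
    echo $offset, ': ', $text, substr($html, $offset + 4, 6), "\n";
}
```

Only two matches come back: the plain-text TEST on line 1 and the link text on line 3. Both attribute values are rejected.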

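As you can see, it takes the word TEST and replaces it with a link to TEST. The regex I'm using now correctly avoids replacing the TEST on line 2, and it also avoids replacing the TEST in the href on line 3. However, it still replaces the text enclosed by the tag on line 3, and I end up with: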

Line 1 - This is a <a href="newurl">TEST</a> phrase. 
Line 2 - This is a <img src="TEST" /> image. 
Line 3 - This is a <a href="somelink/TEST"><a href="newurl">TEST</a></a> link. 

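I don't want this, because it produces broken markup on line 3. I want to ignore not only matches inside a tag, but also matches enclosed by one. (Note the /> on line 2.)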
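One regex-only way to get this behavior uses a different technique from the lookahead in the question: PCRE's backtracking control verbs (*SKIP)(*FAIL), which PHP's preg_* functions support. The idea is to match the things you want to leave alone first (a whole <a>...</a> element, then any other tag), and force those alternatives to fail. (*SKIP) makes the engine resume after the consumed text, so nothing inside it can match. The function name linkWordOutsideLinks() is hypothetical. This is a minimal sketch under simple assumptions: no '>' inside attribute values, and no nested or unclosed <a> elements.

```php
<?php
// Sketch using PCRE backtracking verbs (not the question's lookahead approach).
// Alternatives 1 and 2 consume text that must be left alone and then fail;
// (*SKIP) makes the engine continue after that text, so only
// alternative 3 (the bare word) ever produces a match.
function linkWordOutsideLinks(string $html, string $word, string $url): string
{
    $w = preg_quote($word, '~');
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'  // a whole <a ...>...</a> element
             . '|<[^>]*>(*SKIP)(*FAIL)'             // any other tag, e.g. <img ... />
             . '|\b' . $w . '\b~is';                 // the word itself, in plain text
    // A callback avoids "$" or "\" in the URL being read as back-references.
    return preg_replace_callback($pattern, function (array $m) use ($url): string {
        return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">' . $m[0] . '</a>';
    }, $html);
}

$input = "Line 1 - This is a TEST phrase.\n"
       . "Line 2 - This is a <img src=\"TEST\" /> image.\n"
       . "Line 3 - This is a <a href=\"somelink/TEST\">TEST</a> link.";
echo linkWordOutsideLinks($input, 'TEST', 'newurl');
```

Because the link text is consumed together with its <a> tag, line 3 is left untouched, and the img tag on line 2 is skipped as a whole.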


Possible duplicate of [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon 2011-01-11 14:54:25


Possible duplicate of [How to replace text URLs and exclude URLs in HTML tags?](http://stackoverflow.com/questions/4003031/how-to-replace-text-urls-and-exclude-urls-in-html-tags) – Gordon 2011-01-11 14:56:20


[RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – 2011-01-11 14:58:54

Answers

0

OK... I think I've come up with a better solution...

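// Closing tags whose enclosed text must not be linked; add more alternatives here.
$noMatch = '(</a>|</h\d+>)';

$linkUrl = 'http://www.test.com/test/'.$link['page_slug'];
// Skip the word if a '>' follows before any '<' (it is inside a tag), or if one of
// the $noMatch closing tags follows before any '>' (it is that element's text).
// '#' is passed to preg_quote() because it is the pattern delimiter.
$linkPin = '#(?!(?:[^<]+>|[^>]+'.$noMatch.'))\b'.preg_quote($link['page_name'], '#').'\b#i';
$linkRpl = '<a href="'.$linkUrl.'">'.$link['page_name'].'</a>';

$page['HTML'] = preg_replace($linkPin, $linkRpl, $page['HTML']);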

With this code, no text inside <a> or <h#> tags gets processed. I think any new exclusion I want only needs to be added to $noMatch.
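To try the approach above in isolation, here it is wrapped in a hypothetical helper, linkifyOutsideTags(), that builds $noMatch from a list of closing-tag regex fragments. Adding an exclusion then means adding one array entry. Two assumptions differ from the snippet above. The word and URL are passed in directly instead of coming from $link. The replacement uses $0, so the matched text keeps its original case.

```php
<?php
// Hypothetical wrapper around the lookahead regex from the answer above.
// $excludeTags holds regex fragments for closing tags (e.g. '</a>', '</h\d+>');
// text followed by one of them before any '>' is left alone.
function linkifyOutsideTags(string $html, string $word, string $url, array $excludeTags): string
{
    // Build the alternation, e.g. "(</a>|</h\d+>)".
    $noMatch = '(' . implode('|', $excludeTags) . ')';
    $pattern = '#(?!(?:[^<]+>|[^>]+' . $noMatch . '))\b' . preg_quote($word, '#') . '\b#i';
    // $0 inserts the whole match, preserving its case.
    return preg_replace($pattern, '<a href="' . $url . '">$0</a>', $html);
}

$input = "Line 1 - This is a TEST phrase.\n"
       . "Line 2 - This is a <img src=\"TEST\" /> image.\n"
       . "Line 3 - This is a <a href=\"somelink/TEST\">TEST</a> link.\n"
       . "<h2>TEST</h2>";
echo linkifyOutsideTags($input, 'TEST', 'newurl', ['</a>', '</h\d+>']);
```

On the sample, only line 1 is linked. The check only looks ahead as far as the next '>', though. Text wrapped in further markup inside a link, such as <a href="x"><b>TEST</b></a>, is not followed directly by </a>, so it still gets linked.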

Is there anything wrong with this approach?

1

Honestly, I'd do this with DOMDocument and XPath:

// First, wrap the text in a minimal HTML document so DOMDocument can parse it.
$html = '<html><body><div id="content">'.$text.'</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only text nodes that contain "TEST" and have no <a> ancestor.
// Attribute values (e.g. src="TEST") are not text nodes, so they are never selected.
$query = '//text()[contains(., "TEST") and not(ancestor::a)]';

// Copy the live node list into an array, because the tree is modified below.
$nodes = iterator_to_array($xpath->query($query));

$replaceNode = function (DOMText $node) {
    // Escape the raw text so appendXML() receives well-formed markup.
    $text = htmlspecialchars($node->data, ENT_XML1);
    $text = str_replace('TEST', '<a href="TEST">TEST</a>', $text);
    $fragment = $node->ownerDocument->createDocumentFragment();
    $fragment->appendXML($text);
    $node->parentNode->replaceChild($fragment, $node);
};

foreach ($nodes as $node) {
    $replaceNode($node);
}

This should work for you, since it ignores all text inside <a> elements and only replaces text inside the matching tags.
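The DOM approach also needs a way to get the modified HTML back out. Below is a self-contained sketch, linkifyWithDom() (a hypothetical name), that does the whole round trip. It creates the link elements with createElement() and createTextNode() instead of appendXML(), so no manual escaping is needed. It then serializes only the children of the wrapper div with DOMDocument::saveHTML($node). It assumes $word contains no double quote, because it is interpolated into the XPath string. Note that saveHTML() produces HTML serialization, so a self-closing tag like <img ... /> may come back without the slash.

```php
<?php
// Hypothetical helper: link whole-word occurrences of $word in text nodes that
// have no <a> ancestor, then return the fragment's HTML.
function linkifyWithDom(string $fragmentHtml, string $word, string $url): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in a known container; the meta tag declares UTF-8.
    $dom->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="content">' . $fragmentHtml . '</div></body></html>',
        LIBXML_NOERROR | LIBXML_NOWARNING);
    $xpath = new DOMXPath($dom);

    // Only text nodes are selected, so attribute values are never touched.
    $query = sprintf('//div[@id="content"]//text()[contains(., "%s") and not(ancestor::a)]', $word);
    foreach (iterator_to_array($xpath->query($query)) as $textNode) {
        // Split around whole-word matches; odd indexes hold the captured matches.
        $parts = preg_split('/\b(' . preg_quote($word, '/') . ')\b/', $textNode->data, -1, PREG_SPLIT_DELIM_CAPTURE);
        $parent = $textNode->parentNode;
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $link = $dom->createElement('a');
                $link->setAttribute('href', $url);
                $link->appendChild($dom->createTextNode($part));
                $parent->insertBefore($link, $textNode);
            } elseif ($part !== '') {
                $parent->insertBefore($dom->createTextNode($part), $textNode);
            }
        }
        // The new nodes now sit before the original text node; drop it.
        $parent->removeChild($textNode);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    $container = $xpath->query('//div[@id="content"]')->item(0);
    foreach ($container->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$input = "Line 1 - This is a TEST phrase.\n"
       . "Line 2 - This is a <img src=\"TEST\" /> image.\n"
       . "Line 3 - This is a <a href=\"somelink/TEST\">TEST</a> link.";
echo linkifyWithDom($input, 'TEST', 'newurl');
```

Building nodes instead of parsing a string fragment keeps the text and the new links separate, so characters like & or < in the original text cannot produce malformed markup.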