2011-05-22
1

PHP regular expression: I have 3 message blocks.

Example:

<!-- message --> 
    <div> 
     Just the text. 
    </div> 
<!--/message --> 

<!-- message --> 
    <div> 
     <div style="margin-left: 20px; margin-top:5px; "> 
      <div class="smallfont">Quote:</div> 
     </div> 
     <div style="margin-right: 20px; margin-left: 20px; padding: 10px;"> 
      Message from <strong>Nickname</strong> &nbsp; 
       <div style="font-style:italic">Hello. It's a quote</div> 
       <else /></if> 
     </div> 
     <br /><br /> 
     It's the simple text 
    </div> 
<!--/message --> 

<!-- message --> 
    <div> 
     Text<br /> 
     <div style="margin:20px; margin-top:5px; background-color: #30333D"> 
      <div class="smallfont" style="margin-bottom:2px">PHP code:</div> 
      <div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;"> 
       <code style="white-space:nowrap"> 
        <div dir="ltr" style="text-align:left"> 
         <!-- php buffer start --> 
          <code> 
           LALALA PHP CODE 
          </code> 
         <!-- php buffer end --> 
        </div> 
       </code> 
      </div> 
     </div><br /> 
     <br /> 
     More text 
    </div> 
<!--/message --> 

I tried to write a regular expression for these blocks, but it does not work.

preg_match('#<!-- message -->(?P<text>.*?)</div>.*?<!--/message -->#is', $str, $s); 

It only works for the first block.

How can I make this regular expression check whether a message contains a quote or PHP code?
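For context on the "first block only" symptom: `preg_match()` stops after the first match, while `preg_match_all()` returns every match. The following is only a minimal sketch under assumptions: the two-message input is made up, and the group name `body` is hypothetical (the pattern above uses `text`). It collects each whole block and does not try to tell quotes or code apart.

```php
<?php
// Hypothetical input: two message blocks in the same format as the question.
$str = <<<HTML
<!-- message -->
    <div>
     Just the text.
    </div>
<!--/message -->

<!-- message -->
    <div>
     More text
    </div>
<!--/message -->
HTML;

// preg_match() stops after the first match; preg_match_all() keeps going.
// PREG_SET_ORDER groups the captures per match instead of per group.
$count = preg_match_all(
    '#<!-- message -->(?P<body>.*?)<!--/message -->#is',
    $str,
    $matches,
    PREG_SET_ORDER
);

$bodies = array();
foreach ($matches as $match) {
    // Drop the wrapping <div> and the surrounding whitespace
    $bodies[] = trim(strip_tags($match['body']));
}

print_r($bodies);
```

This finds every block, but splitting a block into text, quote, and code parts with a regex alone gets fragile quickly once the divs nest.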

(?P<text>.*?) for text 

(?P<phpcode>.*?) for php code 

(?P<quotenickname>.*?) for quoted nickname 

(?P<quotemessage>.*?) for quote message 

etc.

Thank you very much!

Update for onteria_:

<!-- message --> 
    <div> 
     Just the text. <b>bold text</b><br/> 
     <a href="link">link</a>, <s><i>test</i></s>   
    </div> 
<!--/message --> 

Output:

Just the text 
, 

What needs to change so that the output keeps the "a", "b", "s", "i", etc. tags? How can I make sure the HTML is not removed? Thanks.

+2

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – geoffspear 2011-05-22 21:11:11

+0

*(related)* [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon 2011-05-22 21:16:10

+0

There, did it with DOM. – 2011-05-22 23:36:55

Answers

3

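Notice those responses about not using regular expressions? Why is that? It's because HTML represents structure. To be honest, this HTML overuses divs instead of semantic markup, but I'm going to parse it with the DOM functions anyway. So, here is the sample HTML I used: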

<html> 
<body> 
<!-- message --> 
    <div> 
     Just the text. 
    </div> 
<!--/message --> 

<!-- message --> 
    <div> 
     <div style="margin-left: 20px; margin-top:5px; "> 
      <div class="smallfont">Quote:</div> 
     </div> 
     <div style="margin-right: 20px; margin-left: 20px; padding: 10px;"> 
      Message from <strong>Nickname</strong> &nbsp; 
       <div style="font-style:italic">Hello. It's a quote</div> 
     </div> 
     <br /><br /> 
     It's the simple text 
    </div> 
<!--/message --> 

<!-- message --> 
    <div> 
     Text<br /> 
     <div style="margin:20px; margin-top:5px; background-color: #30333D"> 
      <div class="smallfont" style="margin-bottom:2px">PHP code:</div> 
      <div class="alt2" style="margin:0px; padding:6px; border:1px inset; width:640px; height:482px; overflow:auto; background-color:#FFFACA;"> 
       <code style="white-space:nowrap"> 
        <div dir="ltr" style="text-align:left"> 
         <!-- php buffer start --> 
          <code> 
           LALALA PHP CODE 
          </code> 
         <!-- php buffer end --> 
        </div> 
       </code> 
      </div> 
     </div><br /> 
     <br /> 
     More text 
    </div> 
<!--/message --> 
</body> 
</html> 

Now the full code:

$doc = new DOMDocument(); 
$doc->loadHTMLFile('test.html'); 


// These just make the code nicer 
// We could just inline them if we wanted to 
// ----------- Helper Functions ------------ 
function HasQuote($part, $xpath) { 
    // check the div and see if it contains "Quote:" inside 
    return $xpath->query("div[contains(.,'Quote:')]", $part)->length; 
} 

function HasPHPCode($part, $xpath) { 
    // check the div and see if it contains "PHP code:" inside 
    return $xpath->query("div[contains(.,'PHP code:')]", $part)->length; 
} 
// ----------- End Helper Functions ------------ 


// ----------- Parse Functions ------------ 
function ParseQuote($quote, $xpath) {
    // The quote content is actually in the next div over, past the
    // whitespace text node in between. Man this markup is weird.
    $quote = $quote->nextSibling->nextSibling;

    $quote_info = array('type' => 'quote');

    $nickname = $xpath->query("strong", $quote);
    if ($nickname->length) {
        $quote_info['nickname'] = $nickname->item(0)->nodeValue;
    }

    $quote_text = $xpath->query("div", $quote);
    if ($quote_text->length) {
        $quote_info['quote_text'] = trim($quote_text->item(0)->nodeValue);
    }

    return $quote_info;
}

function ParseCode($code, $xpath) {
    $code_info = array('type' => 'code');

    // This matches the path down to the innermost code element.
    // The leading "." keeps the search inside $code; a bare "//"
    // would search the whole document and ignore the context node.
    $code_text = $xpath->query(".//div/code/div/code", $code);
    if ($code_text->length) {
        $code_info['code_text'] = trim($code_text->item(0)->nodeValue);
    }

    return $code_info;
}

// ----------- End Parser Functions ------------ 

function GetMessages($message, $xpath) {

    $message_contents = array();

    foreach ($message->childNodes as $child) {

        // So inside of a message if we hit a div
        // we either have a Quote or PHP code, check which
        if (strtolower($child->nodeName) == 'div') {
            if (HasQuote($child, $xpath)) {
                $quote = ParseQuote($child, $xpath);
                // !empty() avoids a notice when no quote text was found
                if (!empty($quote['quote_text'])) {
                    $message_contents[] = $quote;
                }
            }
            else if (HasPHPCode($child, $xpath)) {
                $phpcode = ParseCode($child, $xpath);
                if (!empty($phpcode['code_text'])) {
                    $message_contents[] = $phpcode;
                }
            }
        }
        // Otherwise check if we've found some plain text
        else if ($child->nodeType == XML_TEXT_NODE) {
            // This might be just whitespace, so check that it's not empty
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $message_contents[] = array('type' => 'text', 'text' => $text);
            }
        }

    }

    return $message_contents;
}

$xpath = new DOMXPath($doc);
// We need to get the toplevel divs, which 
// are the messages 
$toplevel_divs = $xpath->query("//body/div"); 

$messages = array(); 
foreach($toplevel_divs as $toplevel_div) { 
    $messages[] = GetMessages($toplevel_div, $xpath); 
} 
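A note on the XPath calls in this code: `DOMXPath::query()` accepts a context node as its second argument, but an expression that begins with `//` still starts at the document root, so it can pick up nodes from other messages. The sketch below uses a made-up two-message document (the ids `m1` and `m2` are hypothetical, not from the sample HTML) to show the difference between an absolute and a context-relative query.

```php
<?php
// Hypothetical two-message document, much smaller than the sample HTML.
$html = '<html><body>'
      . '<div id="m1"><code>first</code></div>'
      . '<div id="m2"><code>second</code></div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Use the second message as the context node.
$second = $xpath->query('//div[@id="m2"]')->item(0);

// "//code" starts at the document root, so the context node is ignored.
$absolute = $xpath->query('//code', $second);

// ".//code" starts at the context node and only sees its descendants.
$relative = $xpath->query('.//code', $second);

echo $absolute->length, "\n";              // 2
echo $relative->length, "\n";              // 1
echo $relative->item(0)->nodeValue, "\n";  // second
```

With a single code block in the document both forms return the same node, which is why the distinction is easy to miss.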

Now let's see what $messages looks like:

Array 
(
    [0] => Array 
     (
      [0] => Array 
       (
        [type] => text 
        [text] => Just the text. 
       ) 

     ) 

    [1] => Array 
     (
      [0] => Array 
       (
        [type] => quote 
        [nickname] => Nickname 
        [quote_text] => Hello. It's a quote 
       ) 

      [1] => Array 
       (
        [type] => text 
        [text] => It's the simple text 
       ) 

     ) 

    [2] => Array 
     (
      [0] => Array 
       (
        [type] => text 
        [text] => Text 
       ) 

      [1] => Array 
       (
        [type] => code 
        [code_text] => LALALA PHP CODE 
       ) 

      [2] => Array 
       (
        [type] => text 
        [text] => More text 
       ) 

     ) 

) 

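It's separated by message and then further split into the different kinds of content within each message! Now we can even use a basic print routine like this: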

foreach ($messages as $message) {
    echo "\n\n>>>>>> Message >>>>>>>\n";
    foreach ($message as $content) {
        if ($content['type'] == 'text') {
            echo "{$content['text']} ";
        }
        else if ($content['type'] == 'quote') {
            echo "\n\n======== Quote =========\n";
            echo "From: {$content['nickname']}\n\n";
            echo "{$content['quote_text']}\n";
            echo "=====================\n\n";
        }
        else if ($content['type'] == 'code') {
            echo "\n\n======== Code =========\n";
            echo "{$content['code_text']}\n";
            echo "=====================\n\n";
        }
    }
}

echo "\n"; 

And we get this!

>>>>>> Message >>>>>>> 
Just the text. 

>>>>>> Message >>>>>>> 


======== Quote ========= 
From: Nickname 

Hello. It's a quote 
===================== 

It's the simple text 

>>>>>> Message >>>>>>> 
Text 

======== Code ========= 
LALALA PHP CODE 
===================== 

More text 

This all works because the DOM parsing functions understand structure.
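On the follow-up in the question about keeping inline markup (`<b>`, `<a>`, `<s>`, `<i>`): the text branch above only collects direct text nodes, so those elements are skipped, and `nodeValue` strips tags anyway. One possible approach, sketched here under assumptions, is to serialize the child nodes with `DOMDocument::saveHTML()` and a node argument (available from PHP 5.3.6). The helper name `innerHtml` and the sample message are hypothetical, and wiring this into `GetMessages` is left out.

```php
<?php
// Hypothetical message, modelled on the follow-up example in the question.
$html = '<html><body><div>Just the text. <b>bold text</b><br>'
      . '<a href="link">link</a>, <s><i>test</i></s></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// innerHtml() is a hypothetical helper, not part of the DOM extension.
// It serializes every child of $node, so inline tags survive.
function innerHtml(DOMNode $node) {
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument dumps just that node
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return trim($out);
}

$message = $xpath->query('//body/div')->item(0);

$plain  = $message->nodeValue;   // tags stripped
$markup = innerHtml($message);   // tags kept

echo $plain, "\n";
echo $markup, "\n";
```

For messages that also contain quote or code divs, the same idea would need to be applied per child node rather than to the whole message.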

+0

Thank you very much! – 2011-05-26 16:47:02

+0

I edited the first message for you.. please help me – 2011-05-27 13:38:17