Getting text that is outside any element

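I'm using Simple HTML DOM to scrape a website. The problem I've run into is that some of the text is not inside any specific element. The only element it appears to be inside is <div id="content">.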

<div id="content"> 
    <div class="image-wrap"></div> 
    <div class="gallery-container"></div> 
    <h3 class="name">Here is the Heading</h3> 

    All the text I want is located here !!! 

    <p> </p> 
    <div class="snapshot"></div> 
</div>

I think the webmaster made a mistake and the text was actually meant to be inside the <p> tag.

I tried the following code, but it does not retrieve the text:

$t = $scrape->find("div#content text", 0);
if ($t != null) {
    $text = trim($t->plaintext);
}

I'm still new to this and still learning. Can anyone help?


2014-08-29 trademark

You're almost there... Use a test loop to display the content of each text node and find the index of the text you want. For example:

// Find all text nodes inside div#content ($scrape is the parsed document from the question)
$texts = $scrape->find('div#content text');

foreach ($texts as $key => $txt) { 
    // Display text and the parent's tag name 
    echo "<br/>TEXT $key is ", $txt->plaintext, " -- in TAG ", $txt->parent()->tag ; 
}

You'll find that you should use index 4 instead of 0:

$scrape->find("div#content text",4);

If your text is not always at the same index, but you know, for example, that it follows the h3 heading, then you can use something like:

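foreach ($texts as $key => $txt) {
    // Locate the text node that belongs to the h3 heading
    if ($txt->parent()->tag == 'h3') {
        // Grab the content of the next text node, if there is one
        if (isset($texts[$key + 1])) {
            echo $texts[$key + 1]->plaintext;
        }
        // Stop after the first heading
        break;
    }
}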
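
As an alternative sketch, the same "text right after the h3" idea can be expressed with PHP's built-in DOM extension instead of Simple HTML DOM. This swaps in a different library (DOMDocument and DOMXPath) and assumes the page looks like the sample markup in the question; the $html string below is just that sample, not real site data.

```php
<?php
// Sample markup copied from the question (an assumption about the real page).
$html = <<<'HTML'
<div id="content">
    <div class="image-wrap"></div>
    <div class="gallery-container"></div>
    <h3 class="name">Here is the Heading</h3>

    All the text I want is located here !!!

    <p> </p>
    <div class="snapshot"></div>
</div>
HTML;

// Parse with the built-in DOM extension; suppress warnings on sloppy HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the first non-blank text node that is a sibling following the h3
// directly inside div#content, so no fixed index is needed.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    '//div[@id="content"]/h3/following-sibling::text()[normalize-space()][1]'
);

$text = $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
echo $text, PHP_EOL;
```

Because the XPath is anchored on the h3 rather than on a position in the list of text nodes, it should keep working when the number of text nodes before the heading changes from page to page, as long as the wanted text still directly follows the h3.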


2014-08-29 06:08:17 Enissay

Thanks Enissay, once I worked out what your code was doing, it all made sense. It works perfectly and is a great way to solve the problem. Thanks a lot. – trademark 2014-08-30 06:18:21

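Actually, this worked perfectly on the listing page I tested, but on every listing page of the site the text I want ends up at a different "text number". It varies from page to page. Is there any way around this? – trademark 2014-08-30 06:26:15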

@trademark Check my edited answer... – Enissay 2014-08-30 10:10:43
