2016-08-13 169 views
Scraping data

I'm having trouble scraping data from a website's source code with PHP.

view-source:http://www.pakdukaan.com/75-computer-cases 

The code I'm using to scrape the data is this:

<?php 
$html = file_get_contents('http://www.pakdukaan.com/75-computer-cases'); 

$pk_doc = new DOMDocument(); 
libxml_use_internal_errors(TRUE); 

if(!empty($html)){ 
$pk_doc->loadHTML($html); 
libxml_clear_errors(); 
$pk_xpath = new DOMXPath($pk_doc); 
$pk_list = array(); 
$pk_and_price = $pk_xpath->query('//div[@class="product_list list row "]'); 

if($pk_and_price->length > 0){ 

foreach($pk_and_price as $pat){ 
    $name = $pk_xpath->query('//h5[@class="name"]', $pat)->item(0)->nodeValue; 
    $pkmn_types = array(); 
    $price = $pk_xpath->query('//span[@class="price product-price"]', $pat); 

    foreach($price as $type){ 
     $pkmn_types[] = $type->nodeValue; 
    } 
    $pk_list[] = array('name' => $name, 'price' => $pkmn_types); 

} 
} 
} 

//output what we have 
echo "<pre>"; 
echo print_r($pk_list); 
echo "</pre>"; 
?> 

Instead of getting all the case names, I get only one. The other problem is that I get every case's price twice.

This is the output:

Array 
(
[0] => Array 
    (
     [name] => 

       Thermaltake V2 Plus + 350W Power Supply 


     [price] => Array 
      (
       [0] => 
         Rs. 4,099      
       [1] => 
         Rs. 4,099      
       [2] => 
         Rs. 5,899      
       [3] => 
         Rs. 5,899      
       [4] => 
         Rs. 8,499      
       [5] => 
         Rs. 8,499      
       [6] => 
         Rs. 9,499      
       [7] => 
         Rs. 9,499      
       [8] => 
         Rs. 10,350      
       [9] => 
         Rs. 10,350      
       [10] => 
         Rs. 12,999      
       [11] => 
         Rs. 12,999      
       [12] => 
         Rs. 17,799      
       [13] => 
         Rs. 17,799      
       [14] => 
         Rs. 16,199      
       [15] => 
         Rs. 16,199      
       [16] => 
         Rs. 17,299      
       [17] => 
         Rs. 17,299      
       [18] => 
         Rs. 16,500      
       [19] => 
         Rs. 16,500      
       [20] => 
         Rs. 5,899      
       [21] => 
         Rs. 5,899      
       [22] => 
         Rs. 8,399      
       [23] => 
         Rs. 8,399      
       [24] => 
         Rs. 4,999      
       [25] => 
         Rs. 4,999      
       [26] => 
         Rs. 7,599      
       [27] => 
         Rs. 7,599      
       [28] => 
         Rs. 9,999      
       [29] => 
         Rs. 9,999      
      ) 
    ) 
) 
1 

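Can anyone help me find what's wrong? I have tried many times changing which div class from the site's source I query, but I can't get the proper result.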


Check your HTML structure again.

Answer


So, let's go through your errors.

First: you query $pk_xpath->query('//h5[@class="name"]', $pat) and then take only item(0).

This means you skip all the other DOMNodes that the XPath query matched. But if you do this:

$names = $pk_xpath->query('//h5[@class="name"]', $pat); 
foreach ($names as $n) { 
    echo $n->nodeValue . PHP_EOL; 
} 

you will see all the names from your page.
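A related detail worth checking: an XPath expression that starts with // is evaluated from the document root even when a context node is passed as the second argument, whereas .// is evaluated relative to that node. The sketch below illustrates the difference on a small invented snippet; the markup and the "item" class are assumptions made for the example, not taken from the real page.

```php
<?php
// Hypothetical markup, invented for illustration only.
$html = '<div class="item"><h5 class="name">Case A</h5></div>'
      . '<div class="item"><h5 class="name">Case B</h5></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Use the first "item" div as the context node.
$first = $xpath->query('//div[@class="item"]')->item(0);

// '//' starts at the document root, so the context node has no effect.
$absolute = $xpath->query('//h5[@class="name"]', $first);
// './/' starts at the context node, so only its descendants match.
$relative = $xpath->query('.//h5[@class="name"]', $first);

echo $absolute->length . PHP_EOL; // 2
echo $relative->length . PHP_EOL; // 1
```

So inside a foreach over container nodes, queries generally need the leading dot to stay within the current container.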

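Second: the prices. If you inspect the HTML of the scraped page, you will see that span[@class="price product-price"] appears twice for each item. One span is the visible one; the second sits in a pop-up block that is currently hidden.

So you need a different XPath query. For example, you can find all the .product-meta items and then search for price product-price inside each of them.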
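One way this could look is sketched below. It loops over each product-meta block, then uses relative queries (.//) and takes only the first match, so the hidden duplicate is skipped. The sample HTML, the "popup" class, and the assumption that the name and both price spans all sit inside product-meta are hypothetical; check them against the real page source before relying on this.

```php
<?php
// Hypothetical markup: the real page's structure may differ.
$html = <<<HTML
<div class="product_list">
  <div class="product-meta">
    <h5 class="name">Case A</h5>
    <span class="price product-price">Rs. 4,099</span>
    <div class="popup"><span class="price product-price">Rs. 4,099</span></div>
  </div>
  <div class="product-meta">
    <h5 class="name">Case B</h5>
    <span class="price product-price">Rs. 5,899</span>
    <div class="popup"><span class="price product-price">Rs. 5,899</span></div>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$products = array();
// Match divs whose class list contains the token "product-meta".
$metaQuery = '//div[contains(concat(" ", normalize-space(@class), " "), " product-meta ")]';
foreach ($xpath->query($metaQuery) as $meta) {
    // Relative queries: search only inside the current product block.
    $name  = $xpath->query('.//h5[@class="name"]', $meta)->item(0);
    $price = $xpath->query('.//span[@class="price product-price"]', $meta)->item(0);
    if ($name === null || $price === null) {
        continue; // skip blocks that lack a name or a price
    }
    $products[] = array(
        'name'  => trim($name->nodeValue),
        'price' => trim($price->nodeValue),
    );
}

print_r($products);
```

With this sample input, each product ends up with exactly one name and one price. Against the live site, $html would come from file_get_contents() as in the question.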


I'm new to scraping! Can you show me how to do that?