2014-10-09 78 views

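simplehtmldom parsing script: breaking up plaintext data

Here is my script, in which I extract three items: the drug name, the generic name, and the class name. My problem is that I can fetch the drug name on its own, but the generic name and the class name come back together as a single string. If you run the script you will see what I mean. I want to store the generic name and the class name in separate columns of a table.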

Script

<?php 

error_reporting(0); 

//simple html dom file 
require('simple_html_dom.php'); 

//target url 
$html = file_get_html('http://www.drugs.com/condition/atrial-flutter.html?rest=1'); 

//crawl td columns 

foreach($html->find('td') as $element) 
{ 
    //get drug name 
    $drug_name = $element->find('b'); 
    foreach($drug_name as $drug_name) 
    { 
     echo "Drug Name:-".$drug_name; 

     foreach($element->find('span[class=small] a',2) as $t) 
     { 
      //get the inner HTML 
      $data = $t->plaintext; 
      echo $data; 
     } 

     echo "<br/>"; 
    } 
} 

?> 

Thanks in advance.

Answers


Your current code is a little way off from what you need, but you can use CSS selectors to get at these elements more easily.

Example:

<?php

require('simple_html_dom.php');

$data = array();
$html = file_get_html('http://www.drugs.com/condition/atrial-flutter.html?rest=1');
foreach ($html->find('tr td[1]') as $td) { // target only the first td of each row; no need to loop over every td
    $name_tag = $td->find('a b', 0); // bold tag inside the anchor holds the drug name
    $other_info = $td->find('span.small[2]', 0); // span holding the generic name and the classes
    if (!$name_tag || !$other_info) {
        continue; // skip rows that do not have the expected markup
    }
    $drug_name = $name_tag->innertext;
    $generic_tag = $other_info->find('a[1]', 0); // first anchor is the generic name
    $generic_name = $generic_tag ? $generic_tag->innertext : '';
    $children_count = count($other_info->children()); // count all of the children
    $classes = array();
    for ($i = 1; $i < $children_count; $i++) { // position zero is the generic name, so start from 1
        $class_tag = $other_info->find('a', $i);
        if ($class_tag) {
            $classes[] = $class_tag->innertext; // collect each class name separately
        }
    }

    $data[] = array(
        'drug_name' => $drug_name,
        'generic_name' => $generic_name,
        'classes' => $classes,
    );
}

echo '<pre>';
print_r($data);
echo '</pre>';
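
Since the goal in the question is to keep the generic name and the class names in separate table columns, the `$data` array built above can be flattened into one row per drug. What follows is only a minimal sketch, assuming a drug may have several classes that are joined into a single column. The `to_rows` helper, the `; ` separator, and the sample values are hypothetical and do not come from the page.

```php
<?php

// Hypothetical sample in the same shape as the $data array built above;
// the names here are placeholders, not scraped values.
$data = array(
    array(
        'drug_name'    => 'Drug A',
        'generic_name' => 'generic a',
        'classes'      => array('class one', 'class two'),
    ),
    array(
        'drug_name'    => 'Drug B',
        'generic_name' => 'generic b',
        'classes'      => array(),
    ),
);

// Turn each entry into a flat row with one value per column.
// Multiple class names are joined with $separator (an assumed format).
function to_rows(array $data, $separator = '; ')
{
    $rows = array();
    foreach ($data as $entry) {
        $rows[] = array(
            trim($entry['drug_name']),
            trim($entry['generic_name']),
            implode($separator, array_map('trim', $entry['classes'])),
        );
    }
    return $rows;
}

$rows = to_rows($data);

// Print a header line, then one tab-separated line per drug.
echo implode("\t", array('drug_name', 'generic_name', 'classes')), "\n";
foreach ($rows as $row) {
    echo implode("\t", $row), "\n";
}
```

Each element of `$rows` then maps directly onto one record with three columns, so it could be passed to whatever insert routine the table uses. If the classes need a column or row each instead, the `implode` step is the place to change.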

Great, it works. – user2960749 2014-10-09 07:05:21


@user2960749 I'm glad it helped. – Ghost 2014-10-09 07:06:29


@user2960749: If the proposed solution helped you, you should *accept* it. Just click the small check mark below the post's score. – lxg 2014-10-13 09:12:38