2015-10-06

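Scrape the second HTML table from a website

I want to extract the main table from a website and convert it to JSON, but a table that comes before the one I want is getting in the way of the code I'm using. Here is the code: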

<?php 


$singles_chart_url = 'http://www.mediabase.com/mmrweb/allaboutcountry/Charts.asp?format=C1R'; 

// Get the mode from the user: 
$mode = $_GET['chart']; 

// This is an array of elements to remove from the content before stripping it: 
$newlines = array("\t", "\n", "\r", "\x20\x20", "\0", "\x0B"); 

switch($mode) 
{ 
    // They want the Singles chart, or haven't specified what they want: 
    case 'singles': 
    case '': 
    default: 
     $content = file_get_contents($singles_chart_url); 
     $start_search = '<table width="100%" border="0" cellpadding="2" cellspacing="2">'; 
     break; 


} 

$content = str_replace($newlines, "", html_entity_decode($content)); 
$scrape_start = strpos($content, $start_search); 


$scrape_end = strpos($content, '</table>', $scrape_start); 
$the_table = substr($content, $scrape_start, ($scrape_end - $scrape_start)); 



// Now loop through the rows and get the data we need: 
preg_match_all("|<tr(.*)</tr>|U", $the_table, $rows); 

// Set the heading so we can output nice XML: 
switch($_REQUEST['format']) 
{ 


    case 'json': 
    default: 
     header('Content-type: application/json'); 


     $count = 0; 
     foreach($rows[0] as $row) 
     { 
      // Check it's OK: 
      if(!strpos($row, '<th')) 
      { 
       // Get the cells: 
       preg_match_all("|<td(.*)</td>|U", $row, $cells); 
       $cells = $cells[0]; 

       $position = strip_tags($cells[0]); 
       $plus = strip_tags($cells[1]); 
       $artist = strip_tags($cells[2]); 
       $weeks = strip_tags($cells[3]); 

       echo "\n\t\t" . '{'; 
       echo "\n\t\t\t" . '"position" : "' . $position . '", '; 
       echo "\n\t\t\t" . '"plus" : "' . $plus . '", '; 
       echo "\n\t\t\t" . '"artist" : "' . $artist . '", '; 
       echo "\n\t\t\t" . '"noWeeks" : "' . $weeks . '" '; 

    echo ($count != (count($rows[0]) - 2)) ? "\n\t\t" . '}, ' : "\n\t\t" . '}'; 
       $count++; 
      } 
     } 
     echo "\n\t" . ']'; 
     echo "\n" . '}'; 
     break; 
}?> 

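That URL is the website I want to scrape. The goal is to get the table that starts after the LW, TW, Artist, Title, etc. headings. This is the JSON result the code above returns: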

{ 
"chartDate" : "", 
"retrieved" : "1444101246", 
"entries" : 
[ 
    { 
     "position" : "7 DayCharts", 
     "plus" : "Country Past 7 Days -by Overall Rank Return to Main Menu ", 
     "artist" : " ", 
     "noWeeks" : "", 
     "peak" : "", 
     "points" : "", 
     "increase" : "", 
     "us" : "" 
    }, 
] 
} 

instead of

{ 
"chartDate" : "", 
"retrieved" : "1444101246", 
"entries" : 
[ 
    { 
     "position" : "2", 
     "plus" : "1", 
     "artist" : "KENNY CHESNEY", 
     "noWeeks" : "Save It For A Rainy"", etc . etc. 
    }, 
] 
} 

What can I add to the code above to retrieve that table?


@PaulCrovella Hey, searching like this,

$start_search ='<TBODY>'; 

makes the code work, thanks. I'm a PHP newbie and I wish I understood all of this, but I'll take a look. – HalesEnchanted

Answers

1

Update: The problem is the matching pattern. After the following statement,

$content = str_replace($newlines, "", html_entity_decode($content)); 

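some characters have been replaced or removed, such as ", and some of the tags are in uppercase. So whatever $start_search contains, strpos never finds it, and $scrape_start always ends up being treated as 0.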

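So you have to change the search string so that it matches what is actually left in $content after that statement (see PhpFiddle).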
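A minimal sketch of the mismatch described above, not the exact fix from this answer. The sample markup is hypothetical (it is not the real Mediabase page); the point is only that strpos is case-sensitive and returns false when nothing matches, while stripos ignores case. Comparing against false with === separates "not found" from "found at position 0".

```php
<?php
// Hypothetical sample markup standing in for the page: note the uppercase tags.
$content = '<TABLE><TR><TD>menu</TD></TR></TABLE>'
         . '<TABLE><TBODY><TR><TD>2</TD><TD>1</TD></TR></TBODY></TABLE>';

$start_search = '<tbody>';

// strpos() is case-sensitive, so the lowercase needle is not found.
// It returns false, which later offset arguments coerce to 0.
$case_sensitive = strpos($content, $start_search);

// stripos() ignores case, so it finds the uppercase <TBODY>.
$scrape_start = stripos($content, $start_search);

// Always compare with === false: a match at position 0 is also "falsy".
if ($scrape_start === false) {
    exit('Start marker not found');
}

$scrape_end = stripos($content, '</table>', $scrape_start);
$the_table  = substr($content, $scrape_start, $scrape_end - $scrape_start);

var_dump($case_sensitive);
echo $the_table, "\n";
```

The same === false check would also apply to the `if(!strpos($row, '<th'))` test in the question, since a row that begins with `<th` would match at position 0.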


Thanks for the reply, Err. I tried that before, but nothing changed :( – HalesEnchanted


What do you get when you try the above? The same output? –


Yes, the same output. – HalesEnchanted
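For what it's worth, string matching can be avoided altogether by parsing the page and picking the table by position. The following is only a sketch under assumptions: the sample HTML is hypothetical, the wanted table is assumed to be the second `<table>` in document order (on the real page the index may differ, especially with nested tables), and the key names lw, tw, artist and title are made up for illustration. It uses PHP's DOMDocument (from the dom extension) and json_encode, so the JSON does not have to be assembled by hand.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body>'
      . '<TABLE><TR><TD>7 Day Charts</TD><TD>Return to Main Menu</TD></TR></TABLE>'
      . '<TABLE>'
      . '<TR><TH>LW</TH><TH>TW</TH><TH>Artist</TH><TH>Title</TH></TR>'
      . '<TR><TD>2</TD><TD>1</TD><TD>KENNY CHESNEY</TD><TD>Save It For A Rainy</TD></TR>'
      . '</TABLE>'
      . '</body></html>';

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Tables come back in document order; item(1) is the second one (assumption).
$table = $doc->getElementsByTagName('table')->item(1);

$entries = array();
if ($table !== null) {
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length < 4) {
            continue; // header rows use <th>, so they have no <td> cells
        }
        $entries[] = array(
            'lw'     => trim($cells->item(0)->textContent),
            'tw'     => trim($cells->item(1)->textContent),
            'artist' => trim($cells->item(2)->textContent),
            'title'  => trim($cells->item(3)->textContent),
        );
    }
}

// json_encode() takes care of quoting, escaping and commas.
$json = json_encode(array('entries' => $entries));
echo $json, "\n";
```

In the real script, $html would come from file_get_contents($singles_chart_url) as in the question, and a header('Content-type: application/json') call would go before the echo.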