2016-02-25 79 views
1

How to get only some href attributes

I have this PHP code that I want to use to extract some information, but I am stuck at the href step:

$site = "http://www.sports-reference.com/olympics/countries"; 
$site_html = file_get_html($site); 

$country_dirty = $site_html->getElementById('div_countries'); 

foreach($country_dirty->find('img') as $link){

    $country = $link->alt;
    $link_country = "$site/$country";
    $link_country_html = file_get_html($link_country);

    $link_season = $link_country_html->getElementById('div_medals');

    foreach($link_season->find('a') as $season){

        echo $link_year_season = $season->href . "\n";

        //echo $link_season = strstr ($link_year_season,'summer') . "\n";

    }
}

The variable $link_year_season gives me the following output:

/olympics/countries/AFG/summer/2012/ 
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html 
/olympics/athletes/ni/rohullah-nikpai-1.html 
/olympics/countries/AFG/summer/2008/ 
/olympics/athletes/ba/nesar-ahmad-bahawi-1.html 
/olympics/athletes/ni/rohullah-nikpai-1.html 
/olympics/countries/AFG/summer/2004/ 
/olympics/countries/AFG/summer/1996/ 
/olympics/countries/AFG/summer/1988/ 
/olympics/countries/AFG/summer/1980/ 
/olympics/countries/AFG/summer/1972/ 
..... 

I would like to know whether it is possible to get only this output:

/olympics/countries/AFG/summer/2012/ 
/olympics/countries/AFG/summer/2008/ 
/olympics/countries/AFG/summer/2004/ 
/olympics/countries/AFG/summer/1996/ 
/olympics/countries/AFG/summer/1988/ 
/olympics/countries/AFG/summer/1980/ 
/olympics/countries/AFG/summer/1972/ 
+0

A quick way to do this is to apply `preg_match`, `strpos`, or something similar to the output you already have. – Maximus2012
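
As a rough illustration of the `strpos` idea from this comment, here is a minimal sketch that keeps only hrefs beginning with a given prefix. The helper name `hrefStartsWith` and the `$hrefs` array are assumptions made for the example (the sample values are copied from the question's output); in the real loop you would test `$season->href` instead. Note that a plain prefix check would also keep any other link under `/olympics/countries/`, so it is looser than a regex that requires a season and a year.

```php
<?php
// Sample hrefs copied from the question's output (stand-in for $season->href).
$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
    '/olympics/athletes/ni/rohullah-nikpai-1.html',
    '/olympics/countries/AFG/summer/2008/',
];

// Hypothetical helper: true when $href begins with $prefix.
function hrefStartsWith(string $href, string $prefix): bool
{
    // strpos() returns 0 only when the prefix sits at the very start and
    // false when it is absent, hence the strict comparison.
    return strpos($href, $prefix) === 0;
}

$prefix = '/olympics/countries/';

// Keep only the links that start with the prefix, reindexed from 0.
$countryLinks = array_values(array_filter(
    $hrefs,
    function (string $href) use ($prefix): bool {
        return hrefStartsWith($href, $prefix);
    }
));

foreach ($countryLinks as $href) {
    echo $href . "\n";
}
```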

+0

Did the answer below solve your problem? http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – chris85

Answer

0

You should be able to use this regex to check that the link starts with /olympics/countries/AFG/summer/, followed by digits and a /:

foreach($link_season->find('a') as $season){
    if(preg_match('~^/olympics/countries/AFG/summer/\d+/~', $season->href)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
    }
}

Demo: https://regex101.com/r/bZ1vP3/1

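You can also pull out the year by capturing the digits after summer. This assumes the number is a year: the first regex only checks that one or more digits are present, whereas this one is stricter and requires exactly four.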

foreach($link_season->find('a') as $season){
    if(preg_match('~^/olympics/countries/AFG/summer/(\d{4})/~', $season->href, $year)) {
        echo $link_year_season = $season->href . "\n";
        //echo $link_season = strstr ($link_year_season,'summer') . "\n";
        echo 'The year is ' . $year[1] . "\n";
    }
}

If the season can also vary, you can use (?:summer|winter), which allows either summer or winter as the fourth directory.
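
A minimal sketch of the season alternation, under a few assumptions: the group is written as a capturing `(summer|winter)` rather than the non-capturing `(?:summer|winter)` so that the season can be read back, the `$hrefs` array stands in for the `$season->href` values from the loop, and the winter link is made up purely for illustration.

```php
<?php
// Stand-in for the hrefs found in div_medals; the winter entry is hypothetical.
$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ni/rohullah-nikpai-1.html',
    '/olympics/countries/AFG/winter/2010/',
];

// Group 1 captures the season, group 2 the four-digit year.
$pattern = '~^/olympics/countries/AFG/(summer|winter)/(\d{4})/~';

$games = [];
foreach ($hrefs as $href) {
    if (preg_match($pattern, $href, $m)) {
        $games[] = [
            'season' => $m[1],
            'year'   => (int) $m[2],
            'href'   => $href,
        ];
    }
}

foreach ($games as $game) {
    echo $game['season'] . ' ' . $game['year'] . ': ' . $game['href'] . "\n";
}
```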

+0

If you want to allow any country and any season, you could use '^\/olympics\/countries\/[A-Z]+\/(?:summer|winter)\/\d{4}\/', assuming summer and winter are the only seasons in which the Olympics take place ;) – shamsup
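
The country-agnostic pattern from this comment could be applied roughly as follows. This is only a sketch: the function name `groupSeasonLinksByCountry`, the named groups, and the `XYZ` link are assumptions introduced for the example, and `~` is used as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: groups "season year" labels by country code.
function groupSeasonLinksByCountry(array $hrefs): array
{
    // Any uppercase country code, summer or winter, then a four-digit year.
    $pattern = '~^/olympics/countries/(?<country>[A-Z]+)/(?<season>summer|winter)/(?<year>\d{4})/~';

    $grouped = [];
    foreach ($hrefs as $href) {
        if (preg_match($pattern, $href, $m)) {
            $grouped[$m['country']][] = $m['season'] . ' ' . $m['year'];
        }
    }
    return $grouped;
}

$hrefs = [
    '/olympics/countries/AFG/summer/2012/',
    '/olympics/athletes/ba/nesar-ahmad-bahawi-1.html',
    '/olympics/countries/AFG/summer/2008/',
    '/olympics/countries/XYZ/winter/2010/', // made-up country code and link
];

$grouped = groupSeasonLinksByCountry($hrefs);
print_r($grouped);
```

Because the country code is matched rather than hard-coded, the same check can sit inside the outer loop over all countries without editing the pattern for each one.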