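Parsing Wikipedia lists and descriptions with regular expressions

I'm not very familiar with regular expressions, and I need to find a way to parse a Wikipedia list of items. I'm pulling the content through Wikipedia's api.php, and I'm left with data that looks like this: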

==Formal fallacies== 
    A [[formal fallacy]] is an error in logic that... 

    * [[Appeal to probability]] – takes something for granted because... 
    * [[Argument from fallacy]] – assumes that if an argument ... 
    * [[Base rate fallacy]] – making a probability judgement... 
    * [[Conjunction fallacy]] – assumption that an outcome simultaneously... 
    * [[Masked man fallacy]] – ... 

    ===Propositional fallacies=== 

    * [[Affirming a disjunct]] – concluded that ... 
    * [[Affirming the consequent]] – the [[antecedent... 
    * [[Denying the antecedent]] – the [[consequent]] in...

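So I need a way to extract the data so that:

Only lines that begin with * [[ are considered
The text between [[ and ]] is the name
Anything after the – is the description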

2013-04-24 kilrizzy

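'[[]]' is *not* a title. It's just link markup. – meagar 2013-04-24 18:40:15

For the data I need, I have to split the information into two parts: (fallacy name) / (fallacy description). Maybe calling it a name would be better than a title. – kilrizzy 2013-04-24 18:41:41

OK, what have you tried? – 2013-04-24 18:43:20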

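This does the job:

// Match lines of the form "* [[Name]] – description"; the u flag treats pattern and subject as UTF-8.
preg_match_all('~^\h*+\*\h*\[\[(?<name>[a-z ]++)]]\h*+[-–]\h*+(?<description>.++)$~imu', $text, $results, PREG_SET_ORDER);
foreach ($results as &$result) {
    foreach ($result as $key => $value) {
        // Drop the numeric keys so only the named groups remain.
        if (is_numeric($key)) unset($result[$key]);
    }
}
unset($result); // break the reference left behind by the by-reference loop
echo '<pre>' . print_r($results, true) . '</pre>';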

2013-04-24 18:53:05

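This is pretty much what I had before, but it returns an empty array. Any ideas? – kilrizzy 2013-04-24 20:17:12

Strange, it works for me; I tested it. Maybe there is a whitespace problem somewhere. – 2013-04-24 21:37:03

Try the new edit. – 2013-04-24 21:42:49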
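
The thread never establishes why the result came back empty. One way to narrow it down, sketched here as an assumption rather than a diagnosis, is to check the return value of preg_match_all: with the u flag it returns false (not 0) when the subject is not valid UTF-8, and preg_last_error() then reports PREG_BAD_UTF8_ERROR. The sample strings below are hypothetical, shaped like the question's data.

```php
<?php
// Hypothetical sample: two lines in the shape of the question's data.
$text = "    * [[Appeal to probability]] – takes something for granted\n"
      . "    * [[Base rate fallacy]] – making a probability judgement\n";

// The pattern from the answer above, unchanged.
$pattern = '~^\h*+\*\h*\[\[(?<name>[a-z ]++)]]\h*+[-–]\h*+(?<description>.++)$~imu';

$count = preg_match_all($pattern, $text, $results, PREG_SET_ORDER);

if ($count === false) {
    // false means the match itself failed, e.g. on invalid UTF-8 input.
    echo 'PCRE error code: ' . preg_last_error() . "\n";
} else {
    echo "matches: $count\n";
}

// Same pattern against a subject whose dash is a lone 0x96 byte
// (invalid as UTF-8); this is an assumed failure case for illustration.
$badCount = preg_match_all($pattern, "* [[Base rate fallacy]] \x96 text\n", $bad);
$badError = preg_last_error();
```

If the call returns 0 rather than false, the input is valid but no line matched; in that case the character class [a-z ] is worth a look, since it rejects names containing anything other than letters and spaces.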

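First, replace

^((?!\*\s\[\[).)*$

with an empty string. This blanks out every line that does not contain * [[.

Then remove the leftover line breaks by replacing

^\n|\r$

with an empty string.

Here is the regular expression to get the title and description: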

^\s+\*\s\[\[([^\]\]]*)\]\]\s–(.*) 
Title: "$1", Description: "$2"
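
The three steps above can be sketched in PHP roughly as follows. This is a minimal illustration, not the answerer's own code: the sample text, the m and u flags, the trim() call, and the $items array are assumptions added to make it runnable.

```php
<?php
// Hypothetical sample in the shape of the question's data.
$text = "==Formal fallacies==\n"
      . "    A [[formal fallacy]] is an error in logic\n"
      . "\n"
      . "    * [[Appeal to probability]] – takes something for granted\n"
      . "    * [[Base rate fallacy]] – making a probability judgement\n";

// Step 1: blank out every line that does not contain "* [[".
$text = preg_replace('~^((?!\*\s\[\[).)*$~mu', '', $text);

// Step 2: drop the newlines of the now-empty lines (and any \r before a line end).
$text = preg_replace('~^\n|\r$~m', '', $text);

// Step 3: capture the title and the description of each remaining line.
preg_match_all('~^\s+\*\s\[\[([^\]\]]*)\]\]\s–(.*)~mu', $text, $matches, PREG_SET_ORDER);

$items = [];
foreach ($matches as $m) {
    // trim() removes the space that follows the dash.
    $items[] = ['title' => $m[1], 'description' => trim($m[2])];
}
print_r($items);
```

Note that the last pattern requires leading whitespace (^\s+) and an en dash, so it fits the indented sample but would skip list lines that start at column zero or use a plain hyphen.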

2013-04-24 18:50:07
