How to extract column data from text in Python (regex)

Let's say we have column headers stored in text in this form:

{| 
|+ The table's caption 
! scope="col" width="20"style="background-color:#cfcfcf;"align="center" | Column header 1 
! scope="col" width="20"style="background-color:#ff55ff;"align="center" | Column header 2 
! scope="col" | Column header 3 
|- 
! scope="row" | Row header 1 
| Cell 2 || Cell 3 
|- 
! scope="row" | Row header A 
| Cell B 
| Cell C 
|}

How can I extract all the column headers ([Column header 1, Column header 2, Column header 3]) from this text in Python?

I tried:

re.findall('*! scope="col" |', text, re.IGNORECASE)

But it doesn't do the job.

https://regex101.com/r/PLKREz/6

How can I do this in Python?


2016-11-02 Yamane Imad

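Are you scraping this text from the web, or was it given to you to work with?

@Wintro It comes from a Wikipedia article; my task is to extract the columns of the table...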

You can capture everything after the last | on each line that contains scope="col":

import re 

data = """ 
{| 
|+ The table's caption 
! scope="col" width="20"style="background-color:#cfcfcf;"align="center" | Column header 1 
! scope="col" width="20"style="background-color:#ff55ff;"align="center" | Column header 2 
! scope="col" | Column header 3 
|- 
! scope="row" | Row header 1 
| Cell 2 || Cell 3 
|- 
! scope="row" | Row header A 
| Cell B 
| Cell C 
|}""" 

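# On each line containing scope="col", capture the text after the last "| ";
# re.MULTILINE lets $ match at the end of every line, not only at the end of the string.
print(re.findall(r'scope="col".*?\| ([^|]+)$', data, re.MULTILINE))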

Prints:

['Column header 1', 'Column header 2', 'Column header 3']
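
As a side note, the pattern in the question fails for two reasons: a leading `*` has nothing to repeat, so the pattern does not even compile, and an unescaped `|` means alternation rather than a literal pipe. Below is a minimal sketch of a slightly more defensive variant of the same idea. The names `HEADER_RE` and `extract_column_headers` are hypothetical, not from any library, and the sketch assumes each column header sits on its own line starting with `!` (it does not handle several headers on one line separated by `!!`).

```python
import re

# The pattern from the question, reproduced to show why it fails.
broken_pattern = '*! scope="col" |'
try:
    re.compile(broken_pattern)
except re.error as exc:
    # A leading "*" has nothing to repeat, so compilation raises re.error.
    print("invalid pattern:", exc)

# Hypothetical helper pattern (an assumption, not part of the original answer):
#   ^!\s*scope="col"   header line that starts with "!" and is marked as a column
#   [^\n]*?\|          lazily skip any attributes up to a literal pipe
#   \s*([^|\n]+?)\s*$  capture the cell text, without surrounding whitespace
HEADER_RE = re.compile(
    r'^!\s*scope="col"[^\n]*?\|\s*([^|\n]+?)\s*$',
    re.MULTILINE,
)


def extract_column_headers(wikitext):
    """Return the column header texts found in a wikitext table."""
    return HEADER_RE.findall(wikitext)
```

Calling `extract_column_headers(data)` on the table above should give the same three headers. Unlike the one-liner, it also tolerates trailing spaces at the end of a header line and ignores `scope="col"` if it appears anywhere other than a header line.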


2016-11-02 17:20:23 alecxe
