How to extract specific strings in Python

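I'm trying to extract specific strings from within the markup and save them (for more complex processing later on). So say, for example, I'm reading in a line from a file and the current line is: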

<center><img border="0" src="http://www.world-of-waterfalls.com/images/Cascades_04_015L.jpg" WIDTH="500" HEIGHT="375" alt="Looking up the Merced River Canyon towards Bridalveil Fall from the Big Oak Flat Road" ***PINIT***></center><br clear="all"><br clear="all">

But I want to store:

tempUrl = 'http://www.world-of-waterfalls.com/images/Cascades_04_015L.jpg' 

tempWidth = 500 

tempHeight = 375 

tempAlt = 'Looking up the Merced River Canyon towards Bridalveil Fall from the Big Oak Flat Road'

How would I go about doing this in Python?

Thanks


2016-12-15 Johnny

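Let me save you the trouble and tell you right away: don't use regular expressions for this. Don't even think about trying it; you'll only end up banging your head against it later. If the data comes from a web source, look at BeautifulSoup, Scrapy, or any other "scraping" library. If you already have the markup, you can use a parser, walk the nodes, and collect the attribute information. –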

HTMLParser (https://docs.python.org/2/library/htmlparser.html) or html.parser (https://docs.python.org/3.4/library/html.parser.html), depending on the Python version –
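For the standard-library html.parser route from the comment above, a minimal Python 3 sketch might look like the following. The subclass name ImgAttrParser is hypothetical, and the code assumes the first <img> tag on the line is the one you want. It relies on html.parser lowercasing tag and attribute names, so WIDTH is read back as width.

```python
from html.parser import HTMLParser


class ImgAttrParser(HTMLParser):
    """Hypothetical helper: remember the attributes of the first <img> tag seen."""

    def __init__(self):
        super().__init__()
        self.img_attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if tag == "img" and self.img_attrs is None:
            self.img_attrs = dict(attrs)


# The sample line from the question
line = ('<center><img border="0" src="http://www.world-of-waterfalls.com/images/Cascades_04_015L.jpg" '
        'WIDTH="500" HEIGHT="375" alt="Looking up the Merced River Canyon towards Bridalveil Fall '
        'from the Big Oak Flat Road" ***PINIT***></center><br clear="all"><br clear="all">')

parser = ImgAttrParser()
parser.feed(line)
parser.close()

attrs = parser.img_attrs
tempUrl = attrs["src"]
tempWidth = int(attrs["width"])    # attribute values are strings; cast explicitly
tempHeight = int(attrs["height"])
tempAlt = attrs["alt"]
print(tempUrl, tempWidth, tempHeight, tempAlt, sep="\n")
```

Unlike a regex, this does not care about attribute order, letter case, or quoting style, because the parser tokenizes the tag before you look at it.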

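While there are several ways you could get away with this, I'd recommend using an HTML parser, which is extensible and can cope with many of the problems found in real-world HTML. Here is a working example with BeautifulSoup: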

>>> from bs4 import BeautifulSoup 
>>> string = """<center><img border="0" src="http://www.world-of-waterfalls.com/images/Cascades_04_015L.jpg" WIDTH="500" HEIGHT="375" alt="Looking up the Merced River Canyon towards Bridalveil Fall from the Big Oak Flat Road" ***PINIT***></center><br clear="all"><br clear="all">""" 
>>> soup = BeautifulSoup(string, 'html.parser') 
>>> for attr in ['width', 'height', 'alt']: 
...  print('temp{} = {}'.format(attr.title(), soup.img[attr])) 
... 
tempWidth = 500 
tempHeight = 375 
tempAlt = Looking up the Merced River Canyon towards Bridalveil Fall from the Big Oak Flat Road


2016-12-15 17:16:51 brianpck

Finally got bs4 installed; this is a beautiful solution. Thanks! – Johnny

And the regular-expression approach:

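import re

# Replace with the line read from the file; the sample line from the question is used here
string = '<center><img border="0" src="http://www.world-of-waterfalls.com/images/Cascades_04_015L.jpg" WIDTH="500" HEIGHT="375" alt="Looking up the Merced River Canyon towards Bridalveil Fall from the Big Oak Flat Road" ***PINIT***></center><br clear="all"><br clear="all">'

# Capture src, WIDTH, HEIGHT and alt, in that order; re.search returns None if nothing matches
match = re.search(r'src="(.*?)".*WIDTH="(.*?)".*HEIGHT="(.*?)".*alt="(.*?)"', string)
if match:
    tempUrl, tempWidth, tempHeight, tempAlt = match.groups()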

All the captured values are strings, so cast them if you need numbers (for example, int(tempWidth)).
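One way to make that cast explicit is to use named groups and convert the numeric fields with int(). This is a variant of the pattern above, not the answerer's code: it assumes WIDTH and HEIGHT hold plain digits and appear in the same order and letter case as in the question's sample line.

```python
import re

# Variant of the answer's pattern with named groups; WIDTH/HEIGHT must be digits
IMG_RE = re.compile(
    r'src="(?P<url>[^"]*)".*?WIDTH="(?P<width>\d+)".*?'
    r'HEIGHT="(?P<height>\d+)".*?alt="(?P<alt>[^"]*)"'
)

# The sample line from the question
line = ('<center><img border="0" src="http://www.world-of-waterfalls.com/images/Cascades_04_015L.jpg" '
        'WIDTH="500" HEIGHT="375" alt="Looking up the Merced River Canyon towards Bridalveil Fall '
        'from the Big Oak Flat Road" ***PINIT***></center><br clear="all"><br clear="all">')

match = IMG_RE.search(line)
if match is None:
    raise ValueError("line does not contain the expected <img> attributes")

fields = match.groupdict()          # every value is still a str here
tempUrl = fields["url"]
tempWidth = int(fields["width"])    # '500' -> 500
tempHeight = int(fields["height"])  # '375' -> 375
tempAlt = fields["alt"]
```

Restricting the numeric groups to \d+ means int() cannot fail on a successful match; a line with a non-numeric width simply does not match.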

Also be aware that copy/pasting regular expressions is a bad idea: it is error-prone.
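To illustrate why, the sketch below runs the answer's pattern against a few hypothetical, shortened <img> tags. In HTML, attribute names are case-insensitive, their order does not matter, and single and double quotes are interchangeable, so all four tags describe the same image; the regex only matches the first.

```python
import re

# The pattern from the regex answer, written as a raw string
PATTERN = re.compile(r'src="(.*?)".*WIDTH="(.*?)".*HEIGHT="(.*?)".*alt="(.*?)"')

# Hypothetical, shortened tags that are equivalent HTML for the same image
samples = {
    "as in the question": '<img src="a.jpg" WIDTH="500" HEIGHT="375" alt="Falls">',
    "lowercase names": '<img src="a.jpg" width="500" height="375" alt="Falls">',
    "reordered": '<img alt="Falls" WIDTH="500" HEIGHT="375" src="a.jpg">',
    "single quotes": "<img src='a.jpg' WIDTH='500' HEIGHT='375' alt='Falls'>",
}

for label, tag in samples.items():
    match = PATTERN.search(tag)
    print(f"{label}: {match.groups() if match else 'no match'}")
```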


2016-12-15 17:44:09
