Python regular expression parsing

I have an array of strings in Python, and each string in it looks like this:

<r n="Foo Bar" t="5" s="10" l="25"/>

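I've been searching for a while, and the best I could come up with was trying to adapt a regular expression for HTML hyperlinks to fit my needs.

But I don't really know much about regular expressions, and I haven't got anything working yet. Here's what I have so far: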

import re

string = '<r n="Foo Bar" t="5" s="10" l="25"/>'
print(re.split(r"<r\s+n=(?:\"(^\"]+)\").*?/>", string))

What's the best way to extract the n, t, s and l values from this string?
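For comparison, here is one way the attempt above could be corrected while staying close to its approach. Two things go wrong in it: the character class is missing its opening bracket, so (^\"]+) is a group that starts with the ^ anchor instead of [^"]+, and re.split returns the text between matches (plus any captured groups) rather than a convenient set of values. The sketch below uses re.fullmatch with named groups instead. It assumes the attributes always appear in the fixed order n, t, s, l with double-quoted values; parse_r_fixed_order is a hypothetical helper name.

```python
import re

# Assumes the attributes always appear in the fixed order n, t, s, l,
# separated by whitespace, with every value wrapped in double quotes.
R_TAG = re.compile(
    r'<r\s+n="(?P<n>[^"]*)"\s+t="(?P<t>[^"]*)"'
    r'\s+s="(?P<s>[^"]*)"\s+l="(?P<l>[^"]*)"\s*/>'
)


def parse_r_fixed_order(tag):
    """Hypothetical helper: return n, t, s, l as a dict, or None on no match."""
    m = R_TAG.fullmatch(tag.strip())
    return m.groupdict() if m else None


print(parse_r_fixed_order('<r n="Foo Bar" t="5" s="10" l="25"/>'))
# {'n': 'Foo Bar', 't': '5', 's': '10', 'l': '25'}
```

Because the order is hard-coded, a tag such as <r t="5" n="x" .../> returns None; the findall approach in the answer below does not have that limitation.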


2009-05-02 AdamB

This will get you most of the way there:

>>> import re
>>> print(re.findall(r'(\w+)="(.*?)"', string))
[('n', 'Foo Bar'), ('t', '5'), ('s', '10'), ('l', '25')]
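Since re.findall returns (name, value) pairs, passing them to dict() gives a per-string lookup, which extends directly to the array of strings in the question. A minimal sketch: the second sample string and the parse_attrs helper are hypothetical, the int() conversion assumes t, s and l are always numeric (as they are in the sample), and the pattern still assumes double-quoted values with no escaped quotes inside them.

```python
import re

# Same pattern as the answer: a word-character name, '=', then a
# double-quoted value matched non-greedily up to the next quote.
ATTR_RE = re.compile(r'(\w+)="(.*?)"')


def parse_attrs(tag):
    """Hypothetical helper: map each attribute name in a tag to its string value."""
    return dict(ATTR_RE.findall(tag))


# Hypothetical input standing in for the question's array of strings.
strings = [
    '<r n="Foo Bar" t="5" s="10" l="25"/>',
    '<r n="Baz" t="1" s="2" l="3"/>',
]

records = [parse_attrs(s) for s in strings]

# Assumption: t, s and l are always integers, so convert them explicitly.
for rec in records:
    for key in ("t", "s", "l"):
        rec[key] = int(rec[key])

print(records)
# [{'n': 'Foo Bar', 't': 5, 's': 10, 'l': 25}, {'n': 'Baz', 't': 1, 's': 2, 'l': 3}]
```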

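re.split and re.findall are complementary.

Whenever your thought process starts with "I want each item to look like X", you should use re.findall. When it starts with "I need X and the data around it", use re.split.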
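A small illustration of that rule of thumb on a made-up string: the same pattern yields the matched items with re.findall and the data around them with re.split. When the pattern contains a capturing group, re.split also keeps the captured separators in its result.

```python
import re

text = "a1b22c333"

# "I want each item to look like X": findall returns the matches themselves.
items = re.findall(r"\d+", text)

# "I need X and the data around it": split returns what lies between matches.
around = re.split(r"\d+", text)

# With a capturing group, split keeps the matched separators too.
both = re.split(r"(\d+)", text)

print(items)   # ['1', '22', '333']
print(around)  # ['a', 'b', 'c', '']
print(both)    # ['a', '1', 'b', '22', 'c', '333', '']
```

The trailing empty string appears because the text ends with a match.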


2009-05-02 12:34:08 Clint

Works perfectly, thanks. – AdamB 2009-05-02 12:36:25

<r n="Foo Bar" t="5" s="10" l="25"/>

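The source looks like XML, so the "best way" is to use an XML parsing module. If it isn't quite XML, BeautifulSoup (more precisely, its BeautifulSoup.BeautifulStoneSoup class) may work best, because it copes well with possibly invalid XML (or things that are "not quite XML"):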

>>> from BeautifulSoup import BeautifulStoneSoup 
>>> soup = BeautifulStoneSoup("""<r n="Foo Bar" t="5" s="10" l="25"/>""") 

# grab the "r" element (you could also use soup.findAll("r") if there are multiple)
>>> soup.find("r") 
<r n="Foo Bar" t="5" s="10" l="25"></r> 

# get a specific attribute 
>>> soup.find("r")['n'] 
u'Foo Bar' 
>>> soup.find("r")['t'] 
u'5' 

# Get all attributes, or turn them into a regular dictionary 
>>> soup.find("r").attrs 
[(u'n', u'Foo Bar'), (u't', u'5'), (u's', u'10'), (u'l', u'25')] 
>>> dict(soup.find("r").attrs) 
{u's': u'10', u'l': u'25', u't': u'5', u'n': u'Foo Bar'}
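The example above uses BeautifulSoup 3, which targets Python 2. When the input is well-formed XML, as this sample is, the standard library's xml.etree.ElementTree is enough and needs no third-party package. A sketch assuming each string in the array is a single well-formed element; parse_r_element is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_r_element(tag):
    """Hypothetical helper: parse one '<r .../>' string into an attribute dict.

    Raises ET.ParseError if the string is not well-formed XML.
    """
    elem = ET.fromstring(tag)
    return dict(elem.attrib)


strings = ['<r n="Foo Bar" t="5" s="10" l="25"/>']
for s in strings:
    attrs = parse_r_element(s)
    print(attrs["n"], attrs["t"], attrs["s"], attrs["l"])  # Foo Bar 5 10 25
```

Unlike the regular expressions, a real parser also accepts attributes in any order, single-quoted values, and entity references such as &amp;, which it decodes.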


2009-05-02 13:32:54 dbr
