2017-04-13

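Removing BeautifulSoup tags from a Python list

I have the following code, which goes through a list and extracts information to put into a new list.

If a '0' is found, 0 is appended. If 'None' is found, 0 is appended as well. The third kind of list element is a tag extracted by BeautifulSoup.

What I want to do is extract some of the information inside the tag and add it to newList, but with the regex I am using, the information in the tag itself gets in the way.

My code is given here: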

list = ['<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=826">11 votes for, 1 vote against, 15 absences, between 1999&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=811">8 votes for, 1 vote against, 3 absences, between 1999&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1050">4 votes for, 0 votes against, 3 absences, between 2002&ndash;2004</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6686">4 votes for, 1 vote against, 2 absences, between 2004&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6703">5 votes for, 0 votes against, 4 absences, between 2011&ndash;2016</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6688">3 votes for, 7 votes against, 1 absence, between 2002&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1049">0 votes for, 6 votes against, between 2002&ndash;2003</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=975">1 vote for, 1 vote against, 13 absences, between 2006&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=984">0 votes for, 4 votes against, 3 absences, between 2007&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1065">45 votes for, 12 votes against, 32 absences, between 2007&ndash;2017</a>', '<a 
class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1027">2 votes for, 3 votes against, 8 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6706">3 votes for, 1 vote against, between 2010&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6764">5 votes for, 3 votes against, 4 absences, between 2016&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6761">4 votes for, 4 votes against, 5 absences, between 2016&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6757">0 votes for, 3 votes against, between 2014&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6672">0 votes for, 13 votes against, 4 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6674">5 votes for, 0 votes against, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6673">13 votes for, 0 votes against, 2 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6684">0 votes for, 3 votes against, 1 absence, in 2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6674">5 votes for, 0 votes against, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6702">8 
votes for, 0 votes against, 1 absence, between 2011&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6680">0 votes for, 21 votes against, 4 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1110">3 votes for, 18 votes against, 5 absences, between 2010&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6694">5 votes for, 10 votes against, 4 absences, between 2010&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6681">10 votes for, 0 votes against, 2 absences, between 2012&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1109">1 vote for, 3 votes against, 1 absence, between 2004&ndash;2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1109">1 vote for, 3 votes against, 1 absence, between 2004&ndash;2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6685">17 votes for, 1 vote against, between 2011&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6733">2 votes for, 6 votes against, 2 absences, 
between 2011&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6711">2 votes for, 0 votes against, 2 absences, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6716">0 votes for, 5 votes against, between 2012&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6731">0 votes for, 12 votes against, between 2008&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6756">0 votes for, 4 votes against, 1 absence, between 2015&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6679">1 vote for, 21 votes against, 4 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6690">5 votes for, 3 votes against, between 2013&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6691">7 votes for, 7 votes against, between 2010&ndash;2014</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6677">7 votes for, 0 votes against, between 2011&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6676">0 votes for, 7 votes against, between 2011&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=363">0 votes for, 4 votes against, 1 absence, in 2003</a>', '<a class="vote-description__evidence" 
href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=811">8 votes for, 1 vote against, 3 absences, between 1999&ndash;2015</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1074">2 votes for, 14 votes against, 16 absences, between 1998&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1132">0 votes for, 1 vote against, in 2010</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6687">0 votes for, 9 votes against, 2 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6682">0 votes for, 2 votes against, in 2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1052">4 votes for, 6 votes against, 5 absences, between 1997&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6671">0 votes for, 4 votes against, 2 absences, between 2010&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1113">0 votes for, 11 votes against, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1136">0 votes for, 6 votes against, 2 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=996">2 votes for, 0 votes against, 8 absences, between 2007&ndash;2009</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1084">1 
vote for, 1 vote against, 4 absences, between 2010&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=837">10 votes for, 0 votes against, 4 absences, between 2003&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6683">0 votes for, 4 votes against, 1 absence, between 2012&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6678">0 votes for, 12 votes against, between 2013&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6698">2 votes for, 2 votes against, 1 absence, between 2010&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1079">5 votes for, 1 vote against, 5 absences, between 1999&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6708">2 votes for, 1 vote against, 16 absences, between 2012&ndash;2017</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6709">8 votes for, 5 votes against, 20 absences, between 2011&ndash;2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6695">23 votes for, 12 votes against, 14 absences, between 2011&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6736">0 votes for, 3 votes against, in 2015</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=842">3 votes for, 1 vote against, 3 absences, between 2004&ndash;2016</a>', 
'<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1087">3 votes for, 13 votes against, 12 absences, between 2002&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1071">2 votes for, 1 vote against, 2 absences, between 2008&ndash;2009</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1051">6 votes for, 6 votes against, 12 absences, between 2005&ndash;2006</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6696">0 votes for, 7 votes against, 1 absence, between 2011&ndash;2012</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6721">0 votes for, 5 votes against, 3 absences, between 2014&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6734">0 votes for, 7 votes against, 2 absences, between 2015&ndash;2016</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6758">0 votes for, 2 votes against, 1 absence, in 2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1030">19 votes for, 6 votes against, 6 absences, between 2000&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6697">0 votes for, 2 votes against, in 2011</a>', '<a class="vote-description__evidence" 
href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6704">4 votes for, 1 vote against, between 2011&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6710">0 votes for, 3 votes against, 1 absence, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6741">2 votes for, 1 vote against, 1 absence, in 2015</a>', 'None', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6747">2 votes for, 0 votes against, 1 absence, in 2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6692">4 votes for, 0 votes against, 1 absence, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6693">6 votes for, 6 votes against, 4 absences, between 2010&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6699">0 votes for, 3 votes against, 6 absences, between 2012&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6746">2 votes for, 0 votes against, 2 absences, in 2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6744">0 votes for, 5 votes against, between 2015&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6743">0 votes for, 5 votes against, 
between 2015&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=810">7 votes for, 5 votes against, 3 absences, between 2004&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1120">0 votes for, 3 votes against, 2 absences, in 2010</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1053">13 votes for, 30 votes against, 27 absences, between 2001&ndash;2010</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1105">0 votes for, 3 votes against, 2 absences, between 2009&ndash;2011</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6705">2 votes for, 0 votes against, 2 absences, between 2013&ndash;2016</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6707">1 vote for, 7 votes against, 4 absences, between 2011&ndash;2014</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6715">0 votes for, 5 votes against, 2 absences, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6720">2 votes for, 3 votes against, in 2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6719">0 votes for, 4 votes against, 2 absences, between 2012&ndash;2013</a>', '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6718">4 votes for, 0 votes against, in 2014</a>', '<a class="vote-description__evidence" 
href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=6667">9 votes for, 57 votes against, 15 absences, between 2011&ndash;2015</a>'] 

import re

newList = []
digitReg = r"\d+"
for thing in list:
    aggregate = 0
    if thing == '0':
        newList.append(0)
    elif thing == 'None':
        newList.append(0)
    else:
        # findall scans the whole string, so digits inside the href are matched too
        matches = re.findall(digitReg, thing)
        forNum = int(matches[0])
        againstNum = int(matches[1])
        aggregate = forNum - againstNum
        newList.append(aggregate)
print(newList)
print(len(newList))

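The problem is that the tag itself contains digits, which throws off the aggregate value.

Normally I would just change the code to int(matches[2]) and int(matches[3]); however, that is unreliable, because I will be running this code on different lists, and the number of matches inside the tag itself will vary.

Is there a way to strip the tags from the list items before finding the matches?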
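To make the problem concrete, here is a small sketch that runs the same regex on the first element of the list above. The variable names `tag` and `wrong_aggregate` are only for illustration.

```python
import re

# First element of the list from the question
tag = ('<a class="vote-description__evidence" '
       'href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=826">'
       '11 votes for, 1 vote against, 15 absences, between 1999&ndash;2014</a>')

matches = re.findall(r"\d+", tag)
print(matches)
# ['10001', '826', '11', '1', '15', '1999', '2014']

# The first two matches come from the href (the MP id and the policy id),
# so matches[0] - matches[1] is not the vote margin at all.
wrong_aggregate = int(matches[0]) - int(matches[1])
print(wrong_aggregate)  # 9175 instead of the intended 11 - 1 = 10
```

How many digit groups the href contributes depends on the URL, which is why hard-coding indices 2 and 3 would not be reliable.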

Answer


To extract the text inside each tag with Beautiful Soup, you can do this:

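import re
from bs4 import BeautifulSoup

newList = []
digitReg = r"\d+"
for thing in list:
    if thing == '0':
        newList.append(0)
    elif thing == 'None':
        newList.append(0)
    else:
        # Parse the tag and search only its text, so digits in the href are ignored
        text = BeautifulSoup(thing, 'html.parser').text
        matches = re.findall(digitReg, text)
        forNum = int(matches[0])
        againstNum = int(matches[1])
        aggregate = forNum - againstNum
        newList.append(aggregate)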

Perfect! Thank you so much! I had to take 'html.parser' out of the BeautifulSoup parentheses to stop an error, but now it works like a charm. – modestmotion
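For anyone who cannot install Beautiful Soup, a standard-library-only variation is sketched below. This is not the answer's method: it strips tags with a crude regex, which is an assumption that only holds for simple, well-formed snippets like the ones in the question. The names `vote_margin`, `TAG_RE`, `FOR_RE` and `AGAINST_RE` are made up for illustration, and the sketch assumes Python 3 and the "N votes for, M votes against" wording seen in the question.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")               # crude tag stripper (assumption: no '>' inside attributes)
FOR_RE = re.compile(r"(\d+) votes? for")      # matches "1 vote for" and "11 votes for"
AGAINST_RE = re.compile(r"(\d+) votes? against")


def vote_margin(item):
    """Return votes-for minus votes-against; 0 for the '0' and 'None' placeholders."""
    if item in ("0", "None"):
        return 0
    # Drop the tags, then decode entities such as &ndash;
    text = unescape(TAG_RE.sub("", item))
    votes_for = FOR_RE.search(text)
    votes_against = AGAINST_RE.search(text)
    if not (votes_for and votes_against):
        return 0  # assumption: treat unparseable entries like missing ones
    return int(votes_for.group(1)) - int(votes_against.group(1))


# Three entries taken from the list in the question
sample = [
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=826">11 votes for, 1 vote against, 15 absences, between 1999&ndash;2014</a>',
    'None',
    '<a class="vote-description__evidence" href="/mp/10001/diane_abbott/hackney_north_and_stoke_newington/divisions?policy=1049">0 votes for, 6 votes against, between 2002&ndash;2003</a>',
]
margins = [vote_margin(item) for item in sample]
print(margins)  # [10, 0, -6]
```

Matching on the words "for" and "against" rather than on position means that other numbers in the text (absences, years) cannot shift the result.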