2017-06-22

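How do I remove an element from scraped web data? I use BeautifulSoup and soup.findAll to get the relevant information, but I want to remove one value (it sits between <TR>...</TR>) together with its <TR> tag. How can I do that? I'm using Python 2.7.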

...

soup = BeautifulSoup(x, 'lxml') 

tab6col = soup.findAll('table', { "class" : "tab6col" }) 

Here is my HTML:

[<table border="0" class="tab6col" id="pm">\n<tr><td>\xa0</td><td align="right" class="contentword"><b>2015. \xe9v</b></td><td align="right" class="contentword"><b>2014. \xe9v</b></td><td align="right" class="contentword"><b>2013. \xe9v</b></td><td align="right" class="contentword"><b>2012. \xe9v</b></td><td align="right" class="contentword"><b>2011. \xe9v</b></td></tr><tr><td class="contentword"><b>Besz\xe1mol\xe1si id\xf5szak</b></td><td align="right" class="contentword"><span class="pm_idoszak">2015.01.01. - 2015.12.31.</span></td><td align="right" class="contentword"><span class="pm_idoszak">2014.01.01. - 2014.12.31.</span></td><td align="right" class="contentword"><span class="pm_idoszak">2013.12.30. - 2013.12.31.</span></td><td align="right" class="contentword"><span class="pm_idoszak">Nincs adat.</span></td><td align="right" class="contentword"><span class="pm_idoszak">Nincs adat.</span></td></tr><tr><td>\xa0</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td><td align="right" class="contentword">eFt</td></tr><tr><td class="contentword">\xc9rt\xe9kes\xedt\xe9s nett\xf3 \xe1rbev\xe9tele</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Bev\xe9telek</td><td align="right" class="numberc">2 873 821</td><td align="right" class="numberc">3 162 742</td><td align="right" class="numberc">9 194</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">\xdczemi eredm\xe9ny</td><td align="right" class="numberc">81 937</td><td align="right" class="numberc">-181 850</td><td align="right" class="numberc">1 755</td><td align="right" class="numberc">Nincs adat.</td><td align="right" 
class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Ad\xf3z\xe1s el\xf5tti eredm\xe9ny</td><td align="right" class="numberc">-192 778</td><td align="right" class="numberc">-169 476</td><td align="right" class="numberc">1 755</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">M\xe9rleg szerinti eredm\xe9ny</td><td align="right" class="numberc">-124 099</td><td align="right" class="numberc">0</td><td align="right" class="numberc">1 421</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Ad\xf3zott eredm\xe9ny</td><td align="right" class="numberc">-192 778</td><td align="right" class="numberc">-169 476</td><td align="right" class="numberc">1 579</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Eszk\xf6z\xf6k \xf6sszesen</td><td align="right" class="numberc">37 820 881</td><td align="right" class="numberc">40 695 842</td><td align="right" class="numberc">36 992 091</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Befektetett eszk\xf6z\xf6k</td><td align="right" class="numberc">18 668 826</td><td align="right" class="numberc">18 525 063</td><td align="right" class="numberc">16 925 711</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Forg\xf3eszk\xf6z\xf6k</td><td align="right" class="numberc">19 008 587</td><td align="right" class="numberc">21 877 275</td><td align="right" class="numberc">19 792 420</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">P\xe9nzeszk\xf6z\xf6k</td><td align="right" class="numberc">947 015</td><td align="right" 
class="numberc">1 056 101</td><td align="right" class="numberc">1 307 515</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Akt\xedv id\xf5beli elhat\xe1rol\xe1sok</td><td align="right" class="numberc">143 468</td><td align="right" class="numberc">293 504</td><td align="right" class="numberc">273 960</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Saj\xe1t t\xf5ke</td><td align="right" class="numberc">2 141 319</td><td align="right" class="numberc">2 184 079</td><td align="right" class="numberc">2 353 554</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">C\xe9ltartal\xe9kok</td><td align="right" class="numberc">29 656</td><td align="right" class="numberc">148 652</td><td align="right" class="numberc">18 960</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">K\xf6telezetts\xe9gek</td><td align="right" class="numberc">35 541 531</td><td align="right" class="numberc">38 059 399</td><td align="right" class="numberc">34 233 518</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">R\xf6vid lej\xe1rat\xfa k\xf6telezetts\xe9gek</td><td align="right" class="numberc">30 519 491</td><td align="right" class="numberc">30 426 014</td><td align="right" class="numberc">26 394 088</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Hossz\xfa lej\xe1rat\xfa k\xf6telezetts\xe9gek</td><td align="right" class="numberc">5 022 040</td><td align="right" class="numberc">7 633 385</td><td align="right" class="numberc">7 839 430</td><td align="right" class="numberc">Nincs 
adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Passz\xedv id\xf5beli elhat\xe1rol\xe1sok</td><td align="right" class="numberc">108 375</td><td align="right" class="numberc">303 712</td><td align="right" class="numberc">386 059</td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword" colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr><tr><td class="contentword">Elad\xf3sodotts\xe1g foka <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Elad\xf3sodotts\xe1g foka&lt;/span&gt; (K\xf6telezetts\xe9gek/Eszk\xf6z\xf6k \xf6sszesen)&lt;br&gt;&lt;i&gt;Megmutatja, hogy az eszk\xf6z \xe1llom\xe1ny milyen m\xe9rt\xe9kben van megterhelve k\xf6telezetts\xe9gv\xe1llal\xe1ssal. Min\xe9l kisebb a mutat\xf3 \xe9rt\xe9ke, ann\xe1l jobb a c\xe9g meg\xedt\xe9l\xe9se.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Elad\xf3sodotts\xe1g m\xe9rt\xe9ke - Bonit\xe1s <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Elad\xf3sodotts\xe1g m\xe9rt\xe9ke - Bonit\xe1s&lt;/span&gt; (K\xf6telezetts\xe9gek/Saj\xe1t t\xf5ke)&lt;br&gt;&lt;i&gt;Azt mutatja, hogy a saj\xe1t forr\xe1sok a k\xf6telezetts\xe9gek h\xe1ny sz\xe1zal\xe9k\xe1t fedezik. 
Pozit\xedv a c\xe9g meg\xedt\xe9l\xe9se, ha a mutat\xf3 \xe9rt\xe9ke tart\xf3san (j\xf3val) 1 alatt van.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">\xc1rbev\xe9tel ar\xe1nyos eredm\xe9ny % <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;\xc1rbev\xe9tel ar\xe1nyos eredm\xe9ny %&lt;/span&gt; (Ad\xf3zott eredm\xe9ny/ Nett\xf3 \xe1rbev\xe9tel)\xd7100&lt;br&gt;&lt;i&gt;A mutat\xf3 az \xe1rbev\xe9tel hat\xe9konys\xe1g\xe1t fejezi ki \xfagy, hogy az \xe1rbev\xe9tel nyeres\xe9gtartalm\xe1t sz\xe1zal\xe9kban szeml\xe9lteti. A c\xe9g meg\xedt\xe9l\xe9se ann\xe1l pozit\xedvabb, min\xe9l magasabb a sz\xe1zal\xe9k.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Likvidit\xe1si gyorsr\xe1ta <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Likvidit\xe1si gyorsr\xe1ta&lt;/span&gt; ((Forg\xf3eszk\xf6z\xf6k-K\xe9szletek)/R\xf6vid lej.k\xf6telezetts\xe9gek)&lt;br&gt;&lt;i&gt;Azt fejezi ki, hogy az egy \xe9v alatt p\xe9nzz\xe9 tehet\xf5 k\xe9szletek n\xe9lk\xfcli forg\xf3eszk\xf6z\xf6k milyen ar\xe1nyban k\xe9pesek az egy \xe9ven bel\xfcl esed\xe9kes k\xf6telezetts\xe9gek fedez\xe9s\xe9re, azaz milyen a c\xe9g r\xf6vid t\xe1v\xfa fizet\xf5k\xe9pess\xe9ge.&lt;br&gt;A c\xe9g meg\xedt\xe9l\xe9se akkor pozit\xedv, ha ez az ar\xe1ny 
egyre n\xf6vekv\xf5, ami az azonnali fizet\xf5k\xe9pess\xe9g javul\xe1s\xe1t jelzi.&lt;/i&gt;');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc"></td><td align="right" class="numberc">Nincs adat.</td><td align="right" class="numberc">Nincs adat.</td></tr><tr><td class="contentword">Saj\xe1t t\xf5ke ar\xe1nya <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Saj\xe1t t\xf5ke ar\xe1nya &lt;/span&gt; (Saj\xe1t t\xf5ke/Forr\xe1sok)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">0,06</td><td align="right" class="numberc">0,05</td><td align="right" class="numberc">0,06</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">Eszk\xf6zar\xe1nyos nyeres\xe9g <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Eszk\xf6zar\xe1nyos nyeres\xe9g &lt;/span&gt; (Ad\xf3zott eredm\xe9ny/Eszk\xf6z\xf6k)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">-0,01</td><td align="right" class="numberc">0,00</td><td align="right" class="numberc">0,00</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">Bev\xe9telar\xe1nyos eredm\xe9ny <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Bev\xe9telar\xe1nyos eredm\xe9ny &lt;/span&gt; (Ad\xf3zott eredm\xe9ny/Bev\xe9telek)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">-0,07</td><td align="right" 
class="numberc">-0,05</td><td align="right" class="numberc">0,17</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword">Saj\xe1t t\xf5ke ar\xe1nyos nyeres\xe9g <span onmouseout="remove_hint();" onmouseover="show_hint(this, '&lt;span style=&quot;color: red; font-weight: bold;&quot;&gt;Saj\xe1t t\xf5ke ar\xe1nyos nyeres\xe9g &lt;/span&gt; (Ad\xf3zott eredm\xe9ny/Saj\xe1t t\xf5ke)');" style="cursor: pointer; color: red; font-family: InformationLogo, Webdings;">i</span></td><td align="right" class="numberc">-0,09</td><td align="right" class="numberc">-0,08</td><td align="right" class="numberc">0,00</td><td align="right" class="numberc"></td><td align="right" class="numberc"></td></tr><tr><td class="contentword" colspan="6"><b>L\xe9tsz\xe1m:</b> \xa0 136 f\xf5</td>\n</tr></table>]

From that table I want to remove this value:

<tr><td class="contentword" colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr>

My full code:

import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import MySQLdb

def to_2d(l,n):
    return [l[i:i+n] for i in range(0, len(l), n)]

filename=r'output.csv'

resultcsv=open(filename,"wb")
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

f = open('opten2.txt', 'r')
x = f.read()

soup = BeautifulSoup(x, 'lxml')

tab6col = soup.find('table', { "class" : "tab6col" })

datatable=[]
for record in tab6col.findAll('tr'):
    for data in record.findAll('td'):
        datatable.append(data.text.encode('latin-1'))

# this is where it fails: datatable is a plain list, which has no .find() method
td = datatable.find("td", text="P\xe9nz\xfcgyi mutat\xf3k")
td.decompose()

maindatatable = to_2d(datatable, 6)
print maindatatable
output.writerows(maindatatable)

resultcsv.close()
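One likely reason this row matters: to_2d slices the flat list of every <td> text into fixed groups of 6, so a row holding a single colspan="6" cell, like the "Pénzügyi mutatók" header, shifts every later group by one. A small stdlib sketch with a hypothetical 3-column list (the "TITLE" cell is made up for illustration) shows the effect:

```python
def to_2d(l, n):
    # the question's helper: split a flat list into consecutive chunks of n items
    return [l[i:i + n] for i in range(0, len(l), n)]


# Hypothetical flat list of cell texts: two 3-cell rows with a one-cell
# section header ("TITLE") between them, like the colspan="6" row in the question
flat = ["a1", "a2", "a3", "TITLE", "b1", "b2", "b3"]

misaligned = to_2d(flat, 3)
# [['a1', 'a2', 'a3'], ['TITLE', 'b1', 'b2'], ['b3']]

# dropping the header cell before chunking restores the columns
aligned = to_2d([cell for cell in flat if cell != "TITLE"], 3)
# [['a1', 'a2', 'a3'], ['b1', 'b2', 'b3']]
```

The same applies to any other single-cell row, so the removal has to happen on the soup (or the list) before to_2d runs.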


Sorry, what do you want to remove? The table? – obskyr


Show a sample, i.e. the input and the desired output. –


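I use bsoup to get a table, and I want to remove one value between these TR tags. I updated my question and inserted the full HTML, so it is easier to understand. – tardos93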

Answer


What you need is decompose(). Find the tag that holds that text and remove it with decompose().

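from bs4 import BeautifulSoup

soup = BeautifulSoup(x, "lxml")
tab6col = soup.find("table", { "class" : "tab6col" })
# u"" literal: in Python 2 a non-ASCII byte string never compares equal to the parsed unicode text
row = tab6col.find("tr", text=u"P\xe9nz\xfcgyi mutat\xf3k")
if row is not None:
    row.decompose()  # removes the <tr> together with its <td>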
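Matching the <tr> by text only works because tr.string passes through its single <td> and <b> children; any extra whitespace or child in that row would make .string None. A slightly more robust variant, sketched under the assumption of bs4 with the built-in html.parser and a trimmed, hypothetical stand-in for the real table, finds the text node itself and climbs to its row with find_parent:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the question's tab6col table
html = (
    u'<table class="tab6col">'
    u'<tr><td>Bev\xe9telek</td><td>2 873 821</td></tr>'
    u'<tr><td colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr>'
    u'<tr><td>Saj\xe1t t\xf5ke ar\xe1nya</td><td>0,06</td></tr>'
    u'</table>'
)


def remove_row_containing(table, text):
    """Delete the <tr> holding a text node equal to `text`; return True if one was removed."""
    node = table.find(string=text)  # the NavigableString itself, however deeply nested
    if node is None:
        return False
    row = node.find_parent("tr")  # climb from the string up to its row
    if row is None:
        return False
    row.decompose()  # drops the <tr> plus its <td> and <b>
    return True


soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tab6col"})
removed = remove_row_containing(table, u"P\xe9nz\xfcgyi mutat\xf3k")

# one list of cell texts per remaining <tr>
rows = [[td.get_text() for td in tr.find_all("td")] for tr in table.find_all("tr")]
```

remove_row_containing is a hypothetical helper name; with the real page you would pass the tab6col table from soup.find and keep "lxml" as the parser.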

Edit

Try this.

import unicodecsv as csv
from bs4 import BeautifulSoup

filename = r'output.csv'

# read the saved page
with open('opten2.txt', 'r') as f:
    x = f.read()

soup = BeautifulSoup(x, 'lxml')
tab6col = soup.find('table', { "class" : "tab6col" })

# drop the "Pénzügyi mutatók" row (and its <td>) before extracting anything
row = tab6col.find('tr', text=u"P\xe9nz\xfcgyi mutat\xf3k")
if row is not None:
    row.decompose()

# one CSV row per <tr>, so a single colspan cell cannot shift the other columns
datatable = []
for record in tab6col.find_all('tr'):
    temp_data = []
    for data in record.find_all('td'):
        temp_data.append(data.text.encode('latin-1'))
    datatable.append(temp_data)

with open(filename, "wb") as resultcsv:
    output = csv.writer(resultcsv, delimiter=';', quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')
    output.writerows(datatable)

So do I have to use this? 'td = tab6col.find("P\xe9nz\xfcgyi mutat\xf3k", align=False) td.decompose()' – tardos93


@tardos93 No, not like that. Do you want to remove all 'td' tags that have no align attribute? –


Of course not. I only want to remove this data together with its td tag. – tardos93
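For removing only that cell with its td tag: in the question's flat extraction this row contributes nothing but that one cell, so deleting just the <td> already keeps the to_2d chunks aligned, and an empty <tr></tr> is left behind in the tree. A sketch assuming bs4 with the built-in html.parser and a hypothetical 2-column table in place of the real 6-column one:

```python
from bs4 import BeautifulSoup


def to_2d(l, n):
    # the question's helper: chunk a flat list into rows of n cells
    return [l[i:i + n] for i in range(0, len(l), n)]


# Hypothetical 2-column stand-in for the tab6col table (the real one has 6 columns)
html = (
    u'<table class="tab6col">'
    u'<tr><td>Bev\xe9telek</td><td>2 873 821</td></tr>'
    u'<tr><td colspan="6"><b>P\xe9nz\xfcgyi mutat\xf3k</b></td></tr>'
    u'<tr><td>Eszk\xf6zar\xe1nyos nyeres\xe9g</td><td>-0,01</td></tr>'
    u'</table>'
)

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "tab6col"})

node = table.find(string=u"P\xe9nz\xfcgyi mutat\xf3k")
if node is not None:
    node.find_parent("td").decompose()  # removes only this <td>; an empty <tr></tr> remains

# same flat extraction as the question, then chunk into 2-cell rows
flat = [td.get_text() for tr in table.find_all("tr") for td in tr.find_all("td")]
rows = to_2d(flat, 2)
```

With the real table the chunk size stays 6, as in the question.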