I have almost finished a web crawler that scrapes a table. It only outputs the first row of the table. Can anyone help identify why this doesn't return all the rows in the table? Please ignore the while loop, as it will eventually become a proper loop. Why does the BeautifulSoup row loop only run once?
import urllib
from bs4 import BeautifulSoup
#file_name = "/user/joe/uspc-cpc.txt
#file = open(file_name,"w")
i = 125
while i == 125:
    url = "http://www.uspto.gov/web/patents/classification/cpc/html/us" + str(i) + "tocpc.html"
    print url + '\n'
    i += 1
    data = urllib.urlopen(url).read()
    print data
    #get the table data from dump
    #append to csv file
    soup = BeautifulSoup(data)
    table = soup.find("table", width='80%')
    for tr in table.findAll('tr')[2:]:
        col = row.findAll('td')
        uspc = col[0].get_text().encode('ascii','ignore')
        cpc1 = col[1].get_text().encode('ascii','ignore')
        cpc2 = col[2].get_text().encode('ascii','ignore')
        cpc3 = col[3].get_text().encode('ascii','ignore')
        print uspc + ',' + cpc1 + ',' + cpc2 + ',' + cpc3 + '\n'
#file.write(record)
#file.close()
Code I ran:
import urllib
from bs4 import BeautifulSoup
#file_name = "https://stackoverflow.com/users/ripple/uspc-cpc.txt"
#file = open(file_name,"w")
i = 125
while i == 125:
    url = "http://www.uspto.gov/web/patents/classification/cpc/html/us" + str(i) + "tocpc.html"
    print 'Grabbing from: ' + url + '\n'
    i += 1
    #get the table data from the page
    data = urllib.urlopen(url).read()
    #send to beautiful soup
    soup = BeautifulSoup(data)
    table = soup.find("table", width='80%')
    for tr in table.findAll('tr')[2:]:
        col = tr.findAll('td')
        uspc = col[0].get_text().encode('ascii','ignore').replace(" ","")
        cpc1 = col[1].get_text().encode('ascii','ignore').replace(" ","")
        cpc2 = col[2].get_text().encode('ascii','ignore').replace(" ","")
        cpc3 = col[3].get_text().encode('ascii','ignore').replace(" ","").replace("more...", "")
        record = uspc + ',' + cpc1 + ',' + cpc2 + ',' + cpc3 + '\n'
        print record
#file.write(record)
#file.close()
What does it print? – 2013-04-09 17:34:47
You did not define "row". – 2013-04-09 17:36:17
@Marjin Pieters: How do I define row? The output is one row: 125/901,H 03H 3/02,B 28D 5/00,H 03H 3/04,B 23D 47/005,B 24B 37/08 more... – 2013-04-09 17:37:36
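For reference, the fix the comments point at is to use the loop variable (`tr`) instead of the undefined name `row` inside the row loop. A minimal Python 3 sketch of the corrected loop, using an inline HTML table as a stand-in for the USPTO page (the table layout here is an assumption, not the real page):

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the fetched page (hypothetical data, not from USPTO)
html = """
<table width="80%">
  <tr><th>USPC</th><th>CPC</th><th>CPC</th><th>CPC</th></tr>
  <tr><td colspan="4">second header row</td></tr>
  <tr><td>125/1</td><td>A 01B 1/00</td><td>A 01B 2/00</td><td>A 01B 3/00</td></tr>
  <tr><td>125/2</td><td>B 02C 1/00</td><td>B 02C 2/00</td><td>B 02C 3/00</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", width="80%")

records = []
for tr in table.find_all("tr")[2:]:   # skip the two header rows, as in the question
    col = tr.find_all("td")           # use the loop variable, not the undefined "row"
    records.append(",".join(td.get_text().strip() for td in col))

for record in records:
    print(record)
```

With `row`, the loop body raises a NameError on the first iteration (or, if a stale `row` happens to exist from earlier code, it reprocesses that same row every time), which is why only one row ever appears.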