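I am using Python to scrape data for a stock from the Google Finance historical prices page (http://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=PLfUVIDTDuSRiQKhwYGQBQ), where the data is shown in a paginated table.
I can scrape the 30 rows on the current page. The problem is that I cannot get to the rest of the data in the table (rows 31-241). How do I move on to the next page or link? Here is my code: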
import urllib2
import xlwt  # to write into excel spreadsheet
from bs4 import BeautifulSoup
# Main Coding Section
stock_links = open('stock_link_list.txt', 'r')  # opening text file for reading
#url="https://www.google.com/finance/historical?q=NSE%3ASIEMENS&ei=zHXOVLPnApG2iALxxYCADQ"
for url in stock_links:
    url = url.strip()  # drop the trailing newline read from the file
    OurFile = urllib2.urlopen(url)
    OurHtml = OurFile.read()
    OurFile.close()
    soup = BeautifulSoup(OurHtml)
    #soup1 = soup.find("div", {"class": "gf-table-wrapper sfe-break-bottom-16"}).get_text()
    soup1 = soup.find("table", {"class": "gf-table historical_price"}).get_text()
    end = url.index('&')
    filename = url[47:end]
    file = open(filename, 'w')  # opening text file for writing
    file.write(soup1)
    #file.write(soup1.get_text())  # writing to the text file
    file.close()  # closing the text file
stock_links.close()  # closing the list of links
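Thanks, Padraic C. Your answer taught me something new today. I added "&start={}" to my existing list of links, and it works like a charm. Since I lack the reputation points, I cannot upvote your answer. The day I have the points, I will come back here and upvote this awesome answer. – NitheshKHP 2015-02-06 16:34:59
@NitheshKHP, no worries. – 2015-02-06 16:36:48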
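For readers following the "&start={}" approach mentioned in the comments, here is a minimal sketch of how the page URLs might be generated. It assumes the page accepts a `start` row offset (as the comment suggests), that each page holds 30 rows, and that the table has 241 rows, as described in the question. The function name `page_urls` and its parameters are hypothetical, and the sketch only builds the URLs; it does not fetch anything.

```python
def page_urls(base_url, total_rows, rows_per_page=30):
    """Return one URL per page, each carrying a `start` row offset.

    Assumption: the server understands a `start` query parameter,
    as suggested in the comment thread.
    """
    template = base_url + "&start={}"
    # Offsets 0, 30, 60, ... up to (but not including) total_rows.
    return [template.format(offset)
            for offset in range(0, total_rows, rows_per_page)]


# Base link taken from the question; the row counts are the ones the asker reports.
base = "http://www.google.com/finance/historical?q=NSE%3ASIEMENS"
urls = page_urls(base, total_rows=241)

for u in urls:
    print(u)
```

Each generated URL could then be passed through the same fetch-and-parse loop as in the question, appending every page's rows to the output file instead of overwriting it.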