Is there a way to speed up this web-scraping iteration? (pandas)

So I'm collecting data on a list of stocks and putting all of that information into a DataFrame. The list has about 700 stocks.

import pandas as pd 

stock =['adma','aapl','fb'] # list has about 700 stocks which I extracted from a pickled dataframe that was storing the info. 

# The site I'm visiting is below, with the stock's ticker appended to the end of the link
##http://finviz.com/quote.ashx?t=adma 
##http://finviz.com/quote.ashx?t=aapl

I'm only extracting one section of the site, namely table [-2], as shown below:

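frames = []  # collect one DataFrame per stock; concatenate once after the loop

for i in stock:
    df = pd.read_html('http://finviz.com/quote.ashx?t={}'.format(i), header=0)[-2].set_index('SEC Form 4')
    df['Stock'] = i.upper()  # creating a column which has the name of the stock, so I can differentiate between stocks
    frames.append(df)  # list.append is cheap; DataFrame.append was removed in pandas 2.0

df2 = pd.concat(frames)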

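Each iteration seems to take a few seconds, and I currently have about 700 of them. It's not terribly slow, but I'm curious whether there is a more efficient approach. Thanks.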
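Apart from the network time, growing a DataFrame with 'df2.append(df)' inside a loop has a cost of its own: each call builds a new DataFrame and copies every row gathered so far, and 'DataFrame.append' was deprecated in pandas 1.4 and removed in 2.0. With ~700 small tables this is probably minor next to the downloads, but the usual alternative is to collect the frames in a list and call 'pd.concat' once. A minimal sketch; 'fake_table' is a hypothetical stand-in for the scraped table so the example runs offline:

```python
import pandas as pd


def fake_table(ticker):
    # Hypothetical stand-in for one scraped table; in the question each frame
    # comes from pd.read_html(...)[-2].set_index('SEC Form 4').
    return pd.DataFrame({"Insider": ["A", "B"], "Shares": [100, 200]})


stock = ["adma", "aapl", "fb"]

frames = []  # one DataFrame per ticker; appending to a Python list is cheap
for ticker in stock:
    df = fake_table(ticker)
    df["Stock"] = ticker.upper()  # tag each row with its ticker
    frames.append(df)

# A single concat copies the data once, instead of once per iteration.
df2 = pd.concat(frames)
print(df2.shape)  # (6, 3)
```

Pass 'ignore_index=True' to 'pd.concat' only if the index carries no meaning; in the question it holds the 'SEC Form 4' values, so the default keeps them.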


2016-11-16 Moondra

Check out my question (http://stackoverflow.com/questions/40641166/how-to-add-an-id-column-to-identify-read-html-tables); maybe it can help you. – tumbleweed

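Your current code is blocking: it doesn't start retrieving the next URL until the current one has finished. Instead, you could switch to Scrapy, which is built on Twisted, and process multiple pages asynchronously at the same time.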


2016-11-16 22:04:48 alecxe
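The answer's point is that each request waits for the previous one to finish. The same idea can be sketched without Scrapy by swapping in the standard library's concurrent.futures.ThreadPoolExecutor, so that several pages download at once. This is a minimal sketch, not the answerer's method: 'fetch_table' and 'scrape_all' are hypothetical helpers, max_workers=8 is an arbitrary assumption, and it assumes the page layout (table [-2] with a 'SEC Form 4' column) is still as described in the question:

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

URL = "http://finviz.com/quote.ashx?t={}"


def fetch_table(ticker):
    """Hypothetical helper: download and parse one ticker page, as in the question's loop."""
    df = pd.read_html(URL.format(ticker), header=0)[-2].set_index("SEC Form 4")
    df["Stock"] = ticker.upper()  # tag rows with their ticker
    return df


def scrape_all(tickers, fetch=fetch_table, max_workers=8):
    """Hypothetical helper: run fetch() for several tickers concurrently.

    max_workers=8 is an assumed value, not a recommendation from the thread.
    Threads spend most of their time waiting on the network, so their
    requests overlap. pool.map yields results in the same order as tickers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        frames = list(pool.map(fetch, tickers))
    return pd.concat(frames)
```

Usage would be df2 = scrape_all(stock). At an assumed 2 s per page, 700 sequential requests take about 1400 s (roughly 23 minutes); with 8 requests in flight the wall-clock time could in principle drop by up to about 8x, though server-side limits usually keep it below that. If any single page fails, pool.map re-raises that exception when its result is reached, so a try/except inside fetch_table may be worthwhile. Keep max_workers modest, since a site may throttle or block many simultaneous requests.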

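Thanks for the input. I had never heard of 'Scrapy'. Is it the go-to web scraper? I was advised to look into 'Requests' later on, but I'm assuming 'Scrapy' is more popular now. – Moondra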
