Speeding up the process of downloading files from the web

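I'm writing a program that has to download a bunch of files from the web before it can even run, so I created a function called init_program that downloads all of the files and "initializes" the program. It works by running through a couple of dicts that hold URLs to gist files on GitHub. It pulls the URLs and downloads them using urllib2. I won't be able to add all of the files, but you can try it out by cloning the repo here. Here is the function that creates the files from the gists: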

import sys
import urllib2  # Python 2 only; Python 3 moved this to urllib.request


def init_program():
    """ Initialize the program and allow all the files to be downloaded
    This will take a while to process, but I'm working on the processing
    speed """

    downloaded_wordlists = []  # Used to count the number of items downloaded
    downloaded_rainbow_tables = []

    print("\n")
    banner("Initializing program and downloading files, this may take a while..")
    print("\n")

    # INIT_FILE is a file that will contain "false" if the program is not initialized
    # and "true" if the program is initialized
    with open(INIT_FILE) as data:
        needs_init = data.read().strip() == "false"

    if needs_init:
        for item in GIST_DICT_LINKS:
            sys.stdout.write("\rDownloading {} out of {} wordlists.. ".format(
                len(downloaded_wordlists) + 1, len(GIST_DICT_LINKS)))
            sys.stdout.flush()
            # Download the wordlist and save it into a file
            wordlist_data = urllib2.urlopen(GIST_DICT_LINKS[item])
            with open("dicts/included_dicts/wordlists/{}.txt".format(item), "a+") as new_wordlist:
                new_wordlist.write(wordlist_data.read())
            downloaded_wordlists.append(item + ".txt")

        print("\n")
        banner("Done with wordlists, moving to rainbow tables..")
        print("\n")

        for table in GIST_RAINBOW_LINKS:
            sys.stdout.write("\rDownloading {} out of {} rainbow tables".format(
                len(downloaded_rainbow_tables) + 1, len(GIST_RAINBOW_LINKS)))
            sys.stdout.flush()
            # Download the rainbow table and save it into a file
            # (the original opened this file read-only, so the write failed)
            rainbow_data = urllib2.urlopen(GIST_RAINBOW_LINKS[table])
            with open("dicts/included_dicts/rainbow_tables/{}.rtc".format(table), "wb") as new_rainbowtable:
                new_rainbowtable.write(rainbow_data.read())
            downloaded_rainbow_tables.append(table + ".rtc")

        # Mark the program as initialized so the downloads never run again
        with open(INIT_FILE, "w") as data:
            data.write("true")

    return downloaded_wordlists, downloaded_rainbow_tables

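This works, yes, but it is extremely slow because of the size of the files; each file has at least 100,000 lines. How can I speed up this function to make it faster and more user-friendly?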


2016-12-07 papasmurf

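Well, it depends on your Wi-Fi connection. There is hardly any way to speed this up other than improving your Wi-Fi. Sorry to say. – Qwerty

@Qwerty Even with threading? I mean, it's slow, yes, and it will be worth it in the end, but it's a slow initialization process.. – papasmurf

Hmm... http://stackoverflow.com/a/9010299/2308683 –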

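A few weeks ago I faced a similar situation in which I needed to download many huge files, but none of the pure-Python solutions I found were good enough in terms of download optimization. So I found Axel, for Linux and Unix.

What is Axel, the light command-line download accelerator?

Axel tries to accelerate the download process by using multiple connections for one file, similar to DownThemAll and other well-known programs. It can also use multiple mirrors for one download.

Using Axel, you will get files from the Internet faster. Axel can speed up a download by up to 60% (approximately, according to some tests).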

Usage: axel [options] url1 [url2] [url...] 

--max-speed=x         -s x    Specify maximum speed (bytes per second)
--num-connections=x   -n x    Specify maximum number of connections
--output=f            -o f    Specify local output file
--search[=x]          -S [x]  Search for mirrors and download from x servers
--header=x            -H x    Add header string
--user-agent=x        -U x    Set user agent
--no-proxy            -N      Just don't use any proxy server
--quiet               -q      Leave stdout alone
--verbose             -v      More status information
--alternate           -a      Alternate progress indicator
--help                -h      This information
--version             -V      Version information

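Since axel is written in C and there is no C extension for Python, I used the subprocess module to run it externally, and it works perfectly for me.

You can do something like this: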

import subprocess

# n_connections, filename and url are supplied by the caller
cmd = ['/usr/local/bin/axel', '-n', str(n_connections), '-o', str(filename), url]
process = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

You can also track the progress of each download by parsing its stdout output.

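# readline() returns an empty string at EOF, which ends the loop
for line in iter(process.stdout.readline, b""):
    match = YOUR_GREAT_REGEX.match(line)
    if match:  # skip lines that carry no progress information
        progress = match.groups()
        ...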
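The answer leaves YOUR_GREAT_REGEX to the reader. As a rough sketch only: the pattern below assumes that progress lines contain a percentage in square brackets, such as "[ 42%]". That format is an assumption made for illustration, not something the answer specifies, so check it against what your axel version actually prints. PROGRESS_RE and parse_progress are hypothetical names.

```python
import re

# Hypothetical pattern: assumes a progress line carries a bracketed
# percentage such as "[ 42%]". Adjust it to the real output you observe.
PROGRESS_RE = re.compile(r"\[\s*(\d{1,3})%\]")


def parse_progress(line):
    """Return the percentage found in `line` as an int, or None if absent."""
    if isinstance(line, bytes):
        # Pipes opened by subprocess yield bytes unless text mode is requested.
        line = line.decode("utf-8", errors="replace")
    match = PROGRESS_RE.search(line)
    return int(match.group(1)) if match else None
```

Using search() rather than match() lets the percentage appear anywhere in the line, and returning None for non-matching lines keeps the read loop from crashing on banner or status output.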


2016-12-07 03:24:57 GustavoIP

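This only works if the hosting site supports parallel downloads –

That's true, but it can be useful in "most" cases. Unfortunately, it's not a silver bullet. – GustavoIP

@GustavolP I'm also working on a Windows machine.. Still, this is a genius piece of work, +1 – papasmurf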
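On the point about the host needing to support parallel downloads: splitting one file across several connections generally relies on HTTP range requests, and a server can advertise support with the header "Accept-Ranges: bytes". Below is a minimal sketch of that check. supports_ranges is a hypothetical helper, and a server may honor ranges without advertising them, so treat a False result as "not advertised" rather than "not supported".

```python
def supports_ranges(headers):
    """Return True if the response headers advertise byte-range support.

    `headers` is any mapping of header names to values, for example the
    headers of a HEAD response converted to a dict.
    """
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "accept-ranges":
            return value.strip().lower() == "bytes"
    return False
```

A server that sends "Accept-Ranges: none", or no such header at all, gives False here, in which case a multi-connection downloader has little to gain on that host.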

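You are blocking while you wait for each download, so the total time is the sum of the round-trip times of all the downloads. Your code probably spends most of its time waiting for network traffic. One way to improve this is to not block while waiting for each response. You can do that in several ways: for example, hand each request off to a separate thread (or process), or use an event loop and coroutines. Read up on the threading and asyncio modules.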
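A minimal sketch of the thread-based option, assuming Python 3 (urllib.request in place of the question's urllib2) and a dict of {name: url} shaped like GIST_DICT_LINKS. The names fetch and download_all, and the default of 8 workers, are illustrative choices rather than anything from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one URL and return its body as bytes (blocks this thread only)."""
    with urlopen(url) as response:
        return response.read()


def download_all(links, fetch_fn=fetch, max_workers=8):
    """Fetch every URL in `links` ({name: url}) concurrently.

    Returns {name: body}. `fetch_fn` can be swapped out, which makes it
    possible to exercise this function without network access.
    """
    names = list(links)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so names and bodies line up.
        bodies = list(pool.map(fetch_fn, (links[name] for name in names)))
    return dict(zip(names, bodies))
```

With this shape, the waits overlap instead of adding up, so the total time is closer to that of the slowest batch of downloads than to the sum of all of them. Writing the returned bodies to disk can stay in the main thread.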


2016-12-07 08:20:08

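Can you elaborate on what you mean by blocking while waiting for each download? – papasmurf

urlopen() followed by read() means you are waiting for the connection to be opened, the request to be sent, and the response to arrive. That network traffic can take a long time, and most of the time your code spends is spent waiting for it. When you have many requests to make, you don't want to wait for the first response before you start the next one. –

So how do you propose I do that? Create a queue of threads and just pull them off when needed? – papasmurf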
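One common reading of that idea is a queue of jobs with a fixed set of worker threads pulling from it, rather than a queue of threads. The sketch below shows that pattern under those assumptions; run_workers and handle are hypothetical names, and handle stands in for whatever downloads and saves a single file.

```python
import queue
import threading


def run_workers(jobs, handle, num_workers=4):
    """Apply `handle` to every item in `jobs` using a fixed set of threads.

    Returns a list of (job, result) pairs in completion order.
    """
    todo = queue.Queue()
    for job in jobs:
        todo.put(job)

    results = []
    lock = threading.Lock()  # guards `results` across threads

    def worker():
        while True:
            try:
                job = todo.get_nowait()
            except queue.Empty:
                return  # no jobs left, let this thread finish
            outcome = handle(job)
            with lock:
                results.append((job, outcome))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The standard library's concurrent.futures.ThreadPoolExecutor packages the same pattern, so the hand-rolled version is mostly useful for seeing what happens underneath. Note that an exception raised by handle ends that worker thread here, so real code would want to catch and record failures per job.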
