2016-06-14

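I am trying to use multithreading to scrape stock data from Yahoo Finance and save it to SQL. However, I get the following error: IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known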

Exception in thread Thread-3091:
Traceback (most recent call last): 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner 
    self.run() 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run 
    self.__target(*self.__args, **self.__kwargs) 
    File "todatabase.py", line 19, in th 
    htmltext = urllib.urlopen(base).read() 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen 
    return opener.open(url) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open 
    return getattr(self, name)(url) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 350, in open_http 
    h.endheaders(data) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders 
    self._send_output(message_body) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output 
    self.send(msg) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send 
    self.connect() 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect 
    self.timeout, self.source_address) 
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 557, in create_connection 
    for res in getaddrinfo(host, port, 0, SOCK_STREAM): 
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known

Here is my code:

from threading import Thread 
import sqlite3 
import urllib 
import re 

conn = sqlite3.connect('stock.sqlite') 
cur = conn.cursor() 

cur.execute('''CREATE TABLE IF NOT EXISTS Stock 
    (symbol TEXT UNIQUE PRIMARY KEY, price NUMERIC) ''') 

dic = {} 

def th(ur): 
    base = "http://finance.yahoo.com/q?s=" + ur 
    regex = '<span id="yfs_l84_[^.]*">(.+?)</span>' 
    pattern = re.compile(regex) 
    htmltext = urllib.urlopen(base).read() 
    results = re.findall(pattern, htmltext) 

    try:
        dic[ur] = results[0]
    except IndexError:
        # The regex found no price on the page for this symbol.
        print 'got a error!'

symbolslist = open("symbols.txt").read() 
symbolslist = symbolslist.split("\n") 
threadlist = [] 

for u in symbolslist: 
    t = Thread(target = th, args = (u,)) 
    t.start() 
    threadlist.append(t) 

for b in threadlist: 
    b.join() 

for key, value in dic.items(): 
    print key, value 

    cur.execute('INSERT INTO Stock(symbol,price) VALUES (?,?)',(key,value)) 
    conn.commit() 

cur.close() 

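I think the error may be in the multithreading part, because I can get the data without multithreading, only slowly.

With multithreading and this error, I end up with only 200+ (symbol, price) pairs instead of 3145.

I tried changing the DNS and IP, and that did not solve the problem.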

Don't use regex to parse HTML. There is also a Yahoo API you can access that will give you JSON.

Answers

0

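I remember running into problems with multithreading and a large number of open sockets. An extra lock solved my problem. However, I did not try to find the real cause. The urllib documentation does not mention anything about thread safety. You could try something like this: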

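import threading
from contextlib import closing

global_lock = threading.Lock()
...
def th(ur):
    ...
    # Only one thread at a time opens a connection (this includes the DNS lookup).
    with global_lock:
        fd = urllib.urlopen(base)
    # In Python 2 the urllib response is not a context manager, so wrap it in closing().
    with closing(fd):
        htmltext = fd.read()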

Edit

Alternatively, you can write single-threaded (async IO) code using a library such as Tornado or asyncio.
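A minimal sketch of the single-threaded approach, assuming Python 3 (asyncio is not available in the Python 2.7 used in the question). `fetch_price`, `fetch_all`, and `max_concurrent` are hypothetical names, and `fetch_price` only simulates a request, so a real non-blocking HTTP client would have to be swapped in:

```python
import asyncio


async def fetch_price(symbol):
    # Hypothetical stand-in for a real non-blocking HTTP request;
    # it only simulates latency and returns a fake price string.
    await asyncio.sleep(0.01)
    return "%d.00" % len(symbol)


async def fetch_all(symbols, fetch=fetch_price, max_concurrent=20):
    # The semaphore caps how many requests are in flight at once, so
    # thousands of symbols never open thousands of sockets together.
    limit = asyncio.Semaphore(max_concurrent)
    prices = {}

    async def worker(symbol):
        async with limit:
            try:
                prices[symbol] = await fetch(symbol)
            except Exception as exc:
                print("failed to fetch %s: %s" % (symbol, exc))

    await asyncio.gather(*(worker(s) for s in symbols))
    return prices


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["AAPL", "GOOG", "MSFT"])))
```

Everything runs in one thread, so the shared `prices` dictionary needs no lock, and the concurrency limit is a single number to tune.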

By the way, if you use one sqlite connection per thread, each thread can store the scraped data as soon as it has retrieved it.
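One way this could look, as a sketch rather than a tested solution for the original script. `fetch_price`, `fetch_and_store`, and `scrape_to_db` are hypothetical names, the fetch step is a stub, and the 30-second `timeout` is an arbitrary choice that makes a writer wait for the database lock instead of failing immediately:

```python
import os
import sqlite3
import tempfile
import threading


def fetch_price(symbol):
    # Hypothetical stand-in for the real download-and-parse step.
    return "%d.00" % len(symbol)


def fetch_and_store(db_path, symbol):
    price = fetch_price(symbol)
    # Each thread opens its own connection and writes its own row.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        conn.execute(
            "INSERT OR REPLACE INTO Stock(symbol, price) VALUES (?, ?)",
            (symbol, price),
        )
        conn.commit()
    finally:
        conn.close()


def scrape_to_db(db_path, symbols):
    # Create the table once, before any worker thread starts.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Stock (symbol TEXT PRIMARY KEY, price NUMERIC)"
    )
    conn.commit()
    conn.close()

    threads = [
        threading.Thread(target=fetch_and_store, args=(db_path, s))
        for s in symbols
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stock.sqlite")
    scrape_to_db(path, ["AAPL", "GOOG", "MSFT"])
    conn = sqlite3.connect(path)
    print(conn.execute("SELECT symbol FROM Stock ORDER BY symbol").fetchall())
    conn.close()
```

Because sqlite allows only one writer at a time, the threads still take turns on the insert, but the shared `dic` and the final insert loop are no longer needed.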

0

I had this error too. I just added some sleep time to each thread and the problem was solved. I used time.sleep(0.1).
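The answer does not say where the sleep goes, so the following is only one possible reading: pause briefly between thread starts so the DNS lookups do not all happen at once. `fetch_price` and `scrape_staggered` are hypothetical names, and the fetch step is a stub:

```python
import threading
import time


def fetch_price(symbol):
    # Hypothetical stand-in for the real download-and-parse step.
    return "%d.00" % len(symbol)


def scrape_staggered(symbols, fetch=fetch_price, delay=0.1):
    prices = {}

    def worker(symbol):
        prices[symbol] = fetch(symbol)

    threads = []
    for symbol in symbols:
        t = threading.Thread(target=worker, args=(symbol,))
        t.start()
        threads.append(t)
        # Pause before starting the next thread so connection attempts
        # are spread out instead of arriving all at once.
        time.sleep(delay)

    for t in threads:
        t.join()
    return prices


if __name__ == "__main__":
    print(scrape_staggered(["AAPL", "GOOG", "MSFT"], delay=0.01))
```

With 3145 symbols and a 0.1 second delay, starting the threads alone takes a little over five minutes, so the delay is a trade-off between speed and reliability.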
