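I'm trying to use multithreading to scrape stock data from Yahoo Finance and save it to SQL. However, I get the following error: IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known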
```
Exception in thread Thread-3091:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "todatabase.py", line 19, in th
    htmltext = urllib.urlopen(base).read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 350, in open_http
    h.endheaders(data)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 1049, in endheaders
    self._send_output(message_body)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 893, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 855, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 832, in connect
    self.timeout, self.source_address)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py", line 557, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known
```
Here is my code:
```python
from threading import Thread
import sqlite3
import urllib
import re

conn = sqlite3.connect('stock.sqlite')
cur = conn.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS Stock
               (symbol TEXT UNIQUE PRIMARY KEY, price NUMERIC)''')
dic = {}  # symbol -> price, filled in by the worker threads


def th(ur):
    # Download the Yahoo Finance quote page for one symbol and extract the price.
    base = "http://finance.yahoo.com/q?s=" + ur
    regex = '<span id="yfs_l84_[^.]*">(.+?)</span>'
    pattern = re.compile(regex)
    htmltext = urllib.urlopen(base).read()
    results = re.findall(pattern, htmltext)
    try:
        dic[ur] = results[0]
    except IndexError:  # the price span was not found on the page
        print 'got an error!'

symbolslist = open("symbols.txt").read()
symbolslist = symbolslist.split("\n")

# Start one thread per symbol, all at once.
threadlist = []
for u in symbolslist:
    t = Thread(target=th, args=(u,))
    t.start()
    threadlist.append(t)

# Wait for every thread to finish.
for b in threadlist:
    b.join()

# Write the collected prices to SQLite from the main thread.
for key, value in dic.items():
    print key, value
    cur.execute('INSERT INTO Stock(symbol,price) VALUES (?,?)', (key, value))

conn.commit()
cur.close()
```
I think the error may be in the multithreading part, because I can get the data without multithreading, just slowly.
With multithreading I get this error, and I end up with only 200+ (symbol, price) pairs instead of all 3145.
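One way to keep the speed of threads without opening 3145 connections at the same moment is to cap the number of concurrent workers. The following is a minimal sketch, assuming Python 3 and its standard `concurrent.futures` module (not the Python 2.7 setup in the question). `scrape_all` and `fake_fetch` are hypothetical names, and `fetch_price` stands in for whatever function downloads and parses one quote. Failed symbols are recorded with their exception rather than only printed, so they can be inspected or retried.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(symbols, fetch_price, max_workers=20):
    """Run fetch_price(symbol) for every symbol using at most max_workers threads.

    Returns (prices, failures): prices maps symbol -> price and failures maps
    symbol -> the exception raised while fetching it.
    """
    prices = {}
    failures = {}
    # Drop blank entries, e.g. the empty string produced by a trailing newline.
    symbols = [s.strip() for s in symbols if s.strip()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_price, s): s for s in symbols}
        # Results are collected here, in the calling thread, as each task finishes.
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                prices[symbol] = future.result()
            except Exception as exc:  # keep the error instead of losing the symbol
                failures[symbol] = exc
    return prices, failures


if __name__ == "__main__":
    def fake_fetch(symbol):
        # Offline stand-in for the real network call.
        if symbol == "BAD":
            raise IOError("simulated lookup failure")
        return len(symbol) * 10.0

    prices, failures = scrape_all(["AAPL", "MSFT", "BAD", ""], fake_fetch, max_workers=2)
    print(sorted(prices.items()))  # [('AAPL', 40.0), ('MSFT', 40.0)]
    print(sorted(failures))        # ['BAD']
```

The limit of 20 workers is an arbitrary starting value; collecting results in the calling thread also means the SQLite writes can stay in one thread, as in the original code.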
I tried changing the DNS and IP, but that didn't solve the problem.
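The traceback ends in `getaddrinfo`, so the failure happens while resolving the host name, before any HTTP request is sent. One plausible cause (an assumption, not confirmed in the question) is that thousands of near-simultaneous lookups overwhelm the resolver, which would also explain why switching DNS servers did not help. Besides limiting concurrency, a short retry with exponential backoff can absorb transient failures. A minimal Python 3 sketch; `fetch_with_retry` and its parameters are hypothetical. In Python 3, `socket.gaierror` and `urllib.error.URLError` are both subclasses of `OSError`, so catching `OSError` covers them.

```python
import socket
import time


def fetch_with_retry(fetch, symbol, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(symbol), retrying on network errors with exponential backoff.

    Waits base_delay, 2 * base_delay, ... between attempts and re-raises the
    last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fetch(symbol)
        except OSError:  # includes socket.gaierror (name resolution failures)
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    calls = []

    def flaky(symbol):
        # Simulates a lookup that fails twice, then succeeds.
        calls.append(symbol)
        if len(calls) < 3:
            raise socket.gaierror(8, "nodename nor servname provided, or not known")
        return 123.45

    print(fetch_with_retry(flaky, "AAPL", sleep=lambda seconds: None))  # 123.45
```

The `sleep` parameter exists only so the delay can be skipped in tests; a wrapper like this can be passed as the per-symbol function to a bounded thread pool.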
Don't use regular expressions to parse HTML. There is also a Yahoo API you can access that will give you JSON.
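The comment mentions a Yahoo API that returns JSON but gives no endpoint, so that part is not sketched here. For the other half of the advice, the standard library's `html.parser` can extract the same element the question's regex targets. A minimal Python 3 sketch, assuming the price sits in a `<span>` whose `id` begins with `yfs_l84_` (taken from the regex in the question; whether the page still uses that markup is not assumed). `SpanTextParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collect the text of <span> elements whose id starts with id_prefix."""

    def __init__(self, id_prefix):
        super().__init__()
        self.id_prefix = id_prefix
        self.depth = 0      # > 0 while inside a matching span (tracks nested spans)
        self.current = []   # text fragments of the span being read
        self.results = []   # one string per matching span, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a span nested inside a matching one
            return
        span_id = dict(attrs).get("id") or ""
        if span_id.startswith(self.id_prefix):
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


if __name__ == "__main__":
    html = '<div><span id="yfs_l84_aapl"><b>123.45</b></span><span id="other">x</span></div>'
    parser = SpanTextParser("yfs_l84_")
    parser.feed(html)
    print(parser.results)  # ['123.45']
```

Unlike the regex, the parser copes with tags inside the span (such as the `<b>` above) and does not depend on attribute order or exact spacing in the markup.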