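I want to use multiple threads and a queue (with a limited number of threads) in Python to traverse the dictionaries nested inside a dictionary, simulating the structure of a directory tree or a website. I created mainDict to simulate this: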
```python
mainDict = {"Layer1": {"Layer11": 1, "Layer12": 1, "Layer13": 1, "Layer14": 1, "Layer15": 1, "Layer16": 1},
            "Layer2": {"Layer21": 2, "Layer22": 2, "Layer23": 2, "Layer24": 2, "Layer25": 2, "Layer26": 2},
            "Layer3": {"Layer31": 4, "Layer32": 4, "Layer33": 4, "Layer34": 4, "Layer35": 4, "Layer36": 4},
            "Layer4": {"Layer41": 8, "Layer42": 8, "Layer43": 8, "Layer44": 8, "Layer45": 8, "Layer46": 8},
            "Layer5": {"Layer51": 16, "Layer52": 16, "Layer53": 16, "Layer54": 16, "Layer55": 16, "Layer56": 16},
            "Layer6": {"Layer61": 32, "Layer62": 32, "Layer63": 32, "Layer64": 32, "Layer65": 32, "Layer66": 32}}
```
I also wrote a Crawler class; one crawler is instantiated for each first-level sub-dictionary of mainDict.
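The idea is to create 2 threads (only a limited number of threads/crawlers at a time, to reduce CPU usage) that crawl Layer(i) (i = 1..6). Each thread should crawl down to the leaves of its "tree" before moving on to the next dictionary (e.g. crawler 0 goes through Layer1, crawler 1 goes through Layer2, and whichever finishes first takes Layer3, and so on).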
```python
import threading
import time
from queue import Queue


class Crawler:
    def __init__(self, rootDict, number_of_delay, crawler):
        self.crawler = crawler                  # id of this crawler/thread
        self.rootDict = rootDict                # sub-dictionary to crawl
        self.number_of_delay = number_of_delay  # base delay per leaf, in seconds

    def crawlAllLeaves(self, myDict):
        # Depth-first: descend into nested dicts, sleep at each leaf.
        for k, v in myDict.items():
            if isinstance(v, dict):
                print("Crawler {} is crawling {}".format(self.crawler, k))
                self.crawlAllLeaves(v)
            else:
                print("Crawler {} reached the value {} for key {}".format(self.crawler, v, k))
                time.sleep(self.number_of_delay + v)


def someAuxFunc():
    # simulate some loading time
    time.sleep(2)


def createWorker(q, delayNumber, crawler):
    tc = Crawler(mainDict[q.get()], delayNumber, crawler)
    tc.crawlAllLeaves(tc.rootDict)


def threader(q, delayNumber, crawler):
    while True:
        print("crawler {}: has gotten the url {}".format(crawler, q.get()))
        createWorker(q, delayNumber, crawler)
        print("crawler {}: has finished the url {}".format(crawler, q.get()))
        q.task_done()


q = Queue()
number_of_threads = 2
delayNumber = 2

for thread in range(number_of_threads):
    th = threading.Thread(target=threader, args=(q, delayNumber, thread))
    th.daemon = True  # same effect as the deprecated th.setDaemon(True)
    th.start()

for key, value in mainDict.items():
    someAuxFunc()
    print("QUEUING {}".format(key))
    q.put(key)

q.join()
```
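The per-thread order described above (down to every leaf of one sub-dictionary before taking the next one) is the same depth-first walk that `crawlAllLeaves()` performs. A minimal sketch of that walk on its own, written as a recursive generator and assuming the same nested-dict shape as `mainDict`; `iter_leaves` and `sample` are hypothetical names used only for illustration:

```python
def iter_leaves(tree, path=()):
    """Yield (path, value) for every non-dict value, depth-first."""
    for key, value in tree.items():
        if isinstance(value, dict):
            # Finish the whole nested dictionary before visiting the next sibling.
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value


# Hypothetical, shortened stand-in for mainDict.
sample = {"Layer1": {"Layer11": 1, "Layer12": 1}, "Layer2": {"Layer21": 2}}

for path, value in iter_leaves(sample["Layer1"]):
    print(path, value)
# ('Layer11',) 1
# ('Layer12',) 1
```

Because dictionaries keep insertion order (Python 3.7+), the leaves come out in the order they were written.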
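I have 2 problems:
- It creates only 2 threads, and they take the first 2 elements (sub-dictionaries) of the queue; after that it does nothing and doesn't even exit, it just stays idle.
- In the threader() function, the print says it got one sub-dictionary, but it traverses a different one, as the prints in crawlAllLeaves() show.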
Can you help me with this? I'm trying to learn Python and threading, and I don't know what I'm doing wrong.
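Both symptoms are consistent with `threader()` calling `q.get()` three times per loop iteration: every call removes a *different* key from the queue, so the key that gets printed is not the one passed to `Crawler`, and `q.task_done()` runs only once for every three items taken. With six keys queued, `q.join()` keeps waiting for tasks that are never marked done, while both threads sit blocked in `get()`. A minimal sketch of a worker that takes each key exactly once, assuming Python 3; the shortened `mainDict`, the tiny `DELAY`, and the names `crawl`, `worker`, `visited` and the `None` sentinel are illustrative choices, not part of the original code:

```python
import threading
import time
from queue import Queue

# Hypothetical, shortened stand-in for mainDict so the sketch finishes quickly.
mainDict = {
    "Layer1": {"Layer11": 1, "Layer12": 1},
    "Layer2": {"Layer21": 2, "Layer22": 2},
    "Layer3": {"Layer31": 4, "Layer32": 4},
}

NUMBER_OF_THREADS = 2
DELAY = 0.01  # seconds per leaf; the original used delayNumber + value

visited = []                  # (crawler, key, leaf) records, for inspection
visited_lock = threading.Lock()


def crawl(crawler, key, tree):
    """Depth-first walk of one sub-dictionary, down to its leaves."""
    for k, v in tree.items():
        if isinstance(v, dict):
            crawl(crawler, key, v)
        else:
            time.sleep(DELAY)  # simulate work at each leaf
            with visited_lock:
                visited.append((crawler, key, k))


def worker(q, crawler):
    while True:
        key = q.get()          # take ONE item and keep it in a variable
        try:
            if key is None:    # sentinel: no more work for this thread
                return
            print("crawler {}: got {}".format(crawler, key))
            crawl(crawler, key, mainDict[key])
            print("crawler {}: finished {}".format(crawler, key))
        finally:
            q.task_done()      # exactly one task_done() per get()


q = Queue()
threads = [threading.Thread(target=worker, args=(q, i), daemon=True)
           for i in range(NUMBER_OF_THREADS)]
for th in threads:
    th.start()

for key in mainDict:
    q.put(key)
for _ in threads:
    q.put(None)  # one sentinel per worker, queued after all real keys

q.join()         # returns once every put() has a matching task_done()
for th in threads:
    th.join()
```

Since the queue is FIFO and the sentinels are queued last, each worker exits only after all keys have been taken. `concurrent.futures.ThreadPoolExecutor(max_workers=2)` from the standard library is another way to cap the thread count without managing the queue by hand.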