Python memory leak - solved, but still puzzled

I have successfully debugged my own memory leak problem. However, I have noticed something very strange.
for fid, fv in freqDic.iteritems():
    outf.write(fid+"\t") #ID
    for i, term in enumerate(domain): #Vector
        tfidf = self.tf(term, fv) * self.idf(term, docFreqDic)
        if i == len(domain) - 1:
            outf.write("%f\n" % tfidf)
        else:
            outf.write("%f\t" % tfidf)
    outf.flush()
    print "Memory increased by", int(self.memory_mon.usage()) - startMemory
outf.close()

def tf(self, term, freqVector):
    total = freqVector[TOTAL]
    if total == 0:
        return 0
    if term not in freqVector: ## When you don't have these lines memory leaks occurs
        return 0               ##
    return float(freqVector[term])/freqVector[TOTAL]

def idf(self, term, docFrequencyPerTerm):
    if term not in docFrequencyPerTerm:
        return 0
    return math.log(float(docFrequencyPerTerm[TOTAL])/docFrequencyPerTerm[term])
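Let me describe my problem:
1) I am doing a TF-IDF calculation.
2) I traced the source of the memory leak to a defaultdict.
3) I am using memory_mon from "How to get current CPU and RAM usage in Python?"
4) The cause of my memory leak is this: a) in self.tf, leaving out the lines "if term not in freqVector: return 0" causes the leak. (I verified this with memory_mon and saw memory usage climb sharply and keep climbing.)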
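As a side note on the guard in self.tf, the same protection can be written with `dict.get`, which returns a default without inserting the key, even on a defaultdict. This is only a minimal sketch: `TOTAL` is assumed to be a sentinel key holding the document's total term count (the value `"__total__"` is made up), and `tf` is written as a plain function instead of a method.

```python
from collections import defaultdict

TOTAL = "__total__"  # hypothetical sentinel key; the real value is not shown in the post


def tf(term, freq_vector):
    """Term frequency that never adds keys to freq_vector."""
    total = freq_vector.get(TOTAL, 0)
    if total == 0:
        return 0.0
    # .get() does not call defaultdict's default_factory, so nothing is inserted.
    return float(freq_vector.get(term, 0)) / total


# Made-up frequency vector for one document.
fv = defaultdict(int, {TOTAL: 4, "apple": 3, "pear": 1})
size_before = len(fv)
scores = [tf(t, fv) for t in ("apple", "pear", "kiwi", "plum")]
size_after = len(fv)  # unchanged: the missing terms were not stored
```

With this variant the vector keeps its original size no matter how many unknown terms are queried.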
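The solution to my problem: 1) Since fv is a defaultdict, any lookup of a key that is not in fv creates an entry for it. With a very large domain, this is what causes the memory leak.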
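The behaviour described here can be reproduced in a few lines. The following is a minimal sketch with a made-up vocabulary, not the code from the question: a membership test leaves the defaultdict alone, while indexing a missing key stores a new entry.

```python
from collections import defaultdict

fv = defaultdict(int, {"apple": 3})

# A membership test does not call default_factory.
found = "kiwi" in fv
size_after_membership = len(fv)   # still 1

# Indexing a missing key calls default_factory and stores the result.
value = fv["kiwi"]
size_after_index = len(fv)        # now 2

# Doing that for every term of a large vocabulary adds one entry per missing term.
domain = ["term%d" % i for i in range(1000)]  # made-up vocabulary
for term in domain:
    fv[term]
size_after_domain = len(fv)       # 1002
```

Every document vector that is indexed this way ends up with one entry per term in the domain, which matches the steady growth reported by memory_mon.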
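I decided to use a dict instead of a defaultdict, and the memory problem did go away. My only puzzle is this: since fv is created in "for fid, fv in freqDic.iteritems():", shouldn't fv be destroyed at the end of every iteration of the for loop? I tried putting gc.collect() at the end of the for loop, but gc was not able to collect anything (it returned 0). Yes, the hypothesis above is right, but if the for loop destroys all of its temporary variables, memory should stay fairly constant from one iteration to the next.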
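One way to look at this puzzle: the loop variable fv is only a name bound to a dictionary that freqDic already holds, so the object stays reachable after each iteration and the garbage collector has nothing to reclaim. A minimal sketch with made-up data, using Python 3's `items()` where the question uses `iteritems()`:

```python
from collections import defaultdict

# Made-up stand-in for freqDic: document id -> term-frequency vector.
freq_dic = {
    "doc1": defaultdict(int, {"apple": 2}),
    "doc2": defaultdict(int, {"pear": 1}),
}
domain = ["apple", "pear", "kiwi", "plum"]

same_object = []
for fid, fv in freq_dic.items():   # iteritems() in Python 2
    # fv is the very same object that freq_dic holds, not a copy.
    same_object.append(fv is freq_dic[fid])
    for term in domain:
        fv[term]                   # missing terms get inserted here

# The entries added inside the loop are still there afterwards,
# because freq_dic keeps every vector alive.
sizes = {fid: len(vec) for fid, vec in freq_dic.items()}
```

Rebinding fv on the next iteration only drops one reference; the enlarged vector is still owned by the outer dictionary.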
This is what the output looks like with the two lines in self.tf:
Memory increased by 12
Memory increased by 948
Memory increased by 28
Memory increased by 36
Memory increased by 36
Memory increased by 32
Memory increased by 28
Memory increased by 32
Memory increased by 32
Memory increased by 32
Memory increased by 40
Memory increased by 32
Memory increased by 32
Memory increased by 28
and without the two lines:
Memory increased by 1652
Memory increased by 3576
Memory increased by 4220
Memory increased by 5760
Memory increased by 7296
Memory increased by 8840
Memory increased by 10456
Memory increased by 12824
Memory increased by 13460
Memory increased by 15000
Memory increased by 17448
Memory increased by 18084
Memory increased by 19628
Memory increased by 22080
Memory increased by 22708
Memory increased by 24248
Memory increased by 26704
Memory increased by 27332
Memory increased by 28864
Memory increased by 30404
Memory increased by 32856
Memory increased by 33552
Memory increased by 35024
Memory increased by 36564
Memory increased by 39016
Memory increased by 39924
Memory increased by 42104
Memory increased by 42724
Memory increased by 44268
Memory increased by 46720
Memory increased by 47352
Memory increased by 48952
Memory increased by 50428
Memory increased by 51964
Memory increased by 53508
Memory increased by 55960
Memory increased by 56584
Memory increased by 58404
Memory increased by 59668
Memory increased by 61208
Memory increased by 62744
Memory increased by 64400
I look forward to your answers.
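Edit: It seems my terminology may be wrong (or at least look wrong).
- The memory leak I am referring to is not the one generated by freqVector[term] (looking up a nonexistent key in a defaultdict).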
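- The actual memory leak I am talking about comes from
    for fid, fv in freqDic.iteritems()
That is the memory leak! I know that because of 1) fv has grown in size, but it should still be destroyed at the end of each loop iteration! Memory should not keep expanding. Isn't this a memory leak?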
Thanks. That makes sense. – disappearedng 2010-04-06 15:14:13