2010-10-20 107 views
2

我在做一些使用hadoop map-reduce作業的文本處理。我的工作完成了99.2%,並停留在最後的地圖工作上。Hadoop最後地圖作業卡住了 - 需要幫助

地圖輸出的最後幾行顯示如下。最後一次,當這個問題發生時,我試圖打印出從地圖上刪除的鍵值,並注意到其中一個關鍵字有大量值與它相關聯,我認爲它在排序這些值時出現卡住現象。然後,我停止從地圖作業中發現該密鑰,並且它工作正常。

我認爲,同樣的問題再次出現,打印出鍵值對是一項繁瑣的工作,因爲這項工作需要時間。有更好的選擇嗎?像配置hadoop,如果他們在分類上花費太多時間,就會忘記幾個鍵。有沒有這樣的事情。

 
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 79698262; bufvoid = 99614720 
2010-10-20 14:43:32,274 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 6601; length = 327680 
2010-10-20 14:43:33,272 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0 
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79698262; bufend = 59800449; bufvoid = 99614720 
2010-10-20 14:50:44,113 INFO org.apache.hadoop.mapred.MapTask: kvstart = 6601; kvend = 9039; length = 327680 
2010-10-20 14:50:44,864 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1 
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59800449; bufend = 39893455; bufvoid = 99614720 
2010-10-20 14:58:33,105 INFO org.apache.hadoop.mapred.MapTask: kvstart = 9039; kvend = 11228; length = 327680 
2010-10-20 14:58:33,817 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2 
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39893455; bufend = 20000988; bufvoid = 99614720 
2010-10-20 15:06:48,675 INFO org.apache.hadoop.mapred.MapTask: kvstart = 11228; kvend = 13286; length = 327680 
2010-10-20 15:06:49,395 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3 
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: bufstart = 20000988; bufend = 78879; bufvoid = 99614720 
2010-10-20 15:15:23,514 INFO org.apache.hadoop.mapred.MapTask: kvstart = 13286; kvend = 15265; length = 327680 
2010-10-20 15:15:24,230 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4 
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: bufstart = 78879; bufend = 79807573; bufvoid = 99614720 
2010-10-20 15:24:35,797 INFO org.apache.hadoop.mapred.MapTask: kvstart = 15265; kvend = 17188; length = 327680 
2010-10-20 15:24:36,500 INFO org.apache.hadoop.mapred.MapTask: Finished spill 5 
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79807573; bufend = 59907680; bufvoid = 99614720 
2010-10-20 15:33:33,391 INFO org.apache.hadoop.mapred.MapTask: kvstart = 17188; kvend = 19074; length = 327680 
2010-10-20 15:33:34,114 INFO org.apache.hadoop.mapred.MapTask: Finished spill 6 
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: bufstart = 59907680; bufend = 40011208; bufvoid = 99614720 
2010-10-20 15:42:39,913 INFO org.apache.hadoop.mapred.MapTask: kvstart = 19074; kvend = 20926; length = 327680 
2010-10-20 15:42:40,597 INFO org.apache.hadoop.mapred.MapTask: Finished spill 7 
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: bufstart = 40011208; bufend = 20111383; bufvoid = 99614720 
2010-10-20 15:51:49,668 INFO org.apache.hadoop.mapred.MapTask: kvstart = 20926; kvend = 22759; length = 327680 
2010-10-20 15:51:50,378 INFO org.apache.hadoop.mapred.MapTask: Finished spill 8 
2010-10-20 16:01:05,893 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 16:01:05,893 INFO org.apache.hadoop.mapred.MapTask: bufstart = 20111383; bufend = 196929; bufvoid = 99614720 
2010-10-20 16:01:05,894 INFO org.apache.hadoop.mapred.MapTask: kvstart = 22759; kvend = 24572; length = 327680 
2010-10-20 16:01:06,634 INFO org.apache.hadoop.mapred.MapTask: Finished spill 9 
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: bufstart = 196929; bufend = 79900267; bufvoid = 99614720 
2010-10-20 16:10:25,000 INFO org.apache.hadoop.mapred.MapTask: kvstart = 24572; kvend = 26370; length = 327680 
2010-10-20 16:10:25,776 INFO org.apache.hadoop.mapred.MapTask: Finished spill 10 
2010-10-20 16:19:48,283 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true 
2010-10-20 16:19:48,283 INFO org.apache.hadoop.mapred.MapTask: bufstart = 79900267; bufend = 59993676; bufvoid = 99614720 
2010-10-20 16:19:48,284 INFO org.apache.hadoop.mapred.MapTask: kvstart = 26370; kvend = 28152; length = 327680 
2010-10-20 16:19:49,042 INFO org.apache.hadoop.mapred.MapTask: Finished spill 11 

謝謝

回答

3

沒有什麼在Hadoop中,將知道地圖的特定invokation()被髮射鍵值對的了大量資金。我猜你在map()函數中有某種循環發出這些鍵值對。如果發射超過N對,您可以簡單地將環路編碼爲短路。

另一種方法是找出某種方式對輸入值進行分區,以便映射器處理更多粒度塊,以便所有映射器執行大致相同的工作量。

我不確定你想要做什麼,所以這些建議可能不適用。希望這可以幫助。