2011-11-03

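I am getting increasingly inconsistent reports of under-replicated blocks, and I would like to know what is causing them. hadoop dfsadmin -metasave reports ~232,000 MISSING blocks waiting for replication. How can I fix this? Jobs run fine and there appears to be no data loss. The under-replicated block count is clearly inaccurate, but why?

See the output of hadoop fsck /, hadoop dfsadmin -report and hadoop dfsadmin -metasave, and the NameNode web GUI, below: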

hadoop fsck /

Total size: 6066860793495 B (Total open files size: 47000701003 B) 
Total dirs: 1801 
Total files: 230828 (Files currently being written: 493) 
Total blocks (validated):  242592 (avg. block size 25008494 B) (Total open file blocks (not validated): 681) 
Minimally replicated blocks: 242592 (100.0 %) 
Over-replicated blocks:  0 (0.0 %) 
Under-replicated blocks:  932 (0.38418415 %) 
Mis-replicated blocks:   0 (0.0 %) 
Default replication factor: 3 
Average block replication:  2.9945753 
Corrupt blocks:    0 
Missing replicas:    1851 (0.25479725 %) 
Number of data-nodes:   20 
Number of racks:    1 
FSCK ended at Thu Nov 03 10:17:47 CDT 2011 in 7359 milliseconds 
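As a sanity check on the fsck summary, the percentages can be recomputed from the raw counts. This is a minimal sketch using only the figures printed above; the assumption that the "Missing replicas" percentage is taken relative to the replicas actually present (blocks times average replication) is inferred from the arithmetic, not from Hadoop documentation.

```python
# Figures copied from the `hadoop fsck /` summary above.
total_blocks = 242592
under_replicated = 932
missing_replicas = 1851
avg_replication = 2.9945753

# Share of blocks that have fewer replicas than their target.
under_pct = 100.0 * under_replicated / total_blocks

# Assumption: replicas actually present = blocks x average replication.
present_replicas = round(total_blocks * avg_replication)
missing_pct = 100.0 * missing_replicas / present_replicas

print(f"under-replicated blocks: {under_pct:.4f} %")
print(f"replicas present (estimated): {present_replicas}")
print(f"missing replicas: {missing_pct:.4f} %")
```

Both recomputed percentages agree with the values fsck printed, so fsck's own numbers are internally consistent and show only 932 blocks below their target replication.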

hadoop dfsadmin -report

Configured Capacity: 59070545264640 (53.72 TB) 
Present Capacity: 56867905841329 (51.72 TB) 
DFS Remaining: 37637696475136 (34.23 TB) 
DFS Used: 19230209366193 (17.49 TB) 
DFS Used%: 33.82% 
Under replicated blocks: 245346 
Blocks with corrupt replicas: 73 
Missing blocks: 0 

Excerpt from the hadoop dfsadmin -metasave output:

232461 files and directories, 243290 blocks = 475751 total 
Live Datanodes: 20 
Dead Datanodes: 0 
Metasave: Blocks waiting for replication: 242747 

About 1,000 entries are real files being replicated (or waiting to be), followed by ~232,000 "MISSING" entries that all look like this:

: blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_2565249812869117144_12480431 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_2950011510944289339_12480413 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3809337797233614456_12468869 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3809337797233614456_12474511 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3811560762593023914_12440928 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3811560762593023914_12449396 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3811560762593023914_12462184 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3811560762593023914_12465792 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3811560762593023914_12472905 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3812070171484751861_12436051 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
: blk_3815454413870879906_12441243 MISSING (replicas: l: 0 d: 0 c: 0 e: 0) 
Metasave: Blocks being replicated: 0 
Metasave: Blocks 29 waiting deletion from 17 datanodes. 
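One way to dig into the MISSING entries is to group them by block ID. The sketch below assumes the usual `blk_<blockId>_<generationStamp>` naming and uses a handful of lines copied from the excerpt above; the function name `group_missing` and the sample selection are illustrative only.

```python
import re
from collections import defaultdict

# Matches lines such as ": blk_123_456 MISSING (replicas: ...)".
MISSING_RE = re.compile(r"blk_(-?\d+)_(\d+) MISSING")

def group_missing(lines):
    """Map each block ID to the set of generation stamps reported as MISSING."""
    stamps = defaultdict(set)
    for line in lines:
        match = MISSING_RE.search(line)
        if match:
            stamps[match.group(1)].add(int(match.group(2)))
    return stamps

# A few lines taken from the metasave excerpt in this post.
sample = """\
: blk_2551072940280567829_12480437 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12456357 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12463021 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12468869 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3809337797233614456_12474511 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
: blk_3812070171484751861_12436051 MISSING (replicas: l: 0 d: 0 c: 0 e: 0)
""".splitlines()

groups = group_missing(sample)
entries = sum(len(s) for s in groups.values())
repeated = {bid: sorted(s) for bid, s in groups.items() if len(s) > 1}

print(f"{entries} MISSING entries, {len(groups)} distinct block IDs")
for block_id, gen_stamps in repeated.items():
    print(f"blk_{block_id} listed with {len(gen_stamps)} generation stamps")
```

In the excerpt, several block IDs appear more than once with different generation stamps, so the number of MISSING entries can exceed the number of distinct blocks they refer to.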

NameNode web GUI:

Cluster Summary 
232390 files and directories, 243235 blocks = 475625 total. Heap Size is 1.84 GB/8.68 GB (21%) 
Configured Capacity : 53.72 TB 
DFS Used : 17.46 TB 
Non DFS Used : 2 TB 
DFS Remaining : 34.26 TB 
DFS Used% : 32.51 % 
DFS Remaining% : 63.77 % 
Live Nodes : 20 
Dead Nodes : 0 
Decommissioning Nodes : 0 
Number of Under-Replicated Blocks : 242532 

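!! UPDATE: !!

I believe this must be a bug, because the number of "under-replicated" blocks is now approaching 1 million. We have nowhere near that many actual blocks on the cluster, so the counter cannot be right.

The web GUI now shows the following: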

Cluster Summary 
234877 files and directories, 250074 blocks = 484951 total. Heap Size is 706.5 MB/8.68 GB (7%) 
Configured Capacity : 53.72 TB 
DFS Used : 20.71 TB 
Non DFS Used : 1.54 TB 
DFS Remaining : 31.47 TB 
DFS Used% : 38.56 % 
DFS Remaining% : 58.58 % 
Live Nodes : 20 
Dead Nodes : 0 
Decommissioning Nodes : 0 
Number of Under-Replicated Blocks : 451014 
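The inconsistency can be made explicit by comparing each reported under-replicated (or waiting) count with the total block count from the same source. This is a small sketch over the numbers quoted in this post; the helper name `impossible` is made up for illustration.

```python
# (total blocks, under-replicated or waiting blocks) as quoted in this post.
reports = {
    "fsck": (242592, 932),
    "metasave": (243290, 242747),
    "web GUI (first)": (243235, 242532),
    "web GUI (update)": (250074, 451014),
}

def impossible(blocks, under_replicated):
    # A block is either under-replicated or not, so it can be counted at most once.
    return under_replicated > blocks

for name, (blocks, under) in reports.items():
    ratio = under / blocks
    flag = "exceeds block count" if impossible(blocks, under) else "within block count"
    print(f"{name}: {under}/{blocks} = {ratio:.2f} ({flag})")
```

Only fsck reports a plausible figure. The NameNode counter first sits at nearly every block on the cluster and, after the update, exceeds the total number of blocks, which a correct per-block count could not do.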

Answer


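I received a response from Todd Lipcon of Cloudera, and I am updating this question in case anyone else runs into the same problem. I noticed the issue on CDH3u1, and this was the response: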

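"The 'append' feature is known to be broken in CDH3 and probably has bugs like this one. We recommend that users not use it in any version of Hadoop 0.20.x (CDH or otherwise); it will be fixed in CDH4 (upstream version 0.23 or later). If a specific bug is identified, we will make sure it does not exist in upstream trunk, but it is unlikely to be fixed in a CDH3 release."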
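To see whether append is switched on for a cluster, one option is to inspect hdfs-site.xml. The sketch below is hedged: the quoted response does not name a configuration property, so `dfs.support.append` (the switch commonly used in 0.20-based releases) is an assumption here, as are the sample XML and the helper name `get_property`.

```python
import xml.etree.ElementTree as ET

def get_property(hdfs_site_xml, name):
    """Return the value of a Hadoop config property, or None if it is not set."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None

# Hypothetical hdfs-site.xml content, for illustration only.
sample = """<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>"""

# Assumed property name; check the documentation for your Hadoop release.
value = get_property(sample, "dfs.support.append")
print(f"dfs.support.append = {value}")
```

A value of None means the property is not set in that file, in which case the release default applies.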
