2011-02-11 62 views
26

Hadoop safemode recovery - taking too long!

I have a Hadoop cluster with 18 data nodes. I restarted the name node two hours ago, and it is still in safe mode.

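I have been searching for reasons why this might take so long, and I can't find a good answer. The post here: Hadoop safemode recovery - taking lot of time is relevant, but I'm not sure whether I want or need to restart the name node after changing this setting, as that post mentions: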

<property> 
<name>dfs.namenode.handler.count</name> 
<value>3</value> 
<final>true</final> 
</property> 

In any case, this is what I've been getting in my 'hadoop-hadoop-namenode-hadoop-name-node.log':

2011-02-11 01:39:55,226 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020, call delete(/tmp/hadoop-hadoop/mapred/system, true) from 10.1.206.27:54864: error: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-hadoop/mapred/system. Name node is in safe mode. 
The reported blocks 319128 needs additional 7183 blocks to reach the threshold 0.9990 of total blocks 326638. Safe mode will be turned off automatically. 
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /tmp/hadoop-hadoop/mapred/system. Name node is in safe mode. 
The reported blocks 319128 needs additional 7183 blocks to reach the threshold 0.9990 of total blocks 326638. Safe mode will be turned off automatically. 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1711) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1691) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:565) 
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:616) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:416) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960) 
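The safe-mode message in the log above already contains the numbers needed to judge how far recovery has progressed. The following is a minimal sketch, not part of the original post, that parses that message and works out the figures; the helper name `safemode_progress` and the regular expression are illustrative assumptions, and the message format is taken only from the single log line shown here.

```python
import re

# Safe-mode status message copied from the NameNode log above.
LOG_MESSAGE = (
    "The reported blocks 319128 needs additional 7183 blocks to reach "
    "the threshold 0.9990 of total blocks 326638."
)

# Pattern matching the message format seen in this log (an assumption:
# other Hadoop versions may word the message differently).
PATTERN = re.compile(
    r"reported blocks (\d+) needs additional (\d+) blocks to reach "
    r"the threshold ([\d.]+) of total blocks (\d+)"
)


def safemode_progress(message):
    """Extract block counts from a safe-mode message (hypothetical helper)."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a safe-mode block report message")
    reported = int(match.group(1))
    additional = int(match.group(2))
    threshold = float(match.group(3))
    total = int(match.group(4))
    return {
        "reported": reported,
        "additional": additional,
        "threshold": threshold,
        "total": total,
        # Blocks that must be reported before safe mode ends, per the log.
        "required": reported + additional,
        # Fraction of all known blocks that data nodes have reported so far.
        "reported_fraction": reported / total,
    }


progress = safemode_progress(LOG_MESSAGE)
print("required blocks:", progress["required"])
print("reported fraction: %.4f" % progress["reported_fraction"])
```

Here 319128 + 7183 = 326311, which is consistent with 0.999 × 326638 ≈ 326311.4 rounded down. Only about 97.7% of the blocks have been reported, so the name node is waiting on roughly 2% of the blocks rather than being close to the threshold. If this number stops growing between log entries, the remaining blocks are probably not going to be reported on their own.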

Any advice is appreciated. Thanks!

What is your replication factor? – 2011-02-11 08:58:29

The replication factor is 3. It is still in safe mode! – 2011-02-11 11:58:56

Answers

43

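I had this happen once, where some blocks were never reported. I had to force the namenode to leave safe mode (hadoop dfsadmin -safemode leave) and then run fsck to delete the missing files.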
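The sequence described in this answer can be sketched as a small script. This is only an illustration under assumptions: the answer names `hadoop dfsadmin -safemode leave` and "fsck" but gives no further arguments, so the `-safemode get` status check, the root path `/`, and the `-delete` option are additions here, and `run_recovery` is a hypothetical helper. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Command sequence based on the answer above. Only "-safemode leave" and
# the use of fsck come from the answer; the status check, the "/" path and
# the "-delete" option are assumptions made for this sketch.
RECOVERY_COMMANDS = [
    ["hadoop", "dfsadmin", "-safemode", "get"],    # show current safe-mode state
    ["hadoop", "dfsadmin", "-safemode", "leave"],  # force the namenode out of safe mode
    ["hadoop", "fsck", "/"],                       # report missing or corrupt files
    ["hadoop", "fsck", "/", "-delete"],            # delete the corrupt files
]


def run_recovery(commands=RECOVERY_COMMANDS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    executed = []
    for command in commands:
        print("$ " + " ".join(command))
        if not dry_run:
            # Exit codes are not checked, so a non-zero exit from one
            # command does not stop the rest of the sequence.
            subprocess.run(command)
        executed.append(command)
    return executed


# Dry run: prints the commands without touching the cluster.
planned = run_recovery()
```

Note that `fsck -delete` removes the affected files permanently, so it is worth reading the plain `fsck /` report first and confirming that the missing blocks really cannot be recovered, for example by bringing a stopped data node back online.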