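I'm setting up a Hadoop cluster on 4 nodes (3 slaves), all separate EC2 instances inside a VPC. I roughly followed these steps (but installed Hadoop 2.8.1 instead): http://arturmkrtchyan.com/how-to-setup-multi-node-hadoop-2-yarn-cluster
HDFS not formatted, but no errors
I formatted the namenode, which gave the following response: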
$ hdfs namenode -format
17/09/26 07:05:34 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: user = hduser
STARTUP_MSG: host = ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.8.1
STARTUP_MSG: classpath = /usr/...
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'hduser' on 2017-09-22T14:53Z
STARTUP_MSG: java = 1.8.0_144
************************************************************/
17/09/26 07:07:33 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/09/26 07:07:33 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-15524170-7dfa-481b-add9-4c2542a55ca5
17/09/26 07:07:33 INFO namenode.FSEditLog: Edit logging is async:false
17/09/26 07:07:33 INFO namenode.FSNamesystem: KeyProvider: null
17/09/26 07:07:33 INFO namenode.FSNamesystem: fsLock is fair: true
17/09/26 07:07:33 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
17/09/26 07:07:33 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/09/26 07:07:33 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=false
17/09/26 07:07:33 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/09/26 07:07:33 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Sep 26 07:07:33
17/09/26 07:07:33 INFO util.GSet: Computing capacity for map BlocksMap
17/09/26 07:07:33 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:33 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
17/09/26 07:07:33 INFO util.GSet: capacity = 2^21 = 2097152 entries
17/09/26 07:07:33 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/09/26 07:07:33 INFO blockmanagement.BlockManager: defaultReplication = 3
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxReplication = 512
17/09/26 07:07:33 INFO blockmanagement.BlockManager: minReplication = 1
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
17/09/26 07:07:33 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/09/26 07:07:33 INFO blockmanagement.BlockManager: encryptDataTransfer = false
17/09/26 07:07:33 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
17/09/26 07:07:33 INFO namenode.FSNamesystem: fsOwner = hduser (auth:SIMPLE)
17/09/26 07:07:33 INFO namenode.FSNamesystem: supergroup = supergroup
17/09/26 07:07:33 INFO namenode.FSNamesystem: isPermissionEnabled = false
17/09/26 07:07:33 INFO namenode.FSNamesystem: HA Enabled: false
17/09/26 07:07:33 INFO namenode.FSNamesystem: Append Enabled: true
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map INodeMap
17/09/26 07:07:34 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:34 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
17/09/26 07:07:34 INFO util.GSet: capacity = 2^20 = 1048576 entries
17/09/26 07:07:34 INFO namenode.FSDirectory: ACLs enabled? false
17/09/26 07:07:34 INFO namenode.FSDirectory: XAttrs enabled? true
17/09/26 07:07:34 INFO namenode.NameNode: Caching file names occurring more than 10 times
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map cachedBlocks
17/09/26 07:07:34 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:34 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
17/09/26 07:07:34 INFO util.GSet: capacity = 2^18 = 262144 entries
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/09/26 07:07:34 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
17/09/26 07:07:34 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
17/09/26 07:07:34 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/09/26 07:07:34 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/09/26 07:07:34 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/09/26 07:07:34 INFO util.GSet: VM type = 64-bit
17/09/26 07:07:34 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
17/09/26 07:07:34 INFO util.GSet: capacity = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /usr/local/hadoop/data/namenode ? (Y or N)
$ Y
17/09/26 07:09:21 INFO namenode.FSImage: Allocated new BlockPoolId: BP-793961451-10.0.0.190-1506409761821
17/09/26 07:09:21 INFO common.Storage: Storage directory /usr/local/hadoop/data/namenode has been successfully formatted.
17/09/26 07:09:21 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/data/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
17/09/26 07:09:21 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/data/namenode/current/fsimage.ckpt_0000000000000000000 of size 323 bytes saved in 0 seconds.
17/09/26 07:09:21 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/09/26 07:09:21 INFO util.ExitUtil: Exiting with status 0
17/09/26 07:09:21 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190
************************************************************/
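The format run above generated a new cluster ID (`CID-15524170-7dfa-481b-add9-4c2542a55ca5`), and the `Re-format filesystem ... ? (Y or N)` prompt shows the directory had been formatted before. One thing worth ruling out in that situation is a datanode whose storage directory still records the clusterID from an earlier format. The sketch below assumes the usual Hadoop 2.x layout, where each storage directory holds a `current/VERSION` file of `key=value` lines. The helper names are hypothetical, and the datanode directory must come from your own `dfs.datanode.data.dir`.

```python
from pathlib import Path
from typing import Optional

# clusterID printed by the format run shown above.
EXPECTED_CLUSTER_ID = "CID-15524170-7dfa-481b-add9-4c2542a55ca5"


def read_version_file(storage_dir: str) -> dict:
    """Parse <storage_dir>/current/VERSION (key=value lines, '#' comments)."""
    props = {}
    version = Path(storage_dir) / "current" / "VERSION"
    for raw in version.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the timestamp comment
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def read_cluster_id(storage_dir: str) -> Optional[str]:
    """Return the clusterID recorded in a namenode or datanode storage dir."""
    return read_version_file(storage_dir).get("clusterID")


def matches_expected(storage_dir: str, expected: str = EXPECTED_CLUSTER_ID) -> bool:
    """True if the directory's clusterID equals the one from the latest format."""
    return read_cluster_id(storage_dir) == expected
```

Run `read_cluster_id("/usr/local/hadoop/data/namenode")` on the master and the same call against the datanode directory on each slave. A mismatch would suggest stale datanode storage rather than a failed format.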
When I start DFS and YARN, everything seems to start correctly:
$ start-dfs.sh
Starting namenodes on [ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com]
ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com: starting namenode, logging to ...
10.0.0.185: starting datanode, logging to ...
10.0.0.244: starting datanode, logging to ...
10.0.0.83: starting datanode, logging to ...
Starting secondary namenodes [ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com]
ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com: starting secondarynamenode, logging to ...
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to ...
10.0.0.185: starting nodemanager, logging to ...
10.0.0.83: starting nodemanager, logging to ...
10.0.0.244: starting nodemanager, logging to ...
$ jps
14326 NameNode
14998 Jps
14552 SecondaryNameNode
14729 ResourceManager
And on the other nodes it looks like this:
15880 Jps
15563 DataNode
15693 NodeManager
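`jps` only shows that the daemons are running as processes. It does not show whether the DataNodes have registered with the NameNode, and the `dfsadmin -report` output further down suggests they have not. As a minimal sketch, the expected daemon sets below are taken from the listings in this post (the role names `master` and `worker` are hypothetical labels), and the function reports which daemons are absent from a given `jps` output.

```python
import subprocess
from typing import Set

# Daemons seen in this post on the master and on each slave.
EXPECTED = {
    "master": {"NameNode", "SecondaryNameNode", "ResourceManager"},
    "worker": {"DataNode", "NodeManager"},
}


def running_daemons(jps_output: str) -> Set[str]:
    """Collect the class names from `jps` lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str, role: str) -> Set[str]:
    """Return the expected daemons for `role` that do not appear in the output."""
    return EXPECTED[role] - running_daemons(jps_output)


def local_jps() -> str:
    """Run `jps` on this machine (requires a JDK on PATH)."""
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

Fed the slave listing above, `missing_daemons(..., "worker")` returns an empty set, which is exactly why this check alone cannot explain the upload failure.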
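However, when I try to write data to HDFS, it tells me that no nodes are actually available. This seems to be a very generic error, and I can't pin down where the problem is.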
$ hdfs dfs -put pg1661.txt /samples/input
WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /samples/input/pg1661.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Then, when I check the status, it doesn't seem to be working properly:
$ hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: NaN%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
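The report above is notable for what it lacks. Every capacity figure is zero and there is no live-datanodes section at all, which matches the `There are 0 datanode(s) running` message. A minimal sketch that turns the summary into numbers, assuming the `Key: value` layout shown above and assuming the `Live datanodes (N):` header is simply absent when no datanode has registered (as it is here):

```python
import re
from typing import Dict, Optional

LIVE_RE = re.compile(r"^Live datanodes \((\d+)\):", re.MULTILINE)


def parse_summary(report: str) -> Dict[str, str]:
    """Map 'Key: value' lines of `hdfs dfsadmin -report` to raw value strings.

    The first occurrence of a key wins, so per-datanode sections further
    down do not overwrite the cluster-wide summary at the top.
    """
    fields: Dict[str, str] = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def leading_int(value: Optional[str]) -> Optional[int]:
    """Extract the leading integer, e.g. '0 (0 B)' -> 0."""
    m = re.match(r"\d+", value or "")
    return int(m.group()) if m else None


def live_datanodes(report: str) -> int:
    """Number in the 'Live datanodes (N):' header, or 0 if the header is missing."""
    m = LIVE_RE.search(report)
    return int(m.group(1)) if m else 0


def looks_empty(report: str) -> bool:
    """True when no capacity is configured and no datanode is reported live."""
    capacity = leading_int(parse_summary(report).get("Configured Capacity"))
    return capacity == 0 and live_datanodes(report) == 0
```

On the report in this post, `looks_empty` is true: the NameNode knows of no storage at all, so the DataNodes are running but never registered.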
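I checked the log files, and they don't indicate any (fatal) errors, except when trying to upload the file.
Given that none of the above produces an error at startup, and the error message itself is very generic, I'm finding it hard to locate the problem.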
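One reason the logs can look clean is that a DataNode that cannot reach the NameNode may not log an ERROR at all. Hadoop's IPC client reports failed connection attempts as `Retrying connect to server: ...` lines at INFO level. A minimal sketch that scans daemon logs for WARN/ERROR/FATAL lines and for that retry message. The `hadoop-*-datanode-*.log` file-name pattern and the level markers assume the default log naming and layout and may differ on your install.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Severity markers in the default layout: "<date> <time> LEVEL class: msg"
LEVEL_MARKERS = (" WARN ", " ERROR ", " FATAL ")
# Logged at INFO by the IPC client while it cannot reach a server.
RETRY_MARKER = "Retrying connect to server"


def suspicious(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines at WARN or above, plus INFO-level connection retries."""
    for line in lines:
        if RETRY_MARKER in line or any(m in line for m in LEVEL_MARKERS):
            yield line.rstrip("\n")


def scan_logs(log_dir: str, pattern: str = "hadoop-*-datanode-*.log") -> List[Tuple[str, str]]:
    """Return (file name, line) pairs for suspicious lines in matching log files."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(errors="replace") as f:
            hits.extend((path.name, line) for line in suspicious(f))
    return hits
```

Run `scan_logs` on each slave against its Hadoop log directory; retry lines naming the NameNode's host and port would point at connectivity rather than at HDFS itself.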
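Thanks for your reply. I did run this command. The response ends with 'SHUTDOWN_MSG: Shutting down NameNode at ec2-xx-xx-xx-01.eu-central-1.compute.amazonaws.com/10.0.0.190'. Does this indicate that the format command failed? It doesn't give any error message, other than telling me it shut down. I'll update the question. – Dendrobates
I've included the response I get when I (try to) format the namenode. – Dendrobates
I think the shutdown message is normal when formatting the namenode. My guess is that the namenode can't SSH into the datanodes. Did you define the datanodes as separate servers or as the same server? Maybe you could first try a single-node setup, i.e. the namenode and a datanode on the same server. Once that works, try adding the other datanodes; that will isolate some of the issues. Could you also share your masters and slaves files, along with core-site.xml and hdfs-site.xml? –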
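Following the suggestion to look at `core-site.xml` and the `slaves` file, here is a minimal sketch that reads the NameNode address from `fs.defaultFS` and the worker list from `slaves`, then tries a plain TCP connection to each. Run on a slave, the first check tests whether that slave can reach the NameNode's RPC port. Run on the master, the second tests whether port 22 (SSH, as suspected above) is open on each slave. The configuration directory is a hypothetical placeholder, and 8020 is used only as a fallback when `fs.defaultFS` has no port. An open port does not prove SSH login works, but a refused or timed-out connection would narrow the problem to networking (for example, EC2 security group rules) rather than Hadoop itself.

```python
import socket
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Optional, Tuple
from urllib.parse import urlparse

DEFAULT_NN_RPC_PORT = 8020  # fallback only, if fs.defaultFS omits the port
SSH_PORT = 22


def get_property(conf_xml: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> named `name` in a Hadoop *-site.xml."""
    root = ET.parse(conf_xml).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def namenode_address(core_site: str) -> Tuple[Optional[str], int]:
    """Host and RPC port from fs.defaultFS, e.g. hdfs://master:9000."""
    uri = urlparse(get_property(core_site, "fs.defaultFS") or "")
    return uri.hostname, uri.port or DEFAULT_NN_RPC_PORT


def read_hosts(path: str) -> List[str]:
    """Non-empty, non-comment lines of a masters/slaves file."""
    lines = Path(path).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(conf_dir: str) -> None:
    """Print reachability of the NameNode RPC port and of SSH on each slave."""
    host, port = namenode_address(str(Path(conf_dir) / "core-site.xml"))
    if host is None:
        print("fs.defaultFS is not set in core-site.xml")
    else:
        print(f"NameNode {host}:{port} reachable: {can_connect(host, port)}")
    for slave in read_hosts(str(Path(conf_dir) / "slaves")):
        print(f"SSH on {slave}:{SSH_PORT} reachable: {can_connect(slave, SSH_PORT)}")
```

For example, `check_cluster("/usr/local/hadoop/etc/hadoop")`, where that path is an assumption and should be replaced with the directory holding your `*-site.xml` files.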