
I'm having some trouble figuring out how to run a simple MapReduce job whose input comes from an HTable on emr-5.4.0 (emr-5.3.0 fails as well): MapReduce input from an HTable on AWS times out.

I've done a bunch of googling to figure out how to proceed, but couldn't find anything useful.

My process:

  1. I created an EMR cluster with HBase. The versions are:

Amazon 2.7.3, Ganglia 3.7.2, HBase 1.3.0, Hive 2.1.1, Hue 3.11.0, Phoenix 4.9.0

  2. Following the example in the reference manual (http://hbase.apache.org/book.html#mapreduce.example), I wrote my job like this:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class TableMapTest3 {

        // TableMapper: emits (row key, contents:name value) for every row
        public static class MyMapper extends TableMapper<Text, Text> {

            @Override
            protected void map(ImmutableBytesWritable key, Result inputValue, Context context)
                    throws IOException, InterruptedException {
                // use offset/length: the backing array may be larger than the key
                String keyS = new String(key.get(), key.getOffset(), key.getLength(), "UTF-8");
                String value = new String(inputValue.getValue(Bytes.toBytes("contents"), Bytes.toBytes("name")), "UTF-8");
                System.out.println("TokenizerMapper :" + value);
                context.write(new Text(keyS), new Text(value));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            System.out.println("url:" + conf.get("fs.defaultFS"));
            System.out.println("hbase.zookeeper.quorum:" + conf.get("hbase.zookeeper.quorum"));
            Connection conn = ConnectionFactory.createConnection(conf);

            Admin admin = conn.getAdmin();
            String tableName = "TableMapTest";
            TableName tablename = TableName.valueOf(tableName);

            Table hTable = null;
            // if the table already exists, empty it; otherwise create it
            if (admin.tableExists(tablename)) {
                System.out.println(tablename + " table existed...");
                hTable = conn.getTable(tablename);
                ResultScanner resultScanner = hTable.getScanner(new Scan());
                for (Result result : resultScanner) {
                    Delete delete = new Delete(result.getRow());
                    hTable.delete(delete);
                }
            } else {
                HTableDescriptor tableDesc = new HTableDescriptor(tablename);
                tableDesc.addFamily(new HColumnDescriptor("contents"));
                admin.createTable(tableDesc);
                System.out.println(tablename + " table created...");
                hTable = conn.getTable(tablename);
            }

            // insert some sample rows
            for (int i = 0; i < 20; i++) {
                Put put = new Put(Bytes.toBytes(String.valueOf(i)));
                put.addColumn(Bytes.toBytes("contents"), Bytes.toBytes("name"), Bytes.toBytes("value" + i));
                hTable.put(put);
            }
            hTable.close();
            admin.close();
            conn.close();

            // Hadoop job setup: the table as input, no output
            Job job = Job.getInstance(conf, TableMapTest3.class.getSimpleName());
            job.setJarByClass(TableMapTest3.class);
            job.setOutputFormatClass(NullOutputFormat.class);

            Scan scan = new Scan();
            TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class, Text.class, Text.class, job);

            System.out.println("TableMapTest result:" + job.waitForCompletion(true));
        }
    }

  3. I packaged my source into a jar and uploaded it to the cluster, then ssh'd to the master node and ran my job:

    hadoop jar zz-0.0.1.jar com.ziki.zz.TableMapTest3

  4. I got the following messages:

    url:hdfs://ip-xxx.ap-northeast-1.compute.internal:8020 
    hbase.zookeeper.quorum:localhost 
    TableMapTest table created... 
    17/05/05 01:31:23 INFO impl.TimelineClientImpl: Timeline service address: http://ip-xxx.ap-northeast-1.compute.internal:8188/ws/v1/timeline/ 
    17/05/05 01:31:23 INFO client.RMProxy: Connecting to ResourceManager at ip-xxx.ap-northeast-1.compute.internal/172.31.4.228:8032 
    17/05/05 01:31:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
    17/05/05 01:31:31 INFO mapreduce.JobSubmitter: number of splits:1 
    17/05/05 01:31:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1493947058255_0001 
    17/05/05 01:31:33 INFO impl.YarnClientImpl: Submitted application application_1493947058255_0001 
    17/05/05 01:31:34 INFO mapreduce.Job: The url to track the job: http://ip-xxx.ap-northeast-1.compute.internal:20888/proxy/application_1493947058255_0001/ 
    17/05/05 01:31:34 INFO mapreduce.Job: Running job: job_1493947058255_0001 
    17/05/05 01:31:57 INFO mapreduce.Job: Job job_1493947058255_0001 running in uber mode : false 
    17/05/05 01:31:57 INFO mapreduce.Job: map 0% reduce 0% 
    
    After waiting a while, I got these errors:

    17/05/05 01:42:26 INFO mapreduce.Job: Task Id : attempt_1493947058255_0001_m_000000_0, Status : FAILED 
    AttemptID:attempt_1493947058255_0001_m_000000_0 Timed out after 600 secs 
    Container killed by the ApplicationMaster. 
    Container killed on request. Exit code is 143 
    Container exited with a non-zero exit code 143 
    
    17/05/05 01:52:56 INFO mapreduce.Job: Task Id : attempt_1493947058255_0001_m_000000_1, Status : FAILED 
    AttemptID:attempt_1493947058255_0001_m_000000_1 Timed out after 600 secs 
    Container killed by the ApplicationMaster. 
    Container killed on request. Exit code is 143 
    Container exited with a non-zero exit code 143 
    

    and some syslog:

    2017-05-05 01:31:59,664 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1493947058255_0001_m_000000 Task Transitioned from SCHEDULED to RUNNING 
    2017-05-05 01:32:08,168 INFO [Socket Reader #1 for port 33348] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1493947058255_0001 (auth:SIMPLE) 
    2017-05-05 01:32:08,227 INFO [IPC Server handler 0 on 33348] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1493947058255_0001_m_000002 asked for a task 
    2017-05-05 01:32:08,231 INFO [IPC Server handler 0 on 33348] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1493947058255_0001_m_000002 given task: attempt_1493947058255_0001_m_000000_0 
    2017-05-05 01:42:25,382 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1493947058255_0001_m_000000_0: AttemptID:attempt_1493947058255_0001_m_000000_0 Timed out after 600 secs 
    2017-05-05 01:42:25,389 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP 
    2017-05-05 01:42:25,392 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1493947058255_0001_01_000002 taskAttempt attempt_1493947058255_0001_m_000000_0 
    2017-05-05 01:42:25,392 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1493947058255_0001_m_000000_0 
    2017-05-05 01:42:25,394 INFO [ContainerLauncher #1] org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: Opening proxy : ip-xxx.ap-northeast-1.compute.internal:8041 
    2017-05-05 01:42:25,457 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP 
    2017-05-05 01:42:25,458 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT 
    2017-05-05 01:42:25,460 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_0 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED 
    2017-05-05 01:42:25,495 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.util.RackResolver: Resolved ip-xxx.ap-northeast-1.compute.internal to /default-rack 
    2017-05-05 01:42:25,500 INFO [Thread-83] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node ip-xxx.ap-northeast-1.compute.internal 
    2017-05-05 01:42:25,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1493947058255_0001_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED 
    2017-05-05 01:42:25,503 INFO [Thread-83] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1493947058255_0001_m_000000_1 to list of failed maps 
    2017-05-05 01:42:25,557 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:3 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:1 RackLocal:0 
    2017-05-05 01:42:25,582 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1493947058255_0001: ask=1 release= 0 newContainers=0 finishedContainers=1 resourcelimit=<memory:1024, vCores:1> knownNMs=2 
    2017-05-05 01:42:25,582 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1493947058255_0001_01_000002 
    2017-05-05 01:42:25,583 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1493947058255_0001_m_000000_0: Container killed by the ApplicationMaster. 
    Container killed on request. Exit code is 143 
    Container exited with a non-zero exit code 143 
    

    I'm just using the default settings and running a simple job. Why do these errors happen? If I'm missing something, let me know! Anyway, thanks for any help - I appreciate it!


    Did you try the Amazon troubleshooting guide (http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html) for this problem before asking?


    Yes, but I didn't find any similar errors in the troubleshooting guide. A simple wordcount job runs perfectly well, and "hbase shell" works on the cluster without any problems... I just don't know where the problem is.

    Answer


    I found the answer here:

    You can't use HBaseConfiguration.create() (because it defaults to a localhost quorum); what you need to do is use the configuration Amazon sets up for you, located at /etc/hbase/conf/hbase-site.xml.

    The connection code looks like this:

    Configuration conf = new Configuration();
    String hbaseSite = "/etc/hbase/conf/hbase-site.xml";
    conf.addResource(new File(hbaseSite).toURI().toURL());
    HBaseAdmin.checkHBaseAvailable(conf);
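
    To show how the fix might slot into the job from the question, here is a minimal sketch (untested on a live cluster): it loads the EMR-provided hbase-site.xml as above, and it also implements the Tool interface, which the "Hadoop command-line option parsing not performed" warning in the submission log recommends. The class name TableMapTest3Fixed is made up for illustration; the table name and MyMapper are taken from the original code.

    import java.io.File;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class TableMapTest3Fixed extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() returns the Configuration handed to ToolRunner below,
            // which already carries the cluster's real ZooKeeper quorum.
            Job job = Job.getInstance(getConf(), TableMapTest3Fixed.class.getSimpleName());
            job.setJarByClass(TableMapTest3Fixed.class);
            job.setOutputFormatClass(NullOutputFormat.class);

            // Same mapper wiring as in the question.
            TableMapReduceUtil.initTableMapperJob("TableMapTest", new Scan(),
                    TableMapTest3.MyMapper.class, Text.class, Text.class, job);
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // Load the EMR-provided hbase-site.xml instead of relying on
            // HBaseConfiguration.create(), which fell back to a localhost quorum.
            Configuration conf = new Configuration();
            conf.addResource(new File("/etc/hbase/conf/hbase-site.xml").toURI().toURL());
            System.exit(ToolRunner.run(conf, new TableMapTest3Fixed(), args));
        }
    }

    The essential point is that the Configuration passed into Job.getInstance() is the one carrying the correct quorum; with the default localhost value, the map tasks keep retrying a ZooKeeper connection that can never succeed until the 600-second task timeout kills the container, which matches the logs above.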
    