
To load the data, I launched 2 m1.medium nodes on Amazon EC2 and ran my Pig script, but it appears to fail on the very first line (even before MapReduce starts), unable to read from S3:

raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader as (line:chararray);

The error message I get:

2015-02-04 02:15:39,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 
2015-02-04 02:15:39,821 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null 
2015-02-04 02:15:39,822 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 20 
... (omitted) 
2015-02-04 02:18:40,955 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502040202_0002 has failed! Stop running all dependent jobs 
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space 
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! 
2015-02-04 02:18:40,997 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 
HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
1.0.3 0.11.1.1-amzn hadoop 2015-02-04 02:15:32 2015-02-04 02:18:40 GROUP_BY 

Failed! 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_201502050202_0002 ngroup,raw,triples,tt GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201502050202_0002_m_000022 

Input(s): 
Failed to read data from "s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000" 

Output(s): 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

I think the code itself should be fine, since I have successfully loaded other data with the same syntax before, and the link s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000 looks valid. I suspect it may be related to some of my EC2 settings, but I'm not sure how to investigate further or narrow down the problem. Does anyone have a clue?
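One way to narrow this down is a quick isolation test (a minimal sketch using the standard Pig LIMIT and DUMP operators; the alias small is hypothetical): load the same file but dump only a few records, so that only the LOAD runs without any grouping. If this succeeds, S3 access and the LOAD syntax are fine and the failure points to per-task memory instead.

-- Quick isolation test: run only the LOAD, keeping a handful of records
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader AS (line:chararray);
small = LIMIT raw 10;   -- 'small' is a hypothetical alias for this test
DUMP small;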

Answer


The "Java heap space" error message gives a clue. Your file seems to be quite large (~2GB). Make sure each task runner has enough memory to read the data.
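If moving to larger instances is not an option, an alternative sketch is to raise the per-task JVM heap from within the Pig script. This assumes the Hadoop 1.x property mapred.child.java.opts (matching the HadoopVersion 1.0.3 shown in the log); the -Xmx value below is only an example, not a tuned number, and it must still fit within the node's physical memory, which is quite limited on an m1.medium.

-- Example only: raise the heap of each map/reduce child JVM (Hadoop 1.x property)
SET mapred.child.java.opts '-Xmx1536m';
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader AS (line:chararray);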


Yes... it looks like that was the root cause; apparently m1.medium is not enough for this dataset. Thanks for shedding light on it! – 2015-02-06 03:17:04


The problem was solved by changing my nodes from m1.medium to m3.large. Thanks for the good hint from @Nat, who pointed to the Java heap space error message. I'll update with more details later.