2015-04-02 55 views
7

I have installed RHadoop in a Hortonworks VM. When I run MapReduce code to verify the installation, RHadoop throws an error saying:

Streaming Command Failed!

I am running as the user rstudio (not root, but with sudoer access).

Can anyone help me understand this problem? I do not have much of an idea how to solve it.

    Sys.setenv(HADOOP_HOME = "/usr/hdp/2.2.0.0-2041/hadoop")
    Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
    Sys.setenv(HADOOP_STREAMING = "/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming.jar")
    library(rhdfs)
    hdfs.init()
    library(rmr2)
    ints = to.dfs(1:10)
    calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2 * v))

I get the error below; here is the RHadoop error and traceback:

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : hadoop streaming failed with error code 1

Traceback:
4: stop("hadoop streaming failed with error code ", retval, "\n")
3: mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, in.folder = if (is.list(input)) { lapply(input, to.dfs.path) } else to.dfs.path(input), out.folder = to.dfs.path(output), ...
2: mapreduce(input = input, output = output, input.format = "text", map = map)
1: wordcount(hdfs.data, hdfs.out)



packageJobJar: [] [/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming-2.6.0.2.2.0.0-2041.jar] /tmp/streamjob3075733686753367992.jar tmpDir=null 
15/04/07 21:43:10 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/ 
15/04/07 21:43:10 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050 
15/04/07 21:43:11 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/ 
15/04/07 21:43:11 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050 
15/04/07 21:43:11 INFO mapred.FileInputFormat: Total input paths to process : 1 
15/04/07 21:43:11 INFO mapreduce.JobSubmitter: number of splits:2 
15/04/07 21:43:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428440418649_0006 
15/04/07 21:43:12 INFO impl.YarnClientImpl: Submitted application application_1428440418649_0006 
15/04/07 21:43:12 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1428440418649_0006/ 
15/04/07 21:43:12 INFO mapreduce.Job: Running job: job_1428440418649_0006 
15/04/07 21:43:19 INFO mapreduce.Job: Job job_1428440418649_0006 running in uber mode : false 
15/04/07 21:43:19 INFO mapreduce.Job: map 0% reduce 0% 
15/04/07 21:43:27 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000001_0, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

Container killed by the ApplicationMaster. 
Container killed on request. Exit code is 143 
Container exited with a non-zero exit code 143 

15/04/07 21:43:27 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000000_0, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

15/04/07 21:43:35 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000001_1, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

15/04/07 21:43:35 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000000_1, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

15/04/07 21:43:43 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000001_2, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

15/04/07 21:43:44 INFO mapreduce.Job: Task Id : attempt_1428440418649_0006_m_000000_2, Status : FAILED 
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535) 
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130) 
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61) 
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 

15/04/07 21:43:52 INFO mapreduce.Job: map 100% reduce 0% 
15/04/07 21:43:53 INFO mapreduce.Job: Job job_1428440418649_0006 failed with state FAILED due to: Task failed task_1428440418649_0006_m_000001 
Job failed as tasks failed. failedMaps:1 failedReduces:0 

15/04/07 21:43:54 INFO mapreduce.Job: Counters: 13 
    Job Counters 
     Failed map tasks=7 
     Killed map tasks=1 
     Launched map tasks=8 
     Other local map tasks=6 
     Data-local map tasks=2 
     Total time spent by all maps in occupied slots (ms)=49670 
     Total time spent by all reduces in occupied slots (ms)=0 
     Total time spent by all map tasks (ms)=49670 
     Total vcore-seconds taken by all map tasks=49670 
     Total megabyte-seconds taken by all map tasks=12417500 
    Map-Reduce Framework 
     CPU time spent (ms)=0 
     Physical memory (bytes) snapshot=0 
     Virtual memory (bytes) snapshot=0 
15/04/07 21:43:54 ERROR streaming.StreamJob: Job not successful! 
Streaming Command Failed! 
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, : 
    hadoop streaming failed with error code 1 

Answers

2

You are currently running this from RStudio. Could you try writing the mapper and reducer as .R files and running the job directly with Hadoop streaming:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar -input file-in-hadoop -output hdfs_output_dir -file mapper_file -file reducer_file -mapper mapper.R -reducer reducer.R

By the way, the PipeMapRed.waitOutputThreads() exception you are seeing is typically caused by not specifying correct input/output paths. Please check your paths.

This should work.
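The suggestion above can be sketched concretely. The following is a minimal word-count sketch, not taken from the question: the file names mapper.R and reducer.R and the word-count logic are assumptions for illustration. Both scripts read from standard input, as Hadoop streaming requires.

```r
#!/usr/bin/env Rscript
# mapper.R -- emit "word<TAB>1" for every word read from stdin
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  words <- strsplit(trimws(line), "\\s+")[[1]]
  for (w in words[nchar(words) > 0]) {
    cat(w, "\t1\n", sep = "")
  }
}
close(con)

# --- reducer.R (save as a separate file, with its own shebang) ---
# Streaming sorts mapper output by key, so counts for the same word
# arrive on consecutive lines; accumulate and flush on key change.
con <- file("stdin", open = "r")
current <- NULL
total <- 0
while (length(line <- readLines(con, n = 1)) > 0) {
  parts <- strsplit(line, "\t")[[1]]
  if (!identical(parts[1], current)) {
    if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
    current <- parts[1]
    total <- 0
  }
  total <- total + as.integer(parts[2])
}
if (!is.null(current)) cat(current, "\t", total, "\n", sep = "")
close(con)
```

Both files must be executable (chmod +x) and shipped to the cluster with -file. Note that if Rscript is missing from the task nodes' PATH, the streaming subprocess exits with code 1, producing exactly the kind of failure shown in the log.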

0

Your code works fine for me; I only changed HADOOP_CMD and HADOOP_STREAMING to match my system configuration (I am running Hadoop 2.4.0 on Ubuntu 14.04).

My suggestions are:

  • Make sure a functional Hadoop instance is running, i.e. running the command jps in a terminal should show output like the following:

(screenshot: jps output listing the running Hadoop daemons)

  • Make sure the rJava library gets loaded when you load library(rhdfs).
  • Make sure you are pointing to the correct streaming jar file.
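The checklist above can be turned into a quick sanity check run from the R console before submitting a job. This is a minimal sketch, reusing the paths from the question; adjust them to your own installation:

```r
# Pre-flight checks for an rmr2/rhdfs setup.
Sys.setenv(HADOOP_CMD = "/usr/bin/hadoop")
Sys.setenv(HADOOP_STREAMING = "/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-streaming.jar")

# 1. The hadoop binary must exist at the configured path.
stopifnot(file.exists(Sys.getenv("HADOOP_CMD")))

# 2. The streaming jar path must point at a real file.
stopifnot(file.exists(Sys.getenv("HADOOP_STREAMING")))

# 3. rJava must load, since rhdfs depends on it.
stopifnot(requireNamespace("rJava", quietly = TRUE))

# 4. rmr2 must be installed for the same user the map tasks run as;
#    a package missing on the task side is a common cause of
#    "subprocess failed with code 1".
stopifnot(requireNamespace("rmr2", quietly = TRUE))
```

If any of these checks fails on the node running the map tasks, the streaming subprocess dies before producing output, which surfaces as the exit-code-1 errors in the log.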

Here are the R code and the output:

Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop") 
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar") 

library(rhdfs) 
# Loading required package: rJava 
# HADOOP_CMD=/usr/local/hadoop/bin/hadoop 
# Be sure to run hdfs.init() 

hdfs.init() 
library(rmr2) 
ints = to.dfs(1:10) 
calc = mapreduce(input = ints, map = function(k, v) cbind(v, 2*v)) 

Output:

15/04/07 05:18:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/04/07 05:18:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces 
packageJobJar: [/usr/local/hadoop/data/hadoop-unjar1328285833881826794/] [] /tmp/streamjob6167004817219806828.jar tmpDir=null 
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050 
15/04/07 05:18:47 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8050 
15/04/07 05:18:48 INFO mapred.FileInputFormat: Total input paths to process : 1 
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: number of splits:2 
15/04/07 05:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428363713092_0002 
15/04/07 05:18:49 INFO impl.YarnClientImpl: Submitted application application_1428363713092_0002 
15/04/07 05:18:50 INFO mapreduce.Job: The url to track the job: http://manohar-dt:8088/proxy/application_1428363713092_0002/ 
15/04/07 05:18:50 INFO mapreduce.Job: Running job: job_1428363713092_0002 
15/04/07 05:19:00 INFO mapreduce.Job: Job job_1428363713092_0002 running in uber mode : false 
15/04/07 05:19:00 INFO mapreduce.Job: map 0% reduce 0% 
15/04/07 05:19:15 INFO mapreduce.Job: map 50% reduce 0% 
15/04/07 05:19:16 INFO mapreduce.Job: map 100% reduce 0% 
15/04/07 05:19:17 INFO mapreduce.Job: Job job_1428363713092_0002 completed successfully 
15/04/07 05:19:17 INFO mapreduce.Job: Counters: 30 
    File System Counters 
     FILE: Number of bytes read=0 
     FILE: Number of bytes written=194356 
     FILE: Number of read operations=0 
     FILE: Number of large read operations=0 
     FILE: Number of write operations=0 
     HDFS: Number of bytes read=979 
     HDFS: Number of bytes written=919 
     HDFS: Number of read operations=14 
     HDFS: Number of large read operations=0 
     HDFS: Number of write operations=4 
    Job Counters 
     Launched map tasks=2 
     Data-local map tasks=2 
    Total time spent by all maps in occupied slots (ms)=25803 
    Total time spent by all reduces in occupied slots (ms)=0 
    Total time spent by all map tasks (ms)=25803 
    Total vcore-seconds taken by all map tasks=25803 
    Total megabyte-seconds taken by all map tasks=26422272 
    Map-Reduce Framework 
    Map input records=3 
    Map output records=3 
    Input split bytes=186 
    Spilled Records=0 
    Failed Shuffles=0 
    Merged Map outputs=0 
    GC time elapsed (ms)=293 
    CPU time spent (ms)=3640 
    Physical memory (bytes) snapshot=322818048 
    Virtual memory (bytes) snapshot=2107604992 
    Total committed heap usage (bytes)=223346688 
    File Input Format Counters 
    Bytes Read=793 
    File Output Format Counters 
     Bytes Written=919 
15/04/07 05:19:17 INFO streaming.StreamJob: Output directory: /tmp/file11d247219866 

Hope this helps.

+0

Hi Manohar, even though I wrote the same thing, the problem is that it still won't run, and that is what I cannot figure out. I have tried many combinations to solve this. I know there is no problem with the code, so this answer does not help me. – Aman 2015-04-07 11:34:01

+0

I am using Hortonworks, and I believe the paths for HADOOP_CMD and HADOOP_STREAMING are correct. I do not see any other problem besides that. – Aman 2015-04-07 11:37:38

+0

Hi Aman, is it possible for you to paste the full text of the error output? – 2015-04-07 17:09:55