Hadoop streaming does not unpack archive files when running a Hadoop streaming job

I use -archives to upload a tgz from my local machine to the task working directory via HDFS, but it is not unpacked the way the documentation says it should be. I have searched a lot without any luck.

Here is the command that starts the streaming job on Hadoop 2.5.2. It is very simple:

hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar \
    -files mapper.sh \
    -archives /home/hadoop/tmp/test.tgz#test \
    -D mapreduce.job.maps=1 \
    -D mapreduce.job.reduces=1 \
    -input "/test/test.txt" \
    -output "/res/" \
    -mapper "sh mapper.sh" \
    -reducer "cat"

And mapper.sh:

cat > /dev/null 
ls -l test 
exit 0

test.tgz contains two files, test.1.txt and test.2.txt:

echo "abcd" > test.1.txt 
echo "efgh" > test.2.txt 
tar zcvf test.tgz test.1.txt test.2.txt

Output from the job above:

lrwxrwxrwx 1 hadoop hadoop  71 Feb 8 23:25 test -> /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/116/test.tgz

But what I expected was something like this:

-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.1.txt 
-rw-r--r-- 1 hadoop hadoop 5 Feb 8 23:25 test.2.txt

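So why is test.tgz not unpacked automatically, as the documentation says it should be? Is there any other way to get the tgz unpacked?

Any help would be appreciated, thanks.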

Source

2015-02-08 Tios

Any help, please? – Tios 2015-02-10 03:46:04

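My mistake. After I filed an issue with hadoop.apache.org, I was told that Hadoop had in fact already unpacked test.tgz.

Although the name is still test.tgz, it is an unpacked directory, not the archive file. The files inside can therefore be read with, for example, "cat test/test.1.txt".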
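
To illustrate why "ls -l test" in the question printed only a symlink, here is a small local simulation that does not involve Hadoop at all. It assumes the layout implied by the job output: a directory that keeps the archive's name, plus a link called test pointing at it. The scratch directory and the way the link is created are assumptions made for the demo, not what the NodeManager literally runs.

```shell
#!/bin/sh
# Local simulation only: mimics the layout seen in the job output above.
set -e
work=$(mktemp -d)              # hypothetical scratch directory
cd "$work"

mkdir test.tgz                 # stands in for the unpacked directory that keeps the archive name
echo "abcd" > test.tgz/test.1.txt
echo "efgh" > test.tgz/test.2.txt
ln -s "$work/test.tgz" test    # stands in for the link created for "#test"

ls -l test                     # long listing of the link itself, as in the question
ls -l test/                    # trailing slash: lists the files inside the directory
ls -lL test                    # -L follows the link: same listing as the line above
cat test/test.1.txt            # reads a file through the link
```

In mapper.sh, replacing "ls -l test" with "ls -l test/" or "ls -lL test" should therefore show test.1.txt and test.2.txt.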

Source

2015-02-11 06:36:44 Tios

This will unpack it: tar -zxvf test.tgz
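
For reference, a minimal local sketch of that manual approach, run outside Hadoop. It rebuilds the archive exactly as in the question and extracts it into a separate directory; the scratch directory and the name "unpacked" are hypothetical and chosen only for this demo.

```shell
#!/bin/sh
# Local round trip: build the archive as in the question, then unpack it by hand.
set -e
work=$(mktemp -d)                  # hypothetical scratch directory
cd "$work"

echo "abcd" > test.1.txt
echo "efgh" > test.2.txt
tar zcf test.tgz test.1.txt test.2.txt

mkdir unpacked                     # hypothetical target directory
tar -zxvf test.tgz -C unpacked     # -C extracts into the given directory
ls -l unpacked                     # lists the two extracted files
```

Without -C, tar extracts into the current directory, which in a streaming task would be the task's working directory.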

Source

2015-02-08 16:04:27 Eduardo

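While this code sample may answer the question, it would be better to include some essential explanation in the answer. As it stands, this answer adds little to no value for future readers. – 2015-02-08 21:50:00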

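Actually, I expected test.tgz to be unpacked automatically once it had been uploaded to HDFS at the start of the streaming job, as the [documentation](http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/HadoopStreaming.html#Making_Archives_Available_to_Tasks) says: "The -archives option allows you to copy jars locally to the current working directory of tasks and automatically unjar the files." – Tios 2015-02-09 00:51:06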
