Suppose you have uncompressed Spark event log files in HDFS, but going forward you want to set
spark.eventLog.compress true
in spark-defaults.conf, and you'd also like to compress the old logs. A MapReduce job would make the most sense, but as a one-off you can also use:
snzip -t hadoop-snappy local_file_will_end_in_dot_snappy
then upload it back directly.
Installing snzip might look something like this:
sudo yum install snappy snappy-devel
curl -O https://dl.bintray.com/kubo/generic/snzip-1.0.4.tar.gz
tar -zxvf snzip-1.0.4.tar.gz
cd snzip-1.0.4
./configure
make
sudo make install
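A quick way to confirm the install worked is to round-trip a small local file through snzip. This is just a sanity check with an arbitrary file name; it assumes snzip is now on your PATH and that, like gzip, it replaces the input file unless you pass -k:

```shell
# Create a small test file, compress it with hadoop-snappy framing,
# then decompress and verify the contents survived the round trip.
echo "test payload" > check.txt
snzip -t hadoop-snappy check.txt      # replaces check.txt with check.txt.snappy
snzip -d check.txt.snappy             # decompresses back to check.txt
grep -q "test payload" check.txt && echo "snzip round trip OK"
```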
Your round trip for a single file might be:
hdfs dfs -copyToLocal /var/log/spark/apps/application_1512353561403_50748_1 .
snzip -t hadoop-snappy application_1512353561403_50748_1
hdfs dfs -copyFromLocal application_1512353561403_50748_1.snappy /var/log/spark/apps/application_1512353561403_50748_1.snappy
Or with gohdfs:
hdfs cat /var/log/spark/apps/application_1512353561403_50748_1 \
| snzip -t hadoop-snappy > zzz
hdfs put zzz /var/log/spark/apps/application_1512353561403_50748_1.snappy
rm zzz
It's probably not possible to avoid the temporary file here, since 'put' just moves data.