2013-04-13 28 views
0

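Pig ORDER command fails

I want to analyze an Apache log to find every user agent and how often it is used. The program below works fine when the result contains each user agent, its count, and its percentage. When I try to sort the result by most used, the program fails on the last line. Can anyone help?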

logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent); 

uarows = FOREACH logs GENERATE userAgent; 
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count; 
dump total; 

gpuarows = GROUP uarows BY userAgent; 
result = FOREACH gpuarows { 
     subtotal = COUNT(uarows); 
     GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage; 
     }; 
orderresult = ORDER result BY SUB_TOTAL DESC; 
dump orderresult; 

What is strange is that "dump result" works fine, so it is the ORDER line that causes the trouble.

Error:

2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720 
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680 
2013-04-13 11:33:09,995 [Thread-48] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005 
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) 
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) 
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37) 
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252) 
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177) 
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124) 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131) 
    ... 6 more 
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005 
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult 
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C: R: 
2013-04-13 11:33:15,286 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 
2013-04-13 11:33:15,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs 
2013-04-13 11:33:15,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed! 
2013-04-13 11:33:15,288 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features 
1.0.4 0.11.0 dliu 2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY 

Some jobs have failed! Stop running all dependent jobs 

Job Stats (time in seconds): 
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs 
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows MULTI_QUERY,COMBINER  
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER 
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a orderresult SAMPLER 

Failed Jobs: 
JobId Alias Feature Message Outputs 
job_local_0005 orderresult ORDER_BY Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388, 

Input(s): 
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log" 

Output(s): 
Failed to produce result in "file:/tmp/temp265162785/tmp896004388" 

Counters: 
Total records written : 0 
Total bytes written : 0 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0 

Job DAG: 
job_local_0002 -> job_local_0003, 
job_local_0003 -> job_local_0004, 
job_local_0004 -> job_local_0005, 
job_local_0005 


2013-04-13 11:33:15,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs 
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult 
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log 

How did you start Pig? – Frederic

For anyone who found this post while looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085), here is a [generic solution](http://stackoverflow.com/a/34495086/983722).

Answers

1

Check that /tmp/temp265162785/tmp896004388 has not already been written to. Different tasks may be using the same file or directory.

2

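Make sure of two things:

1) Run Pig in local mode: pig -x local
2) Set either the PIG_HOME or the PIG_INSTALL environment variable to point to Pig's installation directory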
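
The two points above can be sketched as a short shell session. This is only an illustration under assumptions: the install path `/opt/pig` and the script name `useragents.pig` are hypothetical placeholders, while `access.log` is the input file named in the question's log output and `LOGS` is the parameter used by its LOAD statement.

```shell
# Hypothetical install location; point this at your actual Pig directory.
export PIG_HOME="${PIG_HOME:-/opt/pig}"
export PATH="$PIG_HOME/bin:$PATH"

# Hypothetical script name; LOGS fills the $LOGS parameter of the LOAD statement.
SCRIPT="useragents.pig"
LOGS="access.log"

# Start Pig explicitly in local mode, but only if the launcher and script exist here.
if command -v pig >/dev/null 2>&1 && [ -f "$SCRIPT" ]; then
    pig -x local -param LOGS="$LOGS" "$SCRIPT"
else
    echo "pig or $SCRIPT not found; would run: pig -x local -param LOGS=$LOGS $SCRIPT"
fi
```

Passing `-x local` on the command line makes the execution mode explicit instead of relying on whatever default the installation picks up.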

I ran into the same problem on Ubuntu... running pig -x local fixed it. – hba