2015-07-10 47 views

Pig UDF in Java: ERROR 1066: Unable to open iterator for alias

I am new to Pig. My input data is:

(Message,NIL,2015-07-01,22:58:53.66,E,machine.com.name,12,0xd6,String,String,0,0.0,key=value&key=123456789&key=value&key=US&key=Company&key=Message&key=123456789&key=String&key=String&Key=String&Key=String)

The Java UDF I wrote is as follows:

package com.pig.udf; 

import java.io.IOException; 
import java.util.ArrayList; 
import java.util.Arrays; 
import java.util.HashMap; 
import java.util.Map; 

import org.apache.pig.EvalFunc; 
import org.apache.pig.data.Tuple; 

public class PigUDF extends EvalFunc<Map> { 


    @Override 
    public Map<String, String> exec(Tuple input) throws IOException { 
     // If tuple is null, has fewer than 3 values, or has an even number of 
     // values 
     if (input == null || input.size() < 3 || (input.size() % 2 == 0)) { 
      throw new IOException("Incorrect number of values."); 
     } 

     String source = (String) input.get(0); 
     System.out.println("input Source"+source); 
     String delim = (input.size() > 1) ? (String) input.get(1) : "&"; 
     int length = (input.size() > 2) ? (Integer) input.get(2) : 0; 
     if (source == null || delim == null) { 
      return null; 
     } 

     String[] splits = source.split(delim, length); 
     System.out.println("Splits"+ splits); 
     ArrayList<String> arrayList = new ArrayList<String>(
       Arrays.asList(splits)); 
     Map<String, String> map = new HashMap<String, String>(); 
     for (String keyValue : arrayList) { 
      int end = keyValue.indexOf('='); 
      if (end != -1) { 
       map.put(keyValue.substring(0, end), keyValue.substring(end + 1)); 
      } 

     } 
     System.out.println("map"+map); 

     return map; 

    } 

} 

When I run my Pig script with the Java UDF above to parse the last string of the input data, I get the following error:

Pig Stack Trace 
--------------- 
ERROR 1066: Unable to open iterator for alias C 

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias C 
    at org.apache.pig.PigServer.openIterator(PigServer.java:892) 
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774) 
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) 
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) 
    at org.apache.pig.Main.run(Main.java:607) 
    at org.apache.pig.Main.main(Main.java:156) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 
Caused by: java.io.IOException: Job terminated with anomalous status FAILED 
    at org.apache.pig.PigServer.openIterator(PigServer.java:884) 
    ... 13 more 



    Application Log 
    ------------------------------------------------------------------- 
    Application application_1436453941326_0020 failed 2 times due to AM Container for appattempt_1436453941326_0020_000002 exited with exitCode: 1 
For more detailed output, check application tracking page:http://quickstart.cloudera:8088/proxy/application_1436453941326_0020/Then, click on links to logs of each attempt. 
Diagnostics: Exception from container-launch. 
Container id: container_1436453941326_0020_02_000001 
Exit code: 1 
Stack trace: ExitCodeException exitCode=1: 
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) 
at org.apache.hadoop.util.Shell.run(Shell.java:455) 
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) 
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) 
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745) 
Container exited with a non-zero exit code 1 
Failing this attempt. Failing the application. 

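My script runs fine without the Java UDF and gives me an output file. The problem appears only when I include the Java UDF in my Pig script. There is no Java version mismatch between my Java UDF and the machine running Pig. Any pointers would be appreciated.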

Pig script:

Register '/home/cloudera/Pig/PigUDF_1.7.jar'; 
Register '/home/cloudera/Pig/pig.jar'; 
A= Load 'Logs_message.txt' using PigStorage(',') as (component:chararray,Nil:chararray,date:chararray,time:chararray,E:chararray,machine_address:chararray,number1:chararray,hex_number:chararray,cal_type:chararray,cal_name:chararray,number2:chararray,number3:chararray,data:chararray) 
B = filter A by cal_name matches 'CHANGEDMESSAGE'; 
C = foreach B generate cal_name ,com.pig.udf.PigUDF(data) as dataMap; 
dump C ; 
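How are you calling the UDF? Also, please look for more detailed logs. – Frederic

Could you paste the Pig script where you call the UDF? I think the problem is in your Pig script. – Abhi

Hi @Fred, where can I find more detailed logs? – Divya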

Answer

I see three problems with your code:

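  1. You are missing a semicolon at the end of the `A = Load ...` line. I am not sure how the script ran at all, so I assume this is an error from copying it to StackOverflow.
  2. You named a field `E`, which is a reserved word. I am not sure what effect this has, but to be safe I would not do it. See the Pig documentation for the list of reserved keywords.
  3. (This is probably what causes the error.) Your validation does not make sense. It looks like you wrote a split function that takes 3 or fewer arguments (the string to split, the delimiter, and the maximum number of splits). However, your check throws whenever the input has fewer than 3 arguments, and also whenever it has an even number of arguments. Your script passes a single argument (`data`), so the UDF throws on every record. The even/odd check looks like a validation meant for the strings produced by the split, not for the arguments passed in.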
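To make the third point concrete, here is a minimal sketch that applies both conditions to a plain argument count. The class and method names (`ArgCheckDemo`, `originalCheckRejects`, `proposedCheckRejects`) are made up for illustration, and the null check on the tuple is left out. It does not use Pig at all.

```java
public class ArgCheckDemo {

    // Same size condition as the question's UDF: reject fewer than 3
    // arguments, and reject any even number of arguments.
    static boolean originalCheckRejects(int argCount) {
        return argCount < 3 || argCount % 2 == 0;
    }

    // Size condition suggested in this answer: accept 1 to 3 arguments.
    static boolean proposedCheckRejects(int argCount) {
        return argCount == 0 || argCount > 3;
    }

    public static void main(String[] args) {
        // The script calls PigUDF(data), i.e. with exactly one argument.
        for (int n = 0; n <= 4; n++) {
            System.out.println(n + " args: original rejects=" + originalCheckRejects(n)
                    + ", proposed rejects=" + proposedCheckRejects(n));
        }
    }
}
```

With one argument, the original condition rejects the call and the proposed one accepts it, which matches the failure seen when the script calls the UDF with only `data`.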

It should be something like this:

if (input == null || input.size() == 0 || input.size() > 3) { 
    throw new IOException("Incorrect number of values."); 
} 
//... 
if(splits.length % 2 != 0) 
    throw new IOException("Invalid key value pairs"); 
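The parsing itself can be tried without Pig or Hadoop. The following is only a sketch of the same split-then-map logic as plain Java. `KeyValueParser` and `parse` are hypothetical names, and it differs from the question's UDF in two ways that are my own choices: the delimiter is quoted so it is treated literally rather than as a regular expression, and a null input yields an empty map instead of null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class KeyValueParser {

    /**
     * Splits source on a literal delimiter and turns each "key=value"
     * token into a map entry. Tokens without '=' are skipped, as in the
     * question's UDF. If a key repeats, the later value overwrites the
     * earlier one.
     */
    static Map<String, String> parse(String source, String delim, int limit) {
        Map<String, String> map = new LinkedHashMap<>();
        if (source == null || delim == null) {
            return map; // assumption: empty map rather than null
        }
        // Pattern.quote makes delimiters such as "|" safe to pass in.
        for (String token : source.split(Pattern.quote(delim), limit)) {
            int eq = token.indexOf('=');
            if (eq != -1) {
                map.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample data in the same shape as the last input field.
        String data = "id=123456789&country=US&type=Message";
        System.out.println(parse(data, "&", 0));
    }
}
```

Note that the sample record in the question repeats the same key many times, so a map built this way would keep only the last value for that key.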

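I suggest not running programs on Hadoop in the cloud until you have debugged them. Get them working locally first. If you use the PigServer class, you can debug the UDF on your development machine from Eclipse or another IDE.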