
Spark java.lang.VerifyError

I get the following error when I make these calls using the Python client for Spark (PySpark):

lines = sc.textFile("hdfs://...")
lines.take(10)

I suspect the Spark and Hadoop versions may be incompatible. Here is the output of hadoop version:

Hadoop 2.5.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r cc72e9b000545b86b75a61f4835eb86d57bfafc0
Compiled by jenkins on 2014-11-14T23:45Z
Compiled with protoc 2.5.0
From source with checksum df7537a4faa4658983d397abf4514320
This command was run using /etc/hadoop-2.5.2/share/hadoop/common/hadoop-common-2.5.2.jar

I am also running Spark 1.3.1.
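One way to confirm the suspected mismatch (a minimal sketch I am adding here, assuming a live SparkContext named sc; sc._jvm is PySpark's Py4J gateway into the driver JVM):

# Print the Hadoop version the Spark driver JVM actually loaded.
# org.apache.hadoop.util.VersionInfo is a standard Hadoop utility class.
print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())

If this prints something other than 2.5.2, Spark was built against a different Hadoop client than the cluster is running. For reference, the full traceback from the take(10) call: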

File "/etc/spark/python/pyspark/rdd.py", line 1194, in take 
    totalParts = self._jrdd.partitions().size() 
File "/etc/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__ 
File "/etc/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
    py4j.protocol.Py4JJavaError: An error occurred while calling o21.partitions. 
    : java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; 
    at java.lang.ClassLoader.defineClass1(Native Method) 
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) 
    at java.lang.ClassLoader.defineClass(ClassLoader.java:615) 
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) 
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) 
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58) 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306) 
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247) 
    at java.lang.Class.getDeclaredMethods0(Native Method) 
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2436) 
    at java.lang.Class.privateGetPublicMethods(Class.java:2556) 
    at java.lang.Class.privateGetPublicMethods(Class.java:2566) 
    at java.lang.Class.getMethods(Class.java:1412) 
    at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:409) 
    at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:306) 
    at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:610) 
    at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:690) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92) 
    at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:366) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:262) 
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153) 
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:602) 
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:547) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) 
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) 
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625) 
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) 
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) 
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256) 
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) 
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) 
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
    at scala.Option.getOrElse(Option.scala:120) 
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
    at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64) 
    at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
    at py4j.Gateway.invoke(Gateway.java:259) 

I have been searching for this problem; some people attribute it to the protobuf version, but I am not familiar with how to set that up correctly. Any ideas?

Answers


Check the pom.xml file you built against.

Search for the protobuf version it pulls in; aligning it with the cluster's protoc version may fix the problem (see the sketch below).
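For illustration only (this snippet is mine, not from the original answer): the hadoop version output above says Hadoop 2.5.2 was compiled with protoc 2.5.0, so the build's protobuf dependency should match. In a Maven pom.xml that would look roughly like:

<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <!-- must match the protoc version Hadoop was compiled with (2.5.0 here) -->
    <version>2.5.0</version>
</dependency>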

Or the problem may be one of the other issues mentioned in this JIRA thread:

https://issues.apache.org/jira/browse/SPARK-7238


Thanks. What should the command to build Spark be? Do I need to specify the Hadoop version, and if so, how? –


If you have already downloaded the package, this command builds Spark and its example programs: build/mvn -DskipTests clean package – sahitya
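To also pin the Hadoop version (a sketch based on the Spark 1.3 build documentation, not part of the original comment; the hadoop-2.4 profile covers Hadoop 2.4.x and later):

# Build Spark against the cluster's Hadoop 2.5.2 client libraries.
build/mvn -Phadoop-2.4 -Dhadoop.version=2.5.2 -DskipTests clean package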


You need to check which py4j jar version is required for this Hadoop/Spark combination. Download it and place it in the lib folder of the Spark installation directory, then check the path references in your .bashrc. That will fix this error.
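For illustration (my sketch, assuming SPARK_HOME is /etc/spark and the py4j 0.8.2.1 zip, both taken from the traceback above; adjust to your install):

# Hypothetical .bashrc entries pointing Python at Spark's bundled py4j.
export SPARK_HOME=/etc/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH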