2017-06-20

I want to read Avro files with spark-submit and Spark 2.1.0.

Maven dependency:


The following exception is thrown:

Caused by: java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class 
     at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1$$anon$1.<init>(DefaultSource.scala:205) 
     at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1.apply(DefaultSource.scala:205) 
     at com.databricks.spark.avro.DefaultSource$$anonfun$buildReader$1.apply(DefaultSource.scala:160) 
     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:138) 
     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:122) 
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:150) 
     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102) 
     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) 
     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) 
     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225) 
     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826) 
     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826) 
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) 
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) 
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) 
     at org.apache.spark.scheduler.Task.run(Task.scala:99) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class 
     at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
     at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 

I checked similar posts and tried various approaches, but could not resolve the exception.

Answer


The root cause is mixing Scala 2.10 (spark-avro_2.10) and Scala 2.11 libraries. If you use Scala 2.11, it should be:

<dependency> 
    <groupId>com.databricks</groupId> 
    <artifactId>spark-avro_2.11</artifactId> 
    <version>3.2.0</version> 
</dependency>  
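To verify which Scala line your runtime actually uses before picking the artifact suffix, the Scala standard library itself can report its version. A minimal, self-contained sketch (the object name is illustrative):

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // scala.util.Properties reports the scala-library on the classpath;
    // the dependency suffix (_2.10, _2.11) must match this binary version.
    val full = scala.util.Properties.versionNumberString      // e.g. "2.11.8"
    val binary = full.split('.').take(2).mkString(".")        // e.g. "2.11"
    println(s"Scala binary version: $binary")
  }
}
```

Spark 2.1.0's prebuilt distributions are compiled against Scala 2.11, so a `_2.10` spark-avro artifact on the same classpath produces exactly the `GenTraversableOnce$class` error above.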

The Spark version is 2.1.0. That is why I used spark-avro_2.10. –


Didn't realize Spark 2.1.0 uses Scala 2.11 by default. Using spark-avro_2.11 resolved the issue. Thanks. –
