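How to add the Databricks spark-avro jar to HDInsight
I am currently trying, without success, to run a Spark Scala job that uses the external library spark-avro on an HDInsight cluster. Could someone help me with this? The goal is to find the steps required to read Avro files stored in Azure Blob Storage from an HDInsight cluster.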
Current specs:
- Spark 2.0 on Linux (HDI 3.5) cluster type
- Scala 2.11.8
- spark-assembly-2.0.0-hadoop2.7.0-SNAPSHOT.jar
- spark-avro_2.11:3.2.0
Tutorial: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-intellij-tool-plugin
Spark Scala code:
Based on the example at: https://github.com/databricks/spark-avro
import com.databricks.spark.avro._
import org.apache.spark.sql.SparkSession

object AvroReader {
  def main(arg: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").getOrCreate()
    val df = spark.read.avro("wasb://[email protected]/directory")
    df.head(5)
  }
}
Error received:
java.lang.NoClassDefFoundError: com/databricks/spark/avro/package$
at MediahuisHDInsight.AvroReader$.main(AvroReader.scala:14)
at MediahuisHDInsight.AvroReader.main(AvroReader.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.avro.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
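A NoClassDefFoundError like this usually means the job was compiled against com.databricks.spark.avro.package$ but that class is not on the classpath when the job runs on the cluster, i.e. the spark-avro jar was not shipped with the application. A quick way to confirm this is a small check such as the sketch below, run at the start of the job on the cluster. AvroClasspathCheck and isOnClasspath are hypothetical names for illustration, not part of spark-avro.

```scala
// Hypothetical diagnostic helper (not part of spark-avro): reports whether
// the class from the stack trace can be found by this object's class loader.
object AvroClasspathCheck {
  // Class name taken from the ClassNotFoundException in the stack trace.
  val AvroPackageObject: String = "com.databricks.spark.avro.package$"

  // initialize = false: only locate the class, do not run its static initializer.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit = {
    if (isOnClasspath(AvroPackageObject))
      println(s"$AvroPackageObject is on the classpath")
    else
      println(s"$AvroPackageObject is missing: spark-avro was not shipped with the job")
  }
}
```

If the second message is printed on the cluster but not locally, the problem is packaging or submission, not the reading code itself.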
Please provide your build file. It looks like your jar expects a certain runtime dependency. – Vidya
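For reference, a build file that bundles spark-avro into the application jar might look like the sketch below. This assumes sbt with the sbt-assembly plugin (declared in project/plugins.sbt, not shown) and a Spark version matching the cluster's; the Spark modules are marked "provided" because HDInsight already supplies them, while spark-avro is not on the cluster and must travel with the job. Alternatively, the dependency can be supplied at submit time with spark-submit's --packages com.databricks:spark-avro_2.11:3.2.0 instead of being bundled.

```scala
// build.sbt: a minimal sketch, assuming sbt + sbt-assembly (run `sbt assembly`).
name := "AvroReader"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Supplied by the HDInsight cluster at runtime, so excluded from the fat jar.
  // Version assumed to match the cluster's Spark 2.0 installation.
  "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.0.0" % "provided",
  // Not present on the cluster: bundled into the assembly jar.
  "com.databricks"   %% "spark-avro" % "3.2.0"
)
```

The design choice here is to ship one self-contained jar, which avoids depending on the cluster being able to download packages at submit time.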