我是hadoop的新手,剛開始嘗試使用scala和spark連接到hdfs,但不知道配置有什麼問題。請幫我解決並理解它。scala的hdfs連接錯誤
Hadoop Version is 2.7.3
Scala Version is 2.12.1
Spark Version is 2.1.1
的pom.xml(依賴關係)
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.3</version>
</dependency>
階代碼庫
object SparkHDFS {
def getDataFromHdfs {
val hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration)
val file = new Path("rdd/insurance.csv")
val stream = hdfs open file
println(stream.readLine())
}
def main(arr: Array[String]) {
getDataFromHdfs
}
}
異常上控制檯:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.DistributedFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2400)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2411)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at com.sample.sparkscala.SparkHDFS$.getDataFromHdfs(SparkHDFS.scala:11)
at com.sample.sparkscala.SparkHDFS$.main(SparkHDFS.scala:18)
at com.sample.sparkscala.SparkHDFS.main(SparkHDFS.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration$DeprecationDelta
at org.apache.hadoop.hdfs.HdfsConfiguration.addDeprecatedKeys(HdfsConfiguration.java:66)
at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:31)
at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:116)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration$DeprecationDelta
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more
你是否從eclipse觸發了你的scala代碼? 試着製作一個罐子;並以「hadoop jar ...」命令提交。 這樣,它將從classpath中獲取所有與hadoop相關的lib jar。 –