0

I am getting an error while reading a local file in Apache Spark:

scala> val f = sc.textFile("/home/cloudera/Downloads/sample.txt")

f: org.apache.spark.rdd.RDD[String] = /home/cloudera/Downloads/sample.txt MapPartitionsRDD[9] at textFile at <console>:27 

scala> f.count()

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/home/cloudera/Downloads/sample.txt
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC.<init>(<console>:45)
    at $iwC.<init>(<console>:47)
    at <init>(<console>:49)
    at .<init>(<console>:53)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
    at org.apache.spark.repl.Main$.main(Main.scala:35)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Answer

1

You have to specify the URI scheme in the file path. When Hadoop is configured, a bare path is resolved against HDFS (which is why the error shows hdfs://quickstart.cloudera:8020/...), so to read a local file you need the file:// prefix:

sc.textFile("file:///home/cloudera/Downloads/sample.txt") 

Hope this helps!

+0

scala> f.count() [Stage 0:> (0 + 0) / 2] 17/05/30 02:38:26 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources 17/05/30 02:39:16 ERROR scheduler.LiveListenerBus: org.apache.spark.SparkException: Job 0 cancelled because SparkContext was shut down –
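One way to rule out YARN resource problems (a sketch, not from the original thread; it assumes a single-node quickstart VM) is to restart the shell in local mode, where no cluster resources are needed:

spark-shell --master local[*]

scala> val f = sc.textFile("file:///home/cloudera/Downloads/sample.txt")
scala> f.count()

If the count succeeds locally, the path fix is correct and the remaining issue is YARN executor resources (check executor memory and cores in the cluster UI), not the file.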

+0

Are you trying to run locally or on YARN? –

+0

Here is a good article: https://www.datastax.com/dev/blog/common-spark-troubleshooting –