Spark Dataset


I set up a standalone Apache Spark cluster with 7 machines. I want to run the following Scala code on it:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

// The enclosing object declaration and extractCustomerPricePairs were not included in the post.

/** Our main function where the action happens */
def main(args: Array[String]): Unit = {
    // Set the log level to only print errors
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Create a SparkContext without much actual configuration
    // We want EMR's config defaults to be used.
    val conf = new SparkConf()
    conf.setAppName("MovieSimilarities1M")
    val sc = new SparkContext(conf)

    // This path is resolved on every executor, not only on the driver
    val input = sc.textFile("file:///home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv")
    val mappedInput = input.map(extractCustomerPricePairs)

    // Sum the amounts per customer
    val totalByCustomer = mappedInput.reduceByKey((x, y) => x + y)

    // Swap to (total, customer) so that sortByKey sorts by total
    val flipped = totalByCustomer.map(x => (x._2, x._1))
    val totalByCustomerSorted = flipped.sortByKey()
    val results = totalByCustomerSorted.collect()

    // Print the results.
    results.foreach(println)
    }

}
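
The post does not show extractCustomerPricePairs. A minimal sketch of what it might look like, assuming each line of customer-orders.csv has the form customerId,itemId,amount (the column layout and the object name OrderParsing are assumptions, not taken from the post):

```scala
object OrderParsing {
  /**
   * Parses one CSV line into a (customerId, amountSpent) pair.
   * Assumes the hypothetical layout "customerId,itemId,amount".
   */
  def extractCustomerPricePairs(line: String): (Int, Float) = {
    val fields = line.split(",")
    // Column 0 is taken to be the customer ID, column 2 the amount spent
    (fields(0).trim.toInt, fields(2).trim.toFloat)
  }
}
```

With something like this in scope (for example via import OrderParsing._), input.map(extractCustomerPricePairs) yields the key/value pairs that reduceByKey expects.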

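The steps are:

I build a .jar file with SBT
I submit the job with spark-submit *.jar

But my executors cannot find the file referenced in sc.textFile("file:///home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv")

This customer-orders.csv file is stored on my main PC.

Full stack trace: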

error: [Stage 0:> (0 + 2)/2]17/09/25 17:32:35 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, 141.225.166.191, executor 2): java.io.FileNotFoundException: File file:/home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv does not exist

How can I solve this problem?

Please modify the code so that it runs on the cluster.

Source

2017-09-25 Rakib Al-Fahad


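You have several options for making the file accessible to your worker nodes.

1. Manually copy the file to all nodes.

Every node must have the file at exactly this path: /home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv

2. Attach the file when submitting the job.

spark-submit has an option called --files that lets you ship any number of files along with the job when you submit it: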

spark-submit --master ... --jars ... --files /home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv

Don't overuse this. The option is better suited to testing and to small files.
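
Files shipped with --files are placed in each executor's working directory rather than at their original path, so the file:///home/... path in the question would likely still not resolve on the workers. A minimal sketch of reading such a file inside a task via SparkFiles.get, assuming the file is small enough to be read by a single task (the object and method names are made up for illustration):

```scala
import org.apache.spark.{SparkContext, SparkFiles}
import org.apache.spark.rdd.RDD
import scala.io.Source

object ShippedFileExample {
  /**
   * Reads a file that was distributed with --files (or sc.addFile)
   * and returns its lines as an RDD. The file is opened inside the
   * task, so SparkFiles.get resolves the copy local to that executor.
   */
  def readShippedFile(sc: SparkContext, fileName: String): RDD[String] =
    // A single-element, single-partition RDD: one task reads the whole file
    sc.parallelize(Seq(fileName), numSlices = 1).mapPartitions { names =>
      names.flatMap { name =>
        val source = Source.fromFile(SparkFiles.get(name))
        try source.getLines().toList finally source.close()
      }
    }
}
```

In the question's code this would replace the sc.textFile(...) call, for example val input = ShippedFileExample.readShippedFile(sc, "customer-orders.csv"). Because everything is read in one partition, calling repartition afterwards may help if later stages are expensive.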

3. Use external shared storage that all nodes can access.

S3 and NFS shares are popular choices.

sc.textFile("s3n://bucketname/customer-orders.csv")

4. Read the data in your driver program and then convert it to an RDD for processing.

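import scala.io.Source

// Runs on the driver, so the file only needs to exist on the driver machine
val bufferedSource = Source.fromFile("/home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv")
val lines = try bufferedSource.getLines().toList finally bufferedSource.close()
val rdd = sc.makeRDD(lines)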

This is generally not recommended, but it can be useful for a quick test.

Source

2017-09-26 07:25:36

Thanks for your help. The concept is clear now.
