Spark cannot compile newAPIHadoopRDD with the mongo-hadoop connector's BSONFileInputFormat

I am using the mongo-hadoop client (r1.5.2) in Spark to read data from MongoDB and from BSON files, following this guide: https://github.com/mongodb/mongo-hadoop/wiki/Spark-Usage. So far, reading from MongoDB works without problems. However, the BSON configuration does not even compile. Please help.

My code, in Scala:

    dataConfig.set("mapred.input.dir", "path.bson")

    val documents = sc.newAPIHadoopRDD(
      dataConfig,
      classOf[BSONFileInputFormat],
      classOf[Object],
      classOf[BSONObject])

The error:

Error:(56, 24) inferred type arguments [Object,org.bson.BSONObject,com.mongodb.hadoop.mapred.BSONFileInputFormat] do not conform to method newAPIHadoopRDD's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]] 
    val documents = sc.newAPIHadoopRDD(
        ^

2016-06-21 Hunter Lin

Try using BSONFileInputFormat instead of MongoInputFormat. Also, please specify which version of the mongo-hadoop connector you are using.

I found a solution! The problem seems to be caused by the generics of InputFormat.

newAPIHadoopRDD requires the input format type F to satisfy the bound

F <: org.apache.hadoop.mapreduce.InputFormat[K,V]

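Although BSONFileInputFormat extends FileInputFormat[K, V], which in turn extends InputFormat[K, V], it does not specify the K and V generics as Object and BSONObject. (In fact, the K and V generics are not mentioned anywhere in BSONFileInputFormat. Can that class really compile?)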

In short, the solution is to cast BSONFileInputFormat to a subclass of InputFormat with K and V specified:

    val documents = sc.newAPIHadoopRDD(
      dataConfig,
      // Pin K and V to Object and BSONObject so that F satisfies the bound F <: InputFormat[K, V].
      classOf[BSONFileInputFormat].asSubclass(classOf[org.apache.hadoop.mapreduce.lib.input.FileInputFormat[Object, BSONObject]]),
      classOf[Object],
      classOf[BSONObject])

Now it works without any problems :)
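
The same compiler behaviour can be reproduced with JDK classes alone. The following is a minimal sketch, not the Spark API: `describe` is a hypothetical method whose bound `F <: JList[E]` mirrors `F <: InputFormat[K, V]`, and `java.util.ArrayList` stands in for BSONFileInputFormat. It assumes Scala 2, and it assumes that BSONFileInputFormat appears to Scala with unknown type arguments, as a Java class that extends a raw generic type would.

```scala
import java.util.{ArrayList => JArrayList, List => JList}

object AsSubclassSketch {
  // Hypothetical stand-in for newAPIHadoopRDD: F must be a subtype of JList[E],
  // just as the real method requires F <: InputFormat[K, V].
  def describe[E, F <: JList[E]](fClass: Class[F], eClass: Class[E]): String =
    s"${fClass.getSimpleName} of ${eClass.getSimpleName}"

  def main(args: Array[String]): Unit = {
    // A class token whose element type is unknown to the compiler.
    val untyped: Class[_ <: JList[_]] = classOf[JArrayList[_]]

    // describe(untyped, classOf[String])
    // would be rejected: JList[_] does not conform to JList[String].

    // asSubclass pins the type argument, so the bound can be satisfied.
    val typed: Class[_ <: JList[String]] = untyped.asSubclass(classOf[JList[String]])
    println(describe(typed, classOf[String]))
  }
}
```

Note that asSubclass only checks the erased class at runtime and throws a ClassCastException if the class is not really a subclass of the target. The error message in the question names com.mongodb.hadoop.mapred.BSONFileInputFormat (the old mapred API), so the cast presumably needs the new-API class com.mongodb.hadoop.BSONFileInputFormat to be the one imported.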

2016-08-03 02:43:26
