Spark SQL from sbt Scala

Using a Google Dataproc Spark cluster, my sbt-built assembly jar can access Cassandra through SparkContext.

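However, when I try to access it through sqlContext, I get Spark SQL classes not found on the remote cluster, although I believe the Dataproc cluster should be provisioned for Spark SQL.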

java.lang.NoClassDefFoundError: org/apache/spark/sql/types/UTF8String$ 
     at org.apache.spark.sql.cassandra.CassandraSQLRow$$anonfun$fromJavaDriverRow$1.apply$mcVI$sp(CassandraSQLRow.scala:50) 
     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala

My sbt file:

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.5.0" % "provided", 
    "org.apache.spark" %% "spark-sql" % "1.5.0" % "provided", 
    "com.datastax.spark" %% "spark-cassandra-connector" % "1.4.0" 
)

Turning off "provided" on spark-sql puts me in jar duplicate merge hell.

Thanks for any help.


2015-11-04 navicore

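It looks like you also need version 1.5.0 of spark-cassandra-connector to make sure your classes are compatible. Here is the commit which upgraded the cassandra connector to 1.5.0; you can see that it removed the import of org.apache.spark.sql.types.UTF8String and added import org.apache.spark.unsafe.types.UTF8String instead, changing the relevant lines in CassandraSQLRow.scala: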

 data(i) = GettableData.get(row, i) 
     data(i) match { 
     case date: Date => data.update(i, new Timestamp(date.getTime)) 
-  case str: String => data.update(i, UTF8String(str)) 
+  case bigInt: BigInteger => data.update(i, new JBigDecimal(bigInt)) 
+  case str: String => data.update(i, UTF8String.fromString(str)) 
     case set: Set[_] => data.update(i, set.toSeq) 
     case _ => 
     }

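Although it looks like Maven Central only has "milestone" artifacts rather than "release" artifacts for the cassandra connector at this version, you should still be able to get the latest milestone connector, 1.5.0-M2, to work with your code.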
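As a rough sketch, the dependency block from the question might look like this once the connector is moved to the milestone named above. It assumes the rest of the build stays unchanged and that 1.5.0-M2 is still the milestone you want; check Maven Central before relying on it.

```scala
// build.sbt (sketch): keep the connector on the same 1.5.x line as Spark
libraryDependencies ++= Seq(
  // Spark itself is supplied by the Dataproc cluster, so it stays "provided"
  "org.apache.spark" %% "spark-core" % "1.5.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "1.5.0" % "provided",
  // Milestone build of the connector compiled against Spark 1.5
  // (it imports org.apache.spark.unsafe.types.UTF8String instead of the
  // org.apache.spark.sql.types.UTF8String class that was not found)
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.5.0-M2"
)
```

The connector is deliberately not marked "provided", because it has to be bundled into the assembly jar that is shipped to the cluster.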

Edit: see also the compatibility table from the Cassandra connector's GitHub README.md.


2015-11-04 03:16:32

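Thanks a lot for the additional link @dennis, this looks like the answer. Trying it now, but 1.5.0-M2 gives assembly deduplicate problems for 'io.netty'. It looks like I will have to brush up on this after all... I'll post the results soon... – navicore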
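For the 'io.netty' deduplicate errors mentioned in the comment above, one common approach is a custom merge strategy in sbt-assembly. The following is only a sketch: it assumes sbt-assembly 0.14.x is already enabled in project/plugins.sbt, and the two paths matched are guesses at what collides, so adjust them to the files named in the actual error output.

```scala
// build.sbt (sketch): tell sbt-assembly how to resolve duplicate io.netty entries
assemblyMergeStrategy in assembly := {
  // Assumption: the clash is on netty's version metadata file
  case PathList("META-INF", "io.netty.versions.properties") => MergeStrategy.first
  // Assumption: duplicate class files under io/netty come from overlapping netty jars
  case PathList("io", "netty", xs @ _*) => MergeStrategy.first
  // Everything else falls back to the plugin's default strategy
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}
```

MergeStrategy.first simply keeps the first copy it encounters, which can hide a genuine version conflict; if the job then fails at runtime with netty errors, excluding one of the transitive netty dependencies may be the safer route.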
