2016-04-26 77 views
4

I am running a Spark job on EMR and using the DataStax connector to connect to a Cassandra cluster. I am now facing a problem with the Guava jar, described below: the driver detects Guava issue #1635, which indicates that a version of Guava older than 16.01 is in use. Cassandra details:

cqlsh 5.0.1 | Cassandra 3.0.1 | CQL spec 3.3.1 

The Spark job runs on EMR 4.4 with the following Maven dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
    <version>1.5.0</version>
</dependency>

The following exception is thrown when I submit the Spark job:

java.lang.ExceptionInInitializerError 
     at com.datastax.spark.connector.cql.DefaultConnectionFactory$.clusterBuilder(CassandraConnectionFactory.scala:35) 
     at com.datastax.spark.connector.cql.DefaultConnectionFactory$.createCluster(CassandraConnectionFactory.scala:87) 
     at com.datastax.spark.connector.cql.CassandraConnector$.com$datastax$spark$connector$cql$CassandraConnector$$createSession(CassandraConnector.scala:153) 
     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148) 
     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$2.apply(CassandraConnector.scala:148) 
     at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:31) 
     at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:56) 
     at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81) 
     at ampush.event.process.core.CassandraServiceManagerImpl.getAdMetaInfo(CassandraServiceManagerImpl.java:158) 
     at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:308) 
     at ampush.event.config.metric.processor.ScheduledEventAggregator$4.call(ScheduledEventAggregator.java:290) 
     at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) 
     at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:222) 
     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902) 
     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:902) 
     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850) 
     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850) 
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) 
     at org.apache.spark.scheduler.Task.run(Task.scala:88) 
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.IllegalStateException: Detected Guava issue #1635 which indicates that a version of Guava less than 16.01 is in use. This introduces codec resolution issues and potentially other incompatibility issues in the driver. Please upgrade to Guava 16.01 or later. 
     at com.datastax.driver.core.SanityChecks.checkGuava(SanityChecks.java:62) 
     at com.datastax.driver.core.SanityChecks.check(SanityChecks.java:36) 
     at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:67) 
     ... 23 more 

Please let me know how to manage the Guava version here?

Thanks

+0

Your dependency blocks are incomplete –

Answers

1

Just add something like this to the <dependencies> block of your POM:

<dependency> 
    <groupId>com.google.guava</groupId> 
    <artifactId>guava</artifactId> 
    <version>19.0</version> 
</dependency> 

(or whatever version > 16.0.1 you prefer)

+0

I was going through the link https://groups.google.com/a/lists.datastax.com/forum/#!topic/spark-connector-user/HnTsWJkI5jo, which says Spark 1.5 uses Guava 14 while cassandra-driver-core requires Guava 16, and the Spark Cassandra connector raises the exception. So how does adding the above solve my problem? This may be a newbie question. Thanks –

+0

Also, per the link https://github.com/datastax/spark-cassandra-connector, I am using Cassandra connector 1.5, which is listed as compatible with Spark 1.5/1.6 and Cassandra 3.0. Not sure why I am getting this issue –

+0

Not sure what you are asking. If you want to know why Maven resolved an old version of Guava, you can use 'mvn dependency:tree', which shows you how each transitive dependency was resolved (or omitted) –
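For example, to narrow the output to just the Guava artifacts in the tree (an illustrative invocation; the includes parameter filters the tree by coordinates):

    mvn dependency:tree -Dincludes=com.google.guava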

2

I had the same problem and resolved it by using the Maven Shade plugin to shade out the Guava version brought in by the Cassandra connector.

I needed to exclude the Optional, Present, and Absent classes, because I ran into problems with Spark trying to cast from the non-shaded Guava Present type to the shaded Optional type. I am not sure whether this will cause problems later on, but it seems to work for me for now.

You can add this to the <plugins> section of your pom.xml:

<plugin> 
    <groupId>org.apache.maven.plugins</groupId> 
    <artifactId>maven-shade-plugin</artifactId> 
    <version>2.4.3</version> 
    <executions> 
     <execution> 
      <phase>package</phase> 
      <goals>
       <goal>shade</goal>
      </goals>
     </execution> 
    </executions> 

    <configuration> 
     <minimizeJar>true</minimizeJar> 
     <shadedArtifactAttached>true</shadedArtifactAttached> 
     <shadedClassifierName>fat</shadedClassifierName> 

     <relocations> 
      <relocation> 
       <pattern>com.google</pattern> 
       <shadedPattern>shaded.guava</shadedPattern> 
       <includes> 
        <include>com.google.**</include> 
       </includes> 

       <excludes> 
        <exclude>com.google.common.base.Optional</exclude> 
        <exclude>com.google.common.base.Absent</exclude> 
        <exclude>com.google.common.base.Present</exclude> 
       </excludes> 
      </relocation> 
     </relocations> 

     <filters> 
      <filter> 
       <artifact>*:*</artifact> 
       <excludes> 
        <exclude>META-INF/*.SF</exclude> 
        <exclude>META-INF/*.DSA</exclude> 
        <exclude>META-INF/*.RSA</exclude> 
       </excludes> 
      </filter> 
     </filters> 

    </configuration> 
</plugin> 
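With shadedArtifactAttached and shadedClassifierName set as above, a normal build attaches an extra shaded jar alongside the regular artifact (the artifact name below is a placeholder):

    mvn package
    # produces target/<your-artifact>-<version>-fat.jar with com.google.* relocated to shaded.guava.*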
+1

This does not solve the problem here. The cause in our case is the deployment platform, EMR. The way EMR builds Spark's default classpath puts a Guava version older than 16 on the classpath, because it picks up the older Hadoop libraries shipped with EMR 4.2/4.4/4.6. I fixed it by adding a bootstrap action to EMR that patches the default Spark classpath with the updated jar path. –
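A minimal sketch of what such a bootstrap step might fetch (all paths are assumptions; on a real cluster you would then merge the jar into the existing spark.driver.extraClassPath/spark.executor.extraClassPath entries in spark-defaults.conf):

    #!/bin/bash
    # Hypothetical bootstrap step: fetch a newer Guava onto every node
    # so it can be put ahead of Hadoop's Guava on the Spark classpath.
    mkdir -p /home/hadoop/extra-jars
    wget -q https://repo1.maven.org/maven2/com/google/guava/guava/19.0/guava-19.0.jar \
         -O /home/hadoop/extra-jars/guava-19.0.jar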

+0

I can confirm this solved the problem on a Spark Standalone v1.5.2 cluster with Spark Cassandra connector v1.5.1. Thanks. –

5

Another solution: go into the directory

spark/jars

Rename guava-14.0.1.jar out of the way, then copy in guava-19.0.jar:

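A shell sketch of that swap (the Spark install location and jar source path are assumptions):

    cd /opt/spark/jars                          # assumed install path; use your spark/jars directory
    mv guava-14.0.1.jar guava-14.0.1.jar.bak    # rename the bundled Guava out of the way
    cp /tmp/guava-19.0.jar .                    # copy in Guava 19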

+2

As a note, Guava 20 will not work for this. Guava 19 does work, though. –

+0

Awesome! –

0

I was able to solve this by adding the Guava 16.0.1 jar externally and then specifying it on the Spark classpath at submit time with the following configuration values:

--conf "spark.driver.extraClassPath=/guava-16.0.1.jar" --conf "spark.executor.extraClassPath=/guava-16.0.1.jar"

Hope this helps someone with a similar error!
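For context, a full submit command using those flags might look like this (the jar location, main class, and application jar name are placeholders):

    spark-submit \
      --conf "spark.driver.extraClassPath=/home/hadoop/guava-16.0.1.jar" \
      --conf "spark.executor.extraClassPath=/home/hadoop/guava-16.0.1.jar" \
      --class com.example.MyJob \
      my-job-assembly.jar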

0

Thanks Adrian for your response.

I'm on a somewhat different architecture than the others in this thread, but the Guava problem is the same. I'm using Spark 2.2 with Mesos. In our development environment we use sbt-native-packager to generate our Docker images to pass to Mesos.

It turned out that we needed a different Guava for the spark-submit executors than for the code we run on the driver. This worked for me.

build.sbt

.... 
libraryDependencies ++= Seq(
    "com.google.guava" % "guava" % "19.0" force(), 
    "org.apache.hadoop" % "hadoop-aws" % "2.7.3" excludeAll (
    ExclusionRule(organization = "org.apache.hadoop", name = "hadoop-common"), //this is for s3a 
    ExclusionRule(organization = "com.google.guava", name= "guava")), 
    "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name="jersey-guava"), 
    ExclusionRule(organization = "com.google.guava", name= "guava")) , 
    "com.github.scopt" %% "scopt" % "3.7.0" excludeAll (
    ExclusionRule("org.glassfish.jersey.bundles.repackaged", name="jersey-guava"), 
    ExclusionRule(organization = "com.google.guava", name= "guava")) , 
    "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.6", 
... 
dockerCommands ++= Seq(
... 
    Cmd("RUN rm /opt/spark/dist/jars/guava-14.0.1.jar"), 
    Cmd("RUN wget -q http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar -O /opt/spark/dist/jars/guava-23.0.jar") 
... 
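One way to sanity-check the result is to list the Guava jars inside the built image (the image name is a placeholder; the jars path comes from the Cmd steps above):

    docker run --rm my-spark-image ls /opt/spark/dist/jars | grep guava
    # expect only guava-23.0.jar after the rm/wget steps above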

When I tried replacing Guava 14 on the executors with Guava 16.0.1 or 19, it still wouldn't work; spark-submit just died. My fat jar effectively forced the Guava used in my driver to be 19, but for the spark-submit executors I had to replace it with 23. I did try replacing it with 16 and 19, but Spark just died there too.

Sorry for the digression, but this question came up after every one of my Google searches. I hope this helps other SBT/Mesos folks.