2017-04-25

Spark NoSuchMethodError on SQLContext.sql (Spark 1.6.0 on Cloudera 5.8.0)

I want to use Spark SQL from a Java program whose pom.xml dependencies point to Spark version 1.6.0. The program is below:

package spark_test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.hive.HiveContext;

public class MyTest {
    private static SparkConf sparkConf;

    public static void main(String[] args) {
        String warehouseLocation = args[0];
        sparkConf = new SparkConf().setAppName("Hive Test").setMaster("local[*]")
            .set("spark.sql.warehouse.dir", warehouseLocation);

        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        SQLContext sc = new HiveContext(ctx.sc());

        System.out.println(" Current Tables: ");

        DataFrame results = sc.sql("show tables");
        results.show();
    }
}

However, when I build a fat jar and run it from the command line, I get Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/PortalHandlerTest.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/SparkTest.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [file:/home/cloudera/workspace/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/home/cloudera/workspace/JARs/slf4j-log4j12-1.7.22.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] 
17/04/25 08:44:07 INFO SparkContext: Running Spark version 2.1.0 
17/04/25 08:44:07 WARN SparkContext: Support for Java 7 is deprecated as of Spark 2.0.0 
17/04/25 08:44:07 WARN SparkContext: Support for Scala 2.10 is deprecated as of Spark 2.1.0 
17/04/25 08:44:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
17/04/25 08:44:08 INFO SecurityManager: Changing view acls to: cloudera 
17/04/25 08:44:08 INFO SecurityManager: Changing modify acls to: cloudera 
17/04/25 08:44:08 INFO SecurityManager: Changing view acls groups to: 
17/04/25 08:44:08 INFO SecurityManager: Changing modify acls groups to: 
17/04/25 08:44:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set() 
17/04/25 08:44:09 INFO Utils: Successfully started service 'sparkDriver' on port 43850. 
17/04/25 08:44:09 INFO SparkEnv: Registering MapOutputTracker 
17/04/25 08:44:09 INFO SparkEnv: Registering BlockManagerMaster 
17/04/25 08:44:09 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 
17/04/25 08:44:09 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 
17/04/25 08:44:09 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-4199c353-4e21-4863-8b78-cfa280ce2de3 
17/04/25 08:44:09 INFO MemoryStore: MemoryStore started with capacity 375.7 MB 
17/04/25 08:44:09 INFO SparkEnv: Registering OutputCommitCoordinator 
17/04/25 08:44:09 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041. 
17/04/25 08:44:09 INFO Utils: Successfully started service 'SparkUI' on port 4041. 
17/04/25 08:44:09 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041 
17/04/25 08:44:10 INFO Executor: Starting executor ID driver on host localhost 
17/04/25 08:44:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41716. 
17/04/25 08:44:10 INFO NettyBlockTransferService: Server created on 10.0.2.15:41716 
17/04/25 08:44:10 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 
17/04/25 08:44:10 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41716, None) 
17/04/25 08:44:10 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41716 with 375.7 MB RAM, BlockManagerId(driver, 10.0.2.15, 41716, None) 
17/04/25 08:44:10 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41716, None) 
17/04/25 08:44:10 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 41716, None) 
Current Tables: 
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.sql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame; 
at spark_test.MyTest.main(MyTest.java:31) 
17/04/25 08:44:10 INFO SparkContext: Invoking stop() from shutdown hook 
17/04/25 08:44:10 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4041 
17/04/25 08:44:10 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
17/04/25 08:44:10 INFO MemoryStore: MemoryStore cleared 
17/04/25 08:44:10 INFO BlockManager: BlockManager stopped 
17/04/25 08:44:10 INFO BlockManagerMaster: BlockManagerMaster stopped 
17/04/25 08:44:10 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 
17/04/25 08:44:10 INFO SparkContext: Successfully stopped SparkContext 
17/04/25 08:44:10 INFO ShutdownHookManager: Shutdown hook called 
17/04/25 08:44:10 INFO ShutdownHookManager: Deleting directory /tmp/spark-93fca3d1-ff79-4d2b-b07f-a340c1a60416 

This is probably because my POM pulls in the Spark 1.6.0 jars while the Cloudera VM is running 2.1.0. spark-shell reports Spark version 1.6.0 and works fine. How can I force my Java program to use version 1.6.0?

Any help would be appreciated.

Answers


DataFrame was replaced by Dataset in Spark 2. You need to import org.apache.spark.sql.Dataset and use it, or else run a Spark 1.6 client against the Spark 1.6 server side. More info here. From a developer-experience perspective, most of the API is similar. Honestly, even if you can't change the server version, you would be better off using the Spark 2.0 dependencies at least on the client side.
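To make the migration concrete, here is a minimal sketch of the asker's program rewritten against the Spark 2.x API (a hypothetical rewrite, not the asker's code; it assumes the 2.x spark-sql and spark-hive artifacts are on the classpath — SparkSession replaces SQLContext/HiveContext, and sql() returns Dataset&lt;Row&gt;):

```java
package spark_test;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MyTest {
    public static void main(String[] args) {
        String warehouseLocation = args[0];

        // SparkSession is the single entry point in Spark 2.x;
        // enableHiveSupport() replaces constructing a HiveContext.
        SparkSession spark = SparkSession.builder()
            .appName("Hive Test")
            .master("local[*]")
            .config("spark.sql.warehouse.dir", warehouseLocation)
            .enableHiveSupport()
            .getOrCreate();

        System.out.println(" Current Tables: ");

        // In Spark 2.x sql() returns Dataset<Row>; in Scala,
        // DataFrame survives only as a type alias for it.
        Dataset<Row> results = spark.sql("show tables");
        results.show();

        spark.stop();
    }
}
```

Compiling against the 2.x jars this way avoids the NoSuchMethodError, since the bytecode no longer references the removed `SQLContext.sql(String): DataFrame` signature.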


I compiled the same program under Spark 2.1.0 to use Datasets, but the program still cannot read any Hive tables, even though I set warehouse.dir and copied hive-site.xml into /usr/lib/spark/conf/. Please see this post [here](http://stackoverflow.com/questions/43619137/hive-tables-not-found-in-spark-sql-spark-sql-analysisexception-in-cloudera-vm) – Joydeep


That is a different issue. I will address it in your other post. –


Please confirm whether your original issue was resolved by the steps above. –


Your logs show that you are running the Spark 2.1 libraries against a Spark 1.6.0 cluster. My guess is that your client and server libraries are not binary compatible. I suggest using the same version in your application as on the server to ensure compatibility.
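One way to pin the client to the cluster's version is in pom.xml (a sketch, assuming a Maven build; the exact artifact versions should match what the cluster actually ships — a CDH install may use a vendor suffix such as `1.6.0-cdh5.8.0` rather than the plain upstream `1.6.0` shown here):

```xml
<!-- Hypothetical pom.xml fragment: pin the client to the cluster's Spark line. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <!-- "provided": the cluster supplies Spark at runtime, so the fat jar
         never bundles a binary-incompatible Spark of its own. -->
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With `provided` scope the fat jar contains only the application classes, and whatever Spark `spark-submit` puts on the classpath is the one that runs, which sidesteps the client/server mismatch entirely.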


That was my guess as well. However, I don't know how to force my Java program to run with Spark 1.6.0 instead of Spark 2.1.0. Also, I compiled the same program under Spark 2.1.0, but the program still cannot read any Hive tables, even though I set warehouse.dir and copied hive-site.xml into /usr/lib/spark/conf/. Please see this post [here](http://stackoverflow.com/questions/43619137/hive-tables-not-found-in-spark-sql-spark-sql-analysisexception-in-cloudera-vm) – Joydeep
