我在AWS Elastic Map Reduce 5.3.1中使用spark-shell
與Spark 2.1.0從Postgres數據庫加載數據。 loader.load
總是失敗,然後成功。爲什麼會發生?關於EMR Spark,JDBC加載首次失敗,然後運行
[[email protected][SNIP] ~]$ SPARK_PRINT_LAUNCH_COMMAND=1 spark-shell --driver-class-path ~/postgresql-42.0.0.jar
Spark Command: /etc/alternatives/jre/bin/java -cp /home/hadoop/postgresql-42.0.0.jar:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/ -Dscala.usejavacp=true -Xmx640M -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError=kill -9 %p org.apache.spark.deploy.SparkSubmit --conf spark.driver.extraClassPath=/home/hadoop/postgresql-42.0.0.jar --class org.apache.spark.repl.Main --name Spark shell spark-shell
========================================
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/02/28 17:17:52 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
17/02/28 17:18:56 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://[SNIP]
Spark context available as 'sc' (master = yarn, app id = application_1487878172787_0014).
Spark session available as 'spark'.
Welcome to
____ __
/__/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val loader = spark.read.format("jdbc") // connection options removed
loader: org.apache.spark.sql.DataFrameReader = [email protected]
scala> loader.load
java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:84)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun$7.apply(JDBCOptions.scala:84)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:83)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:34)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
... 48 elided
scala> loader.load
res1: org.apache.spark.sql.DataFrame = [id: int, fsid: string ... 4 more fields]
你有沒有遇到一個解決方案嗎?在當前的EMR版本中看到相同的行爲。還要ping @Raje。 – kadrach
解決了我的問題:) – kadrach