2017-06-16

Error reading a PostgreSQL table with Spark's JdbcRDD

I want to read a table from PostgreSQL 9.6 into an RDD in Spark 2.1.1, and I have the following Scala code for it:

import org.apache.spark.rdd.JdbcRDD 
import java.sql.DriverManager 
import org.apache.spark.SparkContext 

val sc = SparkContext.getOrCreate() 

val rdd = new org.apache.spark.rdd.JdbcRDD(
  sc,
  () => DriverManager.getConnection(
    "jdbc:postgresql://my_host:5432/my_db", "my_user", "my_pass"),
  sql = "select * from my_table",
  0, 100000, 2)

However, it fails with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, 10.0.0.13, executor 1): org.postgresql.util.PSQLException: The column index is out of range: 1, number of columns: 0.

I am using the latest PostgreSQL JDBC driver, and I have verified that it authenticates correctly against the database.

Any idea why this might happen, or any alternative I could try?

Answer

From the Spark documentation:

The query must contain two ? placeholders for parameters used to partition the results

lowerBound is the minimum value of the first placeholder parameter; upperBound is the maximum value of the second placeholder parameter.
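Concretely, Spark splits the (lowerBound, upperBound) range into numPartitions sub-ranges and binds each sub-range's endpoints to the two `?` placeholders, so every partition queries its own slice of the table. A sketch of that splitting logic (a hypothetical helper for illustration, not the actual Spark source):

```scala
// Sketch: how a (lowerBound, upperBound) range can be split into
// per-partition (start, end) pairs that get bound to the two "?" marks.
def partitionBounds(lower: Long, upper: Long, numPartitions: Int): Seq[(Long, Long)] = {
  // BigInt avoids overflow when the range spans the full Long domain.
  val length = BigInt(1) + upper - lower
  (0 until numPartitions).map { i =>
    val start = lower + (BigInt(i) * length / numPartitions)
    val end   = lower + (BigInt(i + 1) * length / numPartitions) - 1
    (start.toLong, end.toLong)
  }
}

// For the values in the question:
// partitionBounds(0, 100000, 2) == Seq((0L, 49999L), (50000L, 100000L))
```

This is why a query without placeholders raises "The column index is out of range: 1, number of columns: 0" — the driver tries to bind the first partition bound into a statement that declares no parameters.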

So your query should look more like:

select * from my_table where ? <= id and id <= ? 
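Putting it together, a corrected version of the call from the question might look like this (a sketch assuming my_table has a numeric id column covering roughly 0–100000; the connection details are the placeholders from the question, and the mapRow function shown is only an example):

```scala
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

val sc = SparkContext.getOrCreate()

// Spark binds the two "?" placeholders itself: the 0–100000 range is
// split into 2 partitions, each querying its own id sub-range.
val rdd = new JdbcRDD(
  sc,
  () => DriverManager.getConnection(
    "jdbc:postgresql://my_host:5432/my_db", "my_user", "my_pass"),
  "select * from my_table where ? <= id and id <= ?",
  lowerBound = 0,
  upperBound = 100000,
  numPartitions = 2,
  mapRow = rs => rs.getObject(1) // adapt to extract the columns you need
)
```

If mapRow is omitted, JdbcRDD defaults to JdbcRDD.resultSetToObjectArray, which turns each row into an Array[Object].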

"select * from my_table offset ? limit ?" was what I needed, cheers! – ami232