2017-10-13

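I am working with Spark 2.2.0 and Python 2.7. I am using PySpark to connect to BigSQL and retrieve data, but I get the PySpark error "Method __getnewargs__([]) does not exist". Below is the code I use: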

import cPickle as cpick 
import numpy as np 
import pandas as pd 
import time 
import sys 
from pyspark.sql.session import SparkSession 
spark = SparkSession.builder.getOrCreate() 
spark_train_df = spark.read.jdbc(
    "jdbc:db2://BigSQL URL:Port:sslConnection=true;",
    "Schema.Table",
    properties={"user": "my userid",
                "password": "password",
                "driver": "com.ibm.db2.jcc.DB2Driver"})
spark_train_df.registerTempTable('data_table') 
# query to get columns necessary to create indexes 
sql = "select * FROM data_table" 
train_df = spark.sql(sql) 

cmr_dict = {'date': time.strftime('%a, %b %d, %Y'),
            'description': '`cmrs` contains data from data_table',
            'cmrs': train_df}

with open('cmrs.pkl', mode='wb') as fp: 
    cpick.dump(cmr_dict, fp, cpick.HIGHEST_PROTOCOL) 

When I run it, I get this error message:

Py4JError: An error occurred while calling o79.__getnewargs__. Trace: 
py4j.Py4JException: Method __getnewargs__([]) does not exist 
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) 
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) 
    at py4j.Gateway.invoke(Gateway.java:272) 
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:214) 
    at java.lang.Thread.run(Thread.java:748) 

Answer

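It is not possible to pickle Spark distributed objects. They are only proxies for JVM structures, and in any case they do not contain any data (just a description of the computation).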

If you want to pickle the data, collect it to the driver and serialize the result.
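
A minimal sketch of the collect-then-pickle approach, assuming the query result is small enough to fit in driver memory. The helper name pickle_collected is hypothetical and not part of PySpark; DataFrame.collect() and Row.asDict() are the PySpark calls it relies on. It uses the standard pickle module; on Python 2.7, cPickle can be swapped in the same way as in the question.

```python
import pickle  # on Python 2.7, `import cPickle as pickle` works the same way
import time


def pickle_collected(df, path):
    """Collect a Spark DataFrame to the driver and pickle plain Python data.

    Assumption: `df` is a pyspark.sql.DataFrame whose rows fit in driver
    memory. `pickle_collected` is a hypothetical helper name.
    """
    # collect() runs the query and returns a list of Row objects;
    # asDict() turns each Row into an ordinary dict, which pickles cleanly
    # because it no longer refers to any JVM-side object.
    rows = [row.asDict() for row in df.collect()]
    payload = {
        'date': time.strftime('%a, %b %d, %Y'),
        'description': '`cmrs` contains data from data_table',
        'cmrs': rows,
    }
    with open(path, mode='wb') as fp:
        pickle.dump(payload, fp, pickle.HIGHEST_PROTOCOL)
    return payload


# Usage with the DataFrame from the question:
# pickle_collected(train_df, 'cmrs.pkl')
```

The key point is that only the collected rows are serialized, never the DataFrame itself. For results too large to collect, writing the DataFrame out with Spark's own writers is probably a better fit than pickling.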
