Spark 1.4.1 - using pyspark

I tried the code below and got the following error.

Code

instances = sqlContext.sql("""SELECT instance_id, instance_usage_code
FROM ib_instances WHERE instance_usage_code = 'OUT_OF_ENTERPRISE'""")

instances.write.format("orc").save("instances2")

hivectx.sql("""CREATE TABLE IF NOT EXISTS instances2 (instance_id
STRING, instance_usage_code STRING)""")

hivectx.sql("LOAD DATA LOCAL INPATH '/home/hduser/instances2' INTO TABLE instances2")

Error

Traceback (most recent call last):
  File "/home/hduser/spark_script.py", line 57, in <module>
    instances.write.format("orc").save("instances2")
  File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 304, in save
  File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.AssertionError: assertion failed: The ORC data source can only be used with HiveContext.
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.sql.hive.orc.DefaultSource.createRelation(OrcRelation.scala:54)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:322)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:144)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

Source

2015-09-04 Subhajit Purkayastha

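My guess is that you are creating a standard SQLContext rather than a HiveContext (which adds some extra options). The assertion message says the ORC data source can only be used with HiveContext, so create your sqlContext as a HiveContext instance. The Scala version is: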

val sqlContext = new HiveContext(sc)
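
Since the question uses pyspark, the same idea might look roughly like this in Python. This is only a sketch, assuming the Spark 1.x API where HiveContext lives in pyspark.sql. The helper names make_hive_context, save_query_as_orc and main, and the app name, are made up for illustration; the query and the output path are taken from the question.

```python
def make_hive_context(sc):
    """Build a HiveContext from an existing SparkContext (Spark 1.x API)."""
    # Imported lazily so the helpers can be loaded without Spark installed.
    from pyspark.sql import HiveContext
    return HiveContext(sc)


def save_query_as_orc(hive_ctx, query, path):
    """Run `query` on a Hive-enabled context and write the result as ORC."""
    df = hive_ctx.sql(query)
    # The writer is the same one used in the question; the difference is
    # that `df` now comes from a HiveContext instead of a plain SQLContext.
    df.write.format("orc").save(path)
    return df


def main():
    from pyspark import SparkContext

    sc = SparkContext(appName="orc_example")  # hypothetical app name
    hivectx = make_hive_context(sc)
    save_query_as_orc(
        hivectx,
        "SELECT instance_id, instance_usage_code FROM ib_instances "
        "WHERE instance_usage_code = 'OUT_OF_ENTERPRISE'",
        "instances2",
    )
```

To try it, call main() at the end of the script and submit it with spark-submit. The point is that the DataFrame you call write.format("orc") on has to be produced by the HiveContext, so the query should go through hivectx rather than the plain sqlContext.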

Source

2015-09-08 12:08:09 Niemand
