How to use pandas in a Spark notebook with Python (data on dashDB)

Hello, I am using IBM Bluemix. There I am working in an Apache Spark notebook and loading data from dashDB. I am trying to build a visualization, but it does not show the rows, only the columns.

def get_file_content(credentials):

    from pyspark.sql import SQLContext
    sqlContext = SQLContext(sc)

    props = {}
    props['user'] = credentials['username']
    props['password'] = credentials['password']

    # fill in table name
    table = credentials['username'] + "." + "BATTLES"

    data_df = sqlContext.read.jdbc(credentials['jdbcurl'], table, properties=props)
    data_df.printSchema()

    return StringIO.StringIO(data_df)

When I use this command:

data_df.take(5)

I get the first 5 rows, with both the column and the row data. But when I do this:

content_string = get_file_content(credentials) 
BATTLES_df = pd.read_table(content_string)

I get this error:

ValueError: No columns to parse from file

Then, when I try .head() or .tail(), only the column names are displayed.

Does anyone see what the problem might be? My knowledge of Python is limited. Thank you for your help.

Source

2016-06-07 Saraida
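A likely reason for the symptom described in the question: StringIO wraps text, not a Spark DataFrame. Under Python 2, StringIO.StringIO calls str() on a non-string argument, so the buffer holds only the DataFrame's one-line description (its column names and types) and no rows. The sketch below reproduces this with a hypothetical stand-in class, FakeSparkDataFrame, whose __repr__ imitates the format Spark uses. It is an illustration only, not the real Spark class; the column names NAME and YEAR are made up, and Python 3's io.StringIO is used with an explicit str() call.

```python
import io


class FakeSparkDataFrame(object):
    """Hypothetical stand-in for a Spark DataFrame (illustration only)."""

    def __init__(self, dtypes):
        self.dtypes = dtypes  # list of (column name, type name) pairs

    def __repr__(self):
        # Imitates the one-line description a Spark DataFrame prints for itself
        return "DataFrame[%s]" % ", ".join("%s: %s" % c for c in self.dtypes)


# Made-up columns, standing in for the BATTLES table
data_df = FakeSparkDataFrame([("NAME", "string"), ("YEAR", "int")])

# Python 2's StringIO.StringIO(data_df) effectively performs this str() conversion
buffer_text = io.StringIO(str(data_df)).getvalue()

print(buffer_text)                    # DataFrame[NAME: string, YEAR: int]
print(len(buffer_text.splitlines()))  # 1
```

The buffer contains a single line and no row data, which would explain why pandas finds column names but nothing to parse beneath them.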

export PYSPARK_DRIVER_PYTHON=ipython 
export PYSPARK_DRIVER_PYTHON_OPTS=notebook

Then go to your Spark directory:

cd ~/spark-1.6.1-bin-hadoop2.6/ 

./bin/pyspark --packages com.datastax.spark:spark-cassandra-connector_scalaversion:spark_version-M1

After that you can write the following code:

import pandas as pd

Source

2016-06-07 22:24:02

This is the solution that worked for me. I replaced BATTLES_df = pd.read_table(content_string)

with

BATTLES_df=data_df.toPandas()

Thanks

Source

2016-06-08 00:20:37 Saraida
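The fix above can be folded back into the helper from the question. This is a minimal sketch, assuming the same credentials dictionary (keys username, password, jdbcurl) and an existing SQLContext. The function name load_table_as_pandas and the table_name parameter are hypothetical, introduced here only for illustration. Note that toPandas() collects every row onto the driver, so this suits tables that fit in memory.

```python
def load_table_as_pandas(sql_context, credentials, table_name="BATTLES"):
    """Read a dashDB table over JDBC and return it as a pandas DataFrame.

    Hypothetical helper: the name and the table_name parameter are
    illustrative and do not come from the original post.
    """
    # JDBC connection properties, taken from the notebook's credentials dict
    props = {
        "user": credentials["username"],
        "password": credentials["password"],
    }

    # As in the question, the table is qualified with the user name as schema
    table = credentials["username"] + "." + table_name

    # Spark DataFrame backed by the JDBC source
    spark_df = sql_context.read.jdbc(credentials["jdbcurl"], table, properties=props)

    # Convert directly to pandas instead of wrapping the object in StringIO
    return spark_df.toPandas()
```

In the notebook this would be called as BATTLES_df = load_table_as_pandas(sqlContext, credentials), after which BATTLES_df.head() should show rows as well as column names.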
