2017-10-06 104 views

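Fit function: cannot perform reduce with flexible type

I have a dataset with the area and price of 42 apartments. I am using Python on Databricks, and I loaded a CSV file that uses a comma as the column separator. After that, I specified area as an integer and price as a double. Then I imported the plotting library and set up the regression: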

import matplotlib.pyplot as plt 
from sklearn import linear_model 

Then I read my dataset:

aptos=sqlContext.read.format('csv').options(header='true', 
interSchema='true').load('/FileStore/tables/yl3r1mgv1507304115516/aptos_dataset-5ad32.csv') 
display(aptos) 

In the following lines, I create the input variables from the DataFrame columns:

X=aptos.select("area").collect() 
Y=aptos.select("precio").collect() 

Then I create my regression model:

regr = linear_model.LinearRegression() 

Up to this point I have no problems. But when I run the following line:

regr.fit(X,Y) 

I get this error:

TypeError: cannot perform reduce with flexible type 

Here is the full traceback:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<command-2158797891361999> in <module>() 
     1 
     2 
----> 3 regr.fit(X,Y) 

/databricks/python/local/lib/python2.7/site-packages/sklearn/linear_model/base.pyc in fit(self, X, y, sample_weight) 
    517   X, y, X_offset, y_offset, X_scale = self._preprocess_data(
    518    X, y, fit_intercept=self.fit_intercept, normalize=self.normalize, 
--> 519    copy=self.copy_X, sample_weight=sample_weight) 
    520 
    521   if sample_weight is not None: 

/databricks/python/local/lib/python2.7/site-packages/sklearn/linear_model/base.pyc in _preprocess_data(X, y, fit_intercept, normalize, copy, sample_weight, return_mean) 
    197    else: 
    198     X_scale = np.ones(X.shape[1]) 
--> 199   y_offset = np.average(y, axis=0, weights=sample_weight) 
    200   y = y - y_offset 
    201  else: 

/databricks/python/local/lib/python2.7/site-packages/numpy/lib/function_base.pyc in average(a, axis, weights, returned) 
    933 
    934  if weights is None: 
--> 935   avg = a.mean(axis) 
    936   scl = avg.dtype.type(a.size/avg.size) 
    937  else: 

/databricks/python/local/lib/python2.7/site-packages/numpy/core/_methods.pyc in _mean(a, axis, dtype, out, keepdims) 
    63   dtype = mu.dtype('f8') 
    64 
---> 65  ret = umr_sum(arr, axis, dtype, out, keepdims) 
    66  if isinstance(ret, mu.ndarray): 
    67   ret = um.true_divide(

TypeError: cannot perform reduce with flexible type 

I am sorry, but I cannot share my dataset. I am new to Python; I have more experience with R. I would appreciate your help.


What is the schema of the imported data? You may have strings in X and Y. Also, it is inferSchema='true', not interSchema='true'. – Abdou
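To illustrate the point about strings: a CSV reader that is not told to infer types returns every field as text, and a numeric reduction such as a sum or mean over text fails. The sketch below reproduces that effect with Python's standard csv module rather than Spark. The sample rows are made up (the real dataset is not shown), and the exact error message differs from the NumPy one in the traceback.

```python
import csv
import io

# Made-up sample rows; the real dataset is not shown in the question.
raw = "area,precio\n50,100000.0\n60,120000.0\n80,160000.0\n"

# Like a CSV read without schema inference, csv.DictReader returns text fields.
rows = list(csv.DictReader(io.StringIO(raw)))
areas_as_text = [row["area"] for row in rows]

try:
    sum(areas_as_text)  # a numeric reduction over strings
    reduction_failed = False
except TypeError:
    reduction_failed = True

# Explicit conversion restores numeric values.
areas = [int(value) for value in areas_as_text]
total_area = sum(areas)
```

Here the reduction over the text values raises a TypeError, while the converted values can be summed normally.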

Answer


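Thanks, Abdou. There was a typo in the line that reads my dataset: with the misspelled option, the schema was not inferred and the columns were loaded as strings. This is the correct call: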

aptos=sqlContext.read.format('csv').options(header='true', inferSchema='true').load('/FileStore/tables/yl3r1mgv1507304115516/aptos_dataset-5ad32.csv') 

Now the regression works:

regr.fit(X,Y) 
Out[4]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
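As an extra safeguard, the collected rows can be converted to plain floats before fitting, so that a column that was loaded as text either gets converted or fails with a clear message. This is only a sketch under stated assumptions: the Row class below is a hypothetical stand-in for pyspark.sql.Row (which is also a tuple subclass), rows_to_features is a helper name invented here, and the values are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, which is also a tuple subclass.
Row = namedtuple("Row", ["value"])


def rows_to_features(rows):
    """Convert single-column rows to [[float], ...], a 2-D list of numbers."""
    features = []
    for index, row in enumerate(rows):
        try:
            features.append([float(row[0])])
        except (TypeError, ValueError):
            raise ValueError(
                "row %d holds a non-numeric value: %r" % (index, row[0])
            )
    return features


# Made-up values; the strings mimic a CSV read without schema inference.
x_rows = [Row("50"), Row("60"), Row("80")]
y_rows = [Row("100000.0"), Row("120000.0"), Row("160000.0")]

X = rows_to_features(x_rows)
Y = rows_to_features(y_rows)
# X and Y are now numeric nested lists that could be passed to regr.fit(X, Y).
```

With real data, x_rows and y_rows would be the results of the two collect() calls on the area and precio columns.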