2013-05-02

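I want to compute the root mean squared error between y_train_actual, predicted by my scikit-learn model, and the original values in salaries, but I get TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'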

Problem: With mean_squared_error(y_train_actual, salaries) I get the error TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'. Passing list(salaries) instead of salaries as the second argument produces the same error.

With mean_squared_error(y_train_actual, y_valid_actual) I get the error Found array with dim 40663. Expected 244768

How can I convert my data to the correct array types for sklearn.metrics.mean_squared_error()?

Code

import numpy as np
from sklearn.metrics import mean_squared_error

y_train_actual = [np.exp(float(row)) for row in y_train]
print mean_squared_error(y_train_actual, salaries)

Error

TypeError         Traceback (most recent call last) 
<ipython-input-144-b6d4557ba9c5> in <module>() 
     3 y_valid_actual = [ np.exp(float(row)) for row in y_valid ] 
     4 
----> 5 print mean_squared_error(y_train_actual, salaries) 
     6 print mean_squared_error(y_train_actual, y_valid_actual) 

C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred) 
    1462  """ 
    1463  y_true, y_pred = check_arrays(y_true, y_pred) 
-> 1464  return np.mean((y_pred - y_true) ** 2) 
    1465 
    1466 

TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray' 
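The traceback shows that mean_squared_error ends in np.mean((y_pred - y_true) ** 2). A minimal reproduction, assuming only NumPy is installed, is to subtract a string array (what check_arrays produces from the salaries tuple) from a float array. NumPy defines no subtraction for that pair of dtypes and raises a TypeError; newer NumPy versions raise a subclass of TypeError with a longer message than the Python 2.7 one above.

```python
import numpy as np

# Float targets, like y_train_actual after np.exp()
y_true = np.array([25000.0, 30000.0, 30000.0])
# Salaries kept as text, like the salaries tuple in the question
y_pred = np.array(['25000', '30000', '30000'])

print(y_true.dtype, y_pred.dtype)  # float64 <U5

error = None
try:
    np.mean((y_pred - y_true) ** 2)  # same expression as in the traceback
except TypeError as exc:
    # NumPy has no subtraction defined between a string array and a float array
    error = exc

print(type(error).__name__, error)
```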

Code

y_train_actual = [np.exp(float(row)) for row in y_train]
y_valid_actual = [np.exp(float(row)) for row in y_valid]

print mean_squared_error(y_train_actual, y_valid_actual)

Error

ValueError        Traceback (most recent call last) 
<ipython-input-146-7fcd0367c6f1> in <module>() 
     4 
     5 #print mean_squared_error(y_train_actual, salaries) 
----> 6 print mean_squared_error(y_train_actual, y_valid_actual) 

C:\Python27\lib\site-packages\sklearn\metrics\metrics.pyc in mean_squared_error(y_true, y_pred) 
    1461 
    1462  """ 
-> 1463  y_true, y_pred = check_arrays(y_true, y_pred) 
    1464  return np.mean((y_pred - y_true) ** 2) 
    1465 

C:\Python27\lib\site-packages\sklearn\utils\validation.pyc in check_arrays(*arrays, **options) 
    191   if size != n_samples: 
    192    raise ValueError("Found array with dim %d. Expected %d" 
--> 193        % (size, n_samples)) 
    194 
    195   if not allow_lists or hasattr(array, "shape"): 

ValueError: Found array with dim 40663. Expected 244768 
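This second error is a length check: check_arrays compares the number of samples in each argument before any arithmetic happens. As a rough illustration, assuming NumPy only and using small stand-in arrays for the 244768 training and 40663 validation values, elementwise subtraction between arrays of different lengths fails even without scikit-learn's check, because the shapes cannot be broadcast together.

```python
import numpy as np

train_like = np.ones(5)  # stand-in for the 244768 training targets
valid_like = np.ones(2)  # stand-in for the 40663 validation targets

error = None
try:
    train_like - valid_like
except ValueError as exc:
    # NumPy reports that the operands could not be broadcast together
    error = exc

print(error)
```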

Code

print type(y_train) 
print type(y_train_actual) 
print type(salaries) 

Result

<type 'list'> 
<type 'list'> 
<type 'tuple'> 

print y_train[:10]

[10.126631103850338, 10.308952660644293, 10.308952660644293, 10.221941283654663, 10.126631103850338, 10.126631103850338, 11.225243392518447, 9.9987977323404529, 10.043249494911286, 11.350406535472453]

print salaries[:10]

('25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000')

print list(salaries)[:10]

['25000', '30000', '30000', '27500', '25000', '25000', '75000', '22000', '23000', '85000']

print len(y_train)

244768 

print len(salaries)

244768 
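The printouts above already contain the diagnosis: salaries holds strings such as '25000', while y_train holds floats. A quick way to see what scikit-learn will receive, sketched with NumPy and the first few values printed above, is to convert each input with np.asarray and inspect its dtype. A dtype kind of 'U' (or 'S' for Python 2 byte strings) means the array contains text, not numbers.

```python
import numpy as np

# First values printed in the question
y_train_head = [10.126631103850338, 10.308952660644293, 10.308952660644293]
salaries_head = ('25000', '30000', '30000')

y_arr = np.asarray(y_train_head)
s_arr = np.asarray(salaries_head)

print(y_arr.dtype)  # float64: numeric
print(s_arr.dtype)  # <U5: fixed-width Unicode text, cannot be used in arithmetic
```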

Can you add the 'shape' of y_train? My guess is that y_train_actual is a 'list' of 'ndarrays', which may clash inside 'mean_squared_error()'. – fgb 2013-05-02 04:39:18

@fgb I get the error 'AttributeError: 'list' object has no attribute 'shape'' – Nyxynyx 2013-05-02 04:41:16

Right. Do you have any idea of the dimensions of y_train? – fgb 2013-05-02 04:42:38
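A plain Python list has no shape attribute, which explains the AttributeError. One way to answer the dimension question, sketched here with a few of the y_train values from the question, is to wrap the list in np.asarray and read .shape. The same check tests the guess that y_train_actual is a list of arrays: each np.exp(float(row)) returns a NumPy scalar, so the converted result stays one-dimensional.

```python
import numpy as np

y_train = [10.126631103850338, 10.308952660644293, 10.221941283654663]

# Same expression as in the question
y_train_actual = [np.exp(float(row)) for row in y_train]

print(hasattr(y_train, "shape"))         # False: plain lists have no .shape
print(np.asarray(y_train).shape)         # (3,)
print(type(y_train_actual[0]).__name__)  # float64: a NumPy scalar, not an array
print(np.asarray(y_train_actual).shape)  # (3,): one value per sample
```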

Answer

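The TypeError stems from salaries being a list of strings ('25000', '30000', ...) while y_train_actual is a list of floats. check_arrays turns both into NumPy arrays, and NumPy cannot subtract a string array from a float array, so convert salaries to floats first.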
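A minimal sketch of that fix, using the ten values printed in the question: convert salaries to floats before computing the error. The arithmetic mirrors the last line of mean_squared_error shown in the traceback (np.mean((y_pred - y_true) ** 2)) so it runs with NumPy alone; with scikit-learn installed, the same two float arrays can be passed to mean_squared_error instead.

```python
import numpy as np

y_train = [10.126631103850338, 10.308952660644293, 10.308952660644293,
           10.221941283654663, 10.126631103850338, 10.126631103850338,
           11.225243392518447, 9.9987977323404529, 10.043249494911286,
           11.350406535472453]
salaries = ('25000', '30000', '30000', '27500', '25000', '25000',
            '75000', '22000', '23000', '85000')

# Undo the log transform, as in the question
y_train_actual = np.exp(np.asarray(y_train, dtype=float))
# Convert the salary strings to numbers before any subtraction
salaries_float = np.asarray(salaries, dtype=float)

mse = np.mean((salaries_float - y_train_actual) ** 2)
rmse = np.sqrt(mse)  # the title asks for the root of the MSE
print(mse, rmse)
```

On these ten rows y_train is, to floating-point precision, the natural log of salaries, so the error comes out essentially zero; real model predictions would not match this closely.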

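For your second error, make sure both arrays have the same size: y_train_actual has 244768 values but y_valid_actual has only 40663, so they cannot be subtracted element by element. The two arguments should be true and predicted values for the same samples.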
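A hedged sketch of a guard for both problems: rmse below is a hypothetical helper (not part of scikit-learn) that converts both inputs to float arrays, refuses inputs of different lengths with a message in the spirit of check_arrays, and returns the square root of the mean squared error.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)  # also converts numeric strings
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("length mismatch: %d true values vs %d predictions"
                         % (len(y_true), len(y_pred)))
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

print(rmse([3.0, 5.0], ['3', '7']))  # sqrt((0 + 4) / 2), about 1.4142
```

Called with y_train_actual and y_valid_actual it raises a ValueError, which is the intended outcome: the mismatch is surfaced before any arithmetic.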

I tried your suggestion and got the error 'float() argument must be a string or a number' – Nyxynyx 2013-05-02 04:51:00

Are you using 'np.float()', which works on 'numpy.ndarrays'? – fgb 2013-05-02 04:51:37

Yes, I am using 'np.float()' instead of 'float()' – Nyxynyx 2013-05-02 04:52:21
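On this last exchange: np.float was only an alias for Python's built-in float (deprecated in NumPy 1.20 and removed in 1.24), so it does not convert a whole sequence either. float() accepts a single string or number, not a tuple or list, which matches the reported error if the whole salaries tuple was passed to it. A sketch of the difference, assuming a few of the salaries values shown in the question: convert element by element, or let NumPy convert the whole sequence with dtype=float.

```python
import numpy as np

salaries = ('25000', '30000', '27500')

# float() takes one value, so passing the whole tuple fails
error = None
try:
    float(salaries)
except TypeError as exc:
    error = exc

# Either convert each element...
as_list = [float(s) for s in salaries]
# ...or let NumPy convert the whole sequence at once
as_array = np.asarray(salaries, dtype=float)

print(as_list)   # [25000.0, 30000.0, 27500.0]
print(as_array)  # [25000. 30000. 27500.]
```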