Using Pandas and sklearn.neighbors

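I want to fit a KNN model on a DataFrame using Python 3.5/Pandas/sklearn.neighbors. I imported the data and split it into training and test data and labels, but when I try to predict with the model I get the error below. I'm very new to Pandas, so any help would be greatly appreciated, thanks!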

import pandas as pd 
from sklearn import cross_validation 
import numpy as np 
from sklearn.neighbors import KNeighborsRegressor 
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species']) 
data = seeds.iloc[:,[0,1,2,3,4,5,6]] 
labels = seeds.iloc[:,[7]] 
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,labels, test_size=0.4, random_state=1) 
knn = KNeighborsRegressor(n_neighbors=30) 
knn.fit(x_train,y_train) 
knn.predict(x_test) 

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-121-2292e64e5ab8> in <module>() 
----> 1 knn.predict(x_test) 

C:\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X) 
    151 
    152   if weights is None: 
--> 153    y_pred = np.mean(_y[neigh_ind], axis=1) 
    154   else: 
    155    y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float) 

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims) 
    2876 
    2877  return _methods._mean(a, axis=axis, dtype=dtype, 
-> 2878       out=out, keepdims=keepdims) 
    2879 
    2880 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims) 
    66  if isinstance(ret, mu.ndarray): 
    67   ret = um.true_divide(
---> 68     ret, rcount, out=ret, casting='unsafe', subok=False) 
    69  elif hasattr(ret, 'dtype'): 
    70   ret = ret.dtype.type(ret/rcount) 

TypeError: unsupported operand type(s) for /: 'str' and 'int'
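The last frame of the traceback shows where it breaks: with the default uniform weights, KNeighborsRegressor.predict computes np.mean over the targets of each test point's neighbours. A minimal NumPy-only sketch of that step, assuming the Species column holds strings and therefore ends up as an object array (the species names and neighbour indices below are made up for illustration):

```python
import numpy as np

# Hypothetical string targets, stored with dtype=object the way a pandas
# text column typically is when converted to a NumPy array.
_y = np.array([["Kama"], ["Rosa"], ["Canadian"], ["Rosa"]], dtype=object)

# Hypothetical neighbour indices for two query points (2 neighbours each).
neigh_ind = np.array([[0, 1], [2, 3]])

error_message = None
try:
    # Same call as in the traceback: average the neighbours' targets.
    # Summing object arrays concatenates the strings; dividing by the count fails.
    y_pred = np.mean(_y[neigh_ind], axis=1)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The sum step silently concatenates the strings, and only the division by the neighbour count raises, which is why the error mentions `/` rather than anything about labels.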


2016-09-25 ConstantinL

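You are using a regressor, so the labels must be numbers, not strings. You would have to encode them first, if that even makes sense... Are you sure you want a kNN regressor and not a kNN classifier?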
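To see why encoding the labels "may not even make sense", here is a small sketch (hypothetical labels, NumPy only): once species are mapped to integer codes, a regressor averages those codes, which can produce a value that corresponds to no species at all.

```python
import numpy as np

# Hypothetical species labels for five training rows.
labels = np.array(["Kama", "Rosa", "Canadian", "Rosa", "Kama"])

# np.unique sorts the class names; return_inverse gives each row's integer code.
classes, codes = np.unique(labels, return_inverse=True)
# classes -> ['Canadian' 'Kama' 'Rosa'], codes -> [1 2 0 2 1]

# Suppose a test point's two nearest neighbours are rows 0 (Kama) and 2 (Canadian).
neighbour_rows = [0, 2]
regression_output = codes[neighbour_rows].mean()
print(regression_output)  # 0.5: halfway between two codes, not a species
```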

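You should use KNeighborsClassifier for this KNN. You are trying to predict the class label Species, which is categorical. The regressor in your code tries to train on and predict a continuous numeric variable, and that is where your problem comes from: at prediction time KNeighborsRegressor averages the targets of each point's neighbours (the np.mean call in the traceback), and averaging strings such as species names raises the TypeError.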
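A classifier avoids the problem because it never averages labels: it assigns each test point the most common label among its neighbours. The idea can be sketched from scratch as below. This is a simplified illustration (Euclidean distance, uniform weights, no special tie-breaking), not scikit-learn's implementation, and `knn_classify` and the sample values are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_classify(x_train, y_train, x_query, k):
    """Hypothetical minimal kNN classifier: majority vote over the k nearest labels."""
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Euclidean distance from the query point to every training row.
    distances = np.linalg.norm(x_train - x_query, axis=1)
    # Indices of the k closest training rows.
    nearest = np.argsort(distances)[:k]
    # Count the neighbours' labels; strings are fine because nothing is averaged.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up (Area, Perimeter) pairs and labels.
x_train = [[15.3, 14.8], [14.9, 14.6], [18.7, 16.3], [19.1, 16.6], [11.8, 13.3]]
y_train = ["Kama", "Kama", "Rosa", "Rosa", "Canadian"]
prediction = knn_classify(x_train, y_train, [15.0, 14.7], k=3)
print(prediction)  # Kama (two of the three nearest rows are Kama)
```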

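import pandas as pd
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated/removed in newer versions
from sklearn.neighbors import KNeighborsClassifier
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species'])
data = seeds.iloc[:,0:7]   # the seven numeric feature columns
labels = seeds['Species']  # 1-D Series of class labels; strings are fine for a classifier
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.4, random_state=1)
knn = KNeighborsClassifier(n_neighbors=30)
knn.fit(x_train, y_train)
knn.predict(x_test)        # returns the predicted Species labels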
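Since seeds.tsv is not available here, the following self-contained sketch runs the same classifier on a tiny made-up DataFrame (the numbers are illustrative, not the real dataset) to show that string labels go in and come back out unchanged.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Made-up measurements standing in for two of the seeds.tsv feature columns.
x_train = pd.DataFrame({
    "Area": [15.3, 14.9, 14.3, 18.7, 19.1, 18.9, 11.8, 12.1, 11.5],
    "Perimeter": [14.8, 14.6, 14.2, 16.3, 16.6, 16.4, 13.3, 13.4, 13.1],
})
y_train = pd.Series(["Kama"] * 3 + ["Rosa"] * 3 + ["Canadian"] * 3)

# n_neighbors cannot exceed the number of training rows, so use 3 here instead of 30.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)

# Use the same column names as in training to avoid a feature-name warning.
x_test = pd.DataFrame({"Area": [14.8, 18.8, 11.9], "Perimeter": [14.5, 16.4, 13.3]})
predictions = knn.predict(x_test)
print(predictions)  # string class labels, one per test row
```

On the real data, `knn.score(x_test, y_test)` would report the mean accuracy on the held-out rows.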

http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html

Here is what a regressor would plot compared with a classifier (which is what you want to use).