2017-06-14

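Found input variables with inconsistent numbers of samples: [100, 300]

I'm a beginner in this field, trying to model a dataset with logistic regression. The code is as follows: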

import numpy as np 
from matplotlib import pyplot as plt 
import pandas as pnd 
from sklearn.preprocessing import Imputer, LabelEncoder, OneHotEncoder, StandardScaler 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LogisticRegression 
from sklearn.metrics import confusion_matrix 

# Import the dataset 
data_set = pnd.read_csv("/Users/Siddharth/PycharmProjects/Deep_Learning/Classification Template/Social_Network_Ads.csv") 
X = data_set.iloc[:, [2,3]].values 
Y = data_set.iloc[:, 4].values 

# Splitting the set into training set and testing set 
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0) 

# Scaling the variables 
scaler_x = StandardScaler() 
x_train = scaler_x.fit_transform(x_train) 
x_train = scaler_x.transform(x_test) 

# Fitting Logistic Regression to training data 
classifier = LogisticRegression(random_state=0) 
classifier.fit(x_train, y_train) 

# Predicting the test set results 
y_prediction = classifier.predict(x_test) 

# Making the confusion matrix 
conMat = confusion_matrix(y_true=y_test, y_pred=y_prediction) 
print(conMat) 

The error occurs at classifier.fit(x_train, y_train). The error is:

Traceback (most recent call last): 
    File "/Users/Siddharth/PycharmProjects/Deep_Learning/Logistic_regression.py", line 24, in <module> 
    classifier.fit(x_train, y_train) 
    File "/usr/local/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit 
    order="C") 
    File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_X_y 
    check_consistent_length(X, y) 
    File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length 
    " samples: %r" % [int(l) for l in lengths]) 
ValueError: Found input variables with inconsistent numbers of samples: [100, 300] 

I don't know why this is happening. Any help would be greatly appreciated. Thanks!

Answer


It looks like you have a typo here. You probably want:

x_test = scaler_x.transform(x_test) 

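instead of x_train = scaler_x.transform(x_test). In short, the error says that the sizes of x_train and y_train do not match: after that line, x_train actually holds the 100 scaled test rows, while y_train still has 300 labels.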
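For reference, here is a minimal sketch of the corrected flow. Since the CSV from the question is not available, it uses a synthetic 400-row, 2-feature array as a stand-in (an assumption, chosen so the split gives the same 300/100 sizes as in the error message). The two assert lines are an extra sanity check that is not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Social_Network_Ads.csv (assumption): 400 rows, 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Same split as in the question: 300 training rows, 100 test rows.
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)  # fit the scaler on the training rows only
x_test = scaler_x.transform(x_test)        # assign to x_test, not x_train

# Sanity check: features and labels must have the same number of samples.
assert x_train.shape[0] == y_train.shape[0]
assert x_test.shape[0] == y_test.shape[0]

classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
y_prediction = classifier.predict(x_test)
```

Checking the shapes right before calling fit is a cheap way to catch this kind of variable mix-up before scikit-learn raises the ValueError.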
