2017-08-07 94 views

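GaussianNB fit() function expects fixed-length vectors

I want to implement Gaussian NB training like the code below. However, gnb.fit() raises an exception if the rows of X are not all the same size (that is, every list inside X has to have the same length). If my training samples are vectors of different lengths, what is the correct way to call fit()?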

from sklearn.naive_bayes import GaussianNB

def train(X, Y):
    # X: list of samples (rows); Y: one label per sample
    gnb = GaussianNB()
    gnb.fit(X, Y)
    return gnb

>>> X = [[1,2,3], [4,5,6,7], [8,9]] 
>>> Y = [1,1,1] 
>>> snb.train(X, Y) 

/Library/Python/2.7/site-packages/sklearn/utils/validation.py:395: 
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 
and will raise ValueError in 0.19. Reshape your data either using 
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) 
if it contains a single sample. 
DeprecationWarning) 
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
File "snb.py", line 113, in train 
gnb.fit(X, Y) 
File "/Library/Python/2.7/site-packages/sklearn/naive_bayes.py", line 
182, in fit 
X, y = check_X_y(X, y) 
File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", 
line 521, in check_X_y 
ensure_min_features, warn_on_dtype, estimator) 
File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", 
line 402, in check_array 
array = array.astype(np.float64) 
ValueError: setting an array element with a sequence. 

Answers

0

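This happens because the lists inside X are not all the same length. Each sublist of X acts as a row (one sample), and each element of that sublist is a feature. For the model to run, every sublist must have the same length; otherwise fit() will not work. I changed that part and the code works.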

from sklearn.naive_bayes import GaussianNB

def train(X, Y):
    gnb = GaussianNB()
    gnb.fit(X, Y)
    return gnb

X = [[1,2,3,4], [4,5,6,7], [8,9,10,11]] 
Y = [1,1,1] 
train(X, Y) 
2

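All of your X vectors MUST be the same length. The Gaussian Naive Bayes estimator is designed to make predictions based on a fixed set of factors (features). If each X contains a variable number of values, how would the classifier determine which element belongs to which factor?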
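To see in advance whether a dataset will trip this error, the row lengths can be checked before calling fit(). The following is only a minimal sketch in plain Python; the helper names `row_lengths` and `is_rectangular` are made up for illustration and are not part of scikit-learn.

```python
def row_lengths(X):
    """Return the set of distinct row lengths found in X."""
    return {len(row) for row in X}


def is_rectangular(X):
    """True when every row of X has the same number of features."""
    return len(row_lengths(X)) <= 1


# The ragged X from the question and the fixed X from the first answer.
ragged = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
regular = [[1, 2, 3, 4], [4, 5, 6, 7], [8, 9, 10, 11]]

print(is_rectangular(ragged))   # False
print(is_rectangular(regular))  # True
```

If the check fails, the rows have to be brought to a common length (or otherwise preprocessed) before they are handed to gnb.fit().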

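One option is to pad the X vectors with 0 values so that they all have the same length. Otherwise, you will need to think about some other preprocessing of your variables.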
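The zero-padding option could look roughly like this. It is a sketch under assumptions, not a recommended recipe: `pad_rows` is a hypothetical helper written for this example, and it pads on the right, which assumes that position i means the same feature in every sample.

```python
def pad_rows(X, fill=0.0):
    """Right-pad every row of X with `fill` up to the longest row's length."""
    width = max((len(row) for row in X), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in X]


# The ragged X from the question.
X = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
X_padded = pad_rows(X)
print(X_padded)
# [[1, 2, 3, 0.0], [4, 5, 6, 7], [8, 9, 0.0, 0.0]]
```

The padded list is rectangular, so it can be passed to the question's train(X_padded, Y). Keep in mind that the estimator has no way to tell a padded 0 from a measured 0: the padding values enter the per-class mean and variance of each feature like any other value, so whether padding is appropriate depends on what the columns mean in your data.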