Logistic regression with a training set and a test set

import pandas as pd

file = pd.DataFrame({'name': ['s', 'k', 'lo', 'ki'], 'age': [12, 23, 32, 22], 'marks': [34, 34, 43, 22], 'score': [1, 1, 0, 1]})

I want to run a logistic regression with the following commands:

import statsmodels.formula.api as smf 
logit = smf.logit('score ~ age + marks', file) 
results = logit.fit()

But I get an error:

statsmodels.tools.sm_exceptions.PerfectSeparationError: Perfect separation detected, results not available

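I would also like to split the data into a training set and a test set. How do I do that? After that I need to use the predict command.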

The "glm" command in R makes this easier than Python does.


2015-02-10 Shiva Prakash

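I ran into a similar error when working with some data of my own. It is caused by a property of the data: the two groups (score = 0 and score = 1) are perfectly separated in your data, so the decision boundary can be placed anywhere between them (infinitely many solutions), and the likelihood keeps increasing as the coefficients grow, so no finite maximum-likelihood estimate exists. That is why the library cannot return a solution. This figure shows your data; boundaries 1, 2 and 3 are all valid solutions.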
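To make the "no finite estimate" point concrete, here is a minimal numpy sketch. It assumes a model with age as the only predictor (age alone already separates the classes in this data), and the boundary age = 27.5 is a hypothetical choice made for illustration. Scaling both coefficients by a growing factor t keeps the same boundary but pushes the log-likelihood ever closer to 0, so the maximizer would need infinite coefficients.

```python
import numpy as np

age = np.array([12, 23, 32, 22], dtype=float)
score = np.array([1, 1, 0, 1])

# Perfect separation: every score=1 row is younger than every score=0 row,
# so a rule such as "age < 27.5" classifies all rows correctly.
assert age[score == 1].max() < age[score == 0].min()

def log_likelihood(beta0, beta_age):
    """Bernoulli log-likelihood of a logit model with age as the only predictor."""
    z = beta0 + beta_age * age
    # log(sigmoid(z)) = -log(1 + exp(-z)); log(1 - sigmoid(z)) = -log(1 + exp(z))
    return np.sum(score * -np.logaddexp(0, -z) + (1 - score) * -np.logaddexp(0, z))

# Hypothetical boundary beta0 + beta_age * age = 0 at age = 27.5, scaled by t.
# The boundary never moves, but the log-likelihood rises toward 0 as t grows.
for t in [0.1, 1, 10]:
    print(t, log_likelihood(27.5 * t, -t))
```

Because the log-likelihood is always negative but can be made arbitrarily close to 0, no finite coefficient vector attains the maximum, which is what statsmodels is reporting.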

I ran this in Matlab using glmnet and got the following warning:

Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.

Using more data points should help: with only four rows, perfect separation is very likely.

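Interestingly, scikit-learn's LogisticRegression does not complain at all. That is because it applies an L2 penalty by default (penalty='l2', C=1.0), which keeps the coefficients finite even when the classes are perfectly separated.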
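A rough way to see the effect of that default penalty is to refit on the same data with different values of C (the inverse of the penalty strength) and watch the coefficients. This sketch assumes standardized features (added here only so the lbfgs solver converges easily) and picks C values arbitrarily; as C grows the penalty weakens and the coefficients get larger, heading toward the unbounded solution statsmodels refuses to return.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

age = np.array([12, 23, 32, 22], dtype=float)
marks = np.array([34, 34, 43, 22], dtype=float)
score = np.array([1, 1, 0, 1])

X = np.column_stack([age, marks])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column

Cs = [0.01, 1.0, 100.0]  # arbitrary values; C=1.0 is scikit-learn's default
norms = []
for C in Cs:
    # Smaller C = stronger L2 penalty = coefficients pulled harder toward 0
    model = LogisticRegression(C=C, max_iter=10_000).fit(X, score)
    norms.append(np.linalg.norm(model.coef_))
    print(f"C={C:g}: coef={model.coef_.ravel()}, intercept={model.intercept_[0]:.3f}")
```

The penalty is what makes the fit succeed here, so the coefficients depend on C rather than being the maximum-likelihood estimates you would get from R's glm on non-separated data.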

Solving your problem with scikit-learn

Example code:

import pandas as pd 
import numpy as np 
from patsy import dmatrices 
from sklearn.linear_model import LogisticRegression 

file = pd.DataFrame({'name':['s', 'k', 'lo', 'ki'] , 'age':[12, 23, 32, 22], 'marks':[34, 34, 43, 22], 'score':[1, 1, 0, 1]}) 
# Prepare the data 
y,X = dmatrices('score ~ age + marks',file) 
y = np.ravel(y) 
# Fit the data to Logistic Regression model 
model = LogisticRegression() 
model = model.fit(X,y)
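One detail worth checking in the code above: dmatrices adds an Intercept column of ones by default, and LogisticRegression also fits its own intercept_ (fit_intercept=True by default), so the model carries two intercept terms, and the coefficient on patsy's column is treated like any other feature (it is subject to the L2 penalty). A minimal variant, assuming you want scikit-learn to handle the intercept, removes patsy's column with "- 1" in the formula:

```python
import numpy as np
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression

file = pd.DataFrame({'name': ['s', 'k', 'lo', 'ki'], 'age': [12, 23, 32, 22],
                     'marks': [34, 34, 43, 22], 'score': [1, 1, 0, 1]})

# "- 1" drops patsy's Intercept column; LogisticRegression fits intercept_ itself.
y, X = dmatrices('score ~ age + marks - 1', file, return_type='dataframe')
y = np.ravel(y)  # 1-D target array, as scikit-learn expects

model = LogisticRegression().fit(X, y)
print(list(X.columns), model.intercept_, model.coef_)
```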

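For splitting the data into training and test sets, you may want to refer to this: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html (in current scikit-learn versions, train_test_split lives in sklearn.model_selection instead of sklearn.cross_validation).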
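Since the question also asks about predicting afterwards, here is a sketch of the split followed by predict, using sklearn.model_selection.train_test_split. With only four rows a random split can easily leave a single class in the training set (which makes LogisticRegression raise an error), so this sketch assumes shuffle=False purely to keep the example deterministic: the first three rows, which contain both classes, are used for training and the last row is held out. On real data you would normally keep shuffling on and set random_state.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

file = pd.DataFrame({'name': ['s', 'k', 'lo', 'ki'], 'age': [12, 23, 32, 22],
                     'marks': [34, 34, 43, 22], 'score': [1, 1, 0, 1]})

X = file[['age', 'marks']]
y = file['score']

# shuffle=False: train on rows 0-2, hold out row 3 (test_size=0.25 of 4 rows = 1 row)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False)

model = LogisticRegression().fit(X_train, y_train)

predicted = model.predict(X_test)            # hard 0/1 labels
probabilities = model.predict_proba(X_test)  # one column per class in model.classes_
accuracy = model.score(X_test, y_test)       # mean accuracy on the held-out rows
print(predicted, probabilities, accuracy)
```

A one-row test set says very little about the model; the point is only the shape of the workflow: split, fit on the training part, then call predict, predict_proba or score on the test part.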


2015-03-10 04:49:07 Shivster
