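sklearn.cross_validation.StratifiedShuffleSplit - Error: "indices are out-of-bounds"
I'm trying to split a sample dataset using Scikit-learn's stratified shuffle split. I followed the example shown in the Scikit-learn documentation here: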
import pandas as pd
import numpy as np
# UCI's wine dataset
wine = pd.read_csv("https://s3.amazonaws.com/demo-datasets/wine.csv")
# separate target variable from dataset
target = wine['quality']
data = wine.drop('quality',axis = 1)
# Stratified Split of train and test data
from sklearn.cross_validation import StratifiedShuffleSplit
sss = StratifiedShuffleSplit(target, n_iter=3, test_size=0.2)
for train_index, test_index in sss:
    xtrain, xtest = data[train_index], data[test_index]
    ytrain, ytest = target[train_index], target[test_index]
# Check target series for distribution of classes
ytrain.value_counts()
ytest.value_counts()
However, as soon as I run the script, I get the following error:
IndexError: indices are out-of-bounds
Could someone please point out what I'm doing wrong here? Thanks!
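It looks like your index error is raised on this line: `xtrain, xtest = data[train_index], data[test_index]`. If so, you could edit your question to say so, which would help others find the problem. – Scott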
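A likely explanation, offered as a sketch rather than a definitive answer: `data` is a pandas DataFrame, so `data[train_index]` asks pandas for *columns*, not rows. With only a handful of feature columns, row positions in the hundreds fall outside that range, which would produce an out-of-bounds error like the one above. Selecting rows by position with `.iloc` avoids this. The version below also assumes scikit-learn 0.18 or later, where the class lives in `sklearn.model_selection` and is driven by `split(X, y)` instead of taking `y` in the constructor with `n_iter`. The small DataFrame is a hypothetical stand-in for the wine CSV so the example runs offline, and `random_state` is added only to make the split reproducible.

```python
# Sketch of a corrected stratified split; assumes scikit-learn >= 0.18.
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for the wine data (the real CSV is not downloaded here).
rng = np.random.default_rng(0)
wine = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.0, size=100),
    "ph": rng.normal(3.3, 0.15, size=100),
    "quality": np.repeat([5, 6, 7], [40, 40, 20]),  # imbalanced classes
})

# Separate the target variable from the features, as in the question.
target = wine["quality"]
data = wine.drop("quality", axis=1)

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.2, random_state=42)
for train_index, test_index in sss.split(data, target):
    # .iloc selects rows by position; data[train_index] would look up columns.
    xtrain, xtest = data.iloc[train_index], data.iloc[test_index]
    ytrain, ytest = target.iloc[train_index], target.iloc[test_index]

# Class proportions should match between the train and test targets.
print(ytrain.value_counts())
print(ytest.value_counts())
```

With the old `sklearn.cross_validation` API the same `.iloc` change applies inside the loop; converting to NumPy first (`data.values[train_index]`) also works but drops the column names.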