How to get a non-shuffled train_test_split in sklearn

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split 
    ...: train_test_split([1,2,3,4,5,6]) 
    ...: 
Out[1]: [[1, 6, 4, 2], [5, 3]]

What is the most concise way to get a non-shuffled train/test split, i.e.

[[1,2,3,4], [5,6]]

EDIT: Currently I am using

train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):]

but I was hoping for something nicer. I have opened an issue on sklearn: https://github.com/scikit-learn/scikit-learn/issues/8844
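For reference, the slicing above can be wrapped in a small helper so the cut point is computed only once. This is just a minimal sketch; the names ordered_split and train_fraction are made up for illustration and are not part of sklearn.

```python
def ordered_split(data, train_fraction=0.75):
    """Split a sequence into (train, test) while keeping the original order.

    Hypothetical helper for illustration; not an sklearn function.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    # int() truncates, so the training part gets the floor of the product.
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]


train, test = ordered_split([1, 2, 3, 4, 5, 6], train_fraction=0.75)
print(train, test)  # [1, 2, 3, 4] [5, 6]
```

Because it only uses slicing, it works for lists, tuples, and NumPy arrays alike.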

EDIT 2: My PR has been merged. As of scikit-learn version 0.19, you can pass shuffle=False to train_test_split to get a non-shuffled split.


2017-05-08 maxymoo

Use numpy.split:

import numpy as np 
data = np.array([1,2,3,4,5,6]) 

np.split(data, [4])   # modify the index here to specify where to split the array 
# [array([1, 2, 3, 4]), array([5, 6])]

If you want to split by percentage, you can compute the split index from the shape of the data:

data = np.array([1,2,3,4,5,6]) 
p = 0.6 

idx = int(p * data.shape[0]) + 1  # since the percentage may end up to be a fractional 
             # number, modify this as you need, usually shouldn't 
             # affect much if data is large 
np.split(data, [idx]) 
# [array([1, 2, 3, 4]), array([5, 6])]
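The same idea carries over to 2-D data, where the split runs along the rows. The sketch below assumes a plain floor for the index (no + 1), and split_rows is a hypothetical name used only for illustration.

```python
import numpy as np


def split_rows(data, p):
    """Split `data` along axis 0; the first part gets floor(p * n_rows) rows.

    Hypothetical helper for illustration only.
    """
    idx = int(p * data.shape[0])  # plain floor; adjust if you prefer rounding up
    first, second = np.split(data, [idx], axis=0)
    return first, second


X = np.arange(20).reshape(10, 2)  # 10 rows, 2 columns
head, tail = split_rows(X, 0.6)
print(head.shape, tail.shape)  # (6, 2) (4, 2)
```

Whether to floor, round, or add one is a judgment call; for large data the difference is a single row.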


2017-05-08 00:18:24 Psidom

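Thanks, this looks almost like what I want, but what if I don't know the index I want to split at? Say I just want to do a 60/40 split? – maxymoo

Hmm, yes, I was hoping to avoid something like that, but maybe it isn't possible in this case. Do you think it might be clearer to just do 'data[:int(len(data) * p)], data[int(len(data) * p):]'? – maxymoo

Yes. That definitely works. – Psidom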

I'm not adding anything to Psidom's answer except an easy copy-paste function:

import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # Split index: floor of the training fraction, plus one (as in Psidom's answer)
    i = int((1 - test_size) * X.shape[0]) + 1
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test

Update: At some point this became built in, so now you can do:

from sklearn.model_selection import train_test_split 
train_test_split(X, y, test_size=0.2, shuffle=False)
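To check that the built-in version keeps X and y aligned and in order, here is a small sketch on toy arrays (the data is made up; it assumes scikit-learn 0.19 or later, where shuffle=False is available):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, labels 0..9 in order.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_test)   # [8 9]
```

Note that for 10 samples this gives an 8/2 split, whereas the copy-paste function above, because of its + 1, would give 9/1.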


2017-05-28 09:29:22 Anake

All you need to do is set the shuffle parameter to False and the stratify parameter to None:

In [49]: train_test_split([1,2,3,4,5,6], shuffle=False, stratify=None)
Out[49]: [[1, 2, 3, 4], [5, 6]]


2017-08-16 04:55:01

Hey mayank, actually 'stratify=None' is the default (see "EDIT 2" in the original question) – maxymoo
