python pandas regression

Normalizing a column. I have the following DF:

Date  Event_Counts Category_A Category_B 
20170401  982457   0   1 
20170402  982754   1   0 
20170402  875786   0   1

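I am preparing the data for a regression analysis and want to normalize the column Event_Counts so that it is on a similar scale to the category columns.

I am using the following code: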

from sklearn import preprocessing 
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])

I do get this warning, though:

DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function. 
    warnings.warn(msg, _DataConversionWarning)

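It seems to have worked; there is a new column. However, it contains negative numbers such as -1.3.

I thought the scale function subtracts the mean from each number and divides by the standard deviation, and then adds the minimum of the result to each row.

Does it not work that way with pandas? Or should I use the normalize() function or the StandardScaler() function? I want the normalized column to be on a scale of 0 to 1.

Thank you

Source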

2017-04-17 jeangelj

I think you are looking for sklearn.preprocessing.MinMaxScaler. It lets you scale to a given range.

So in your case it would be:

# MinMaxScaler expects 2-D input, so select the column with double brackets
scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()

To scale the entire DF:

scaled_df = scaler.fit_transform(df) 
print(scaled_df) 
[[ 0.   0.99722347 0.   1.  ] 
[ 1.   1.   1.   0.  ] 
[ 1.   0.   0.   1.  ]]
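
As a rough sketch of what MinMaxScaler computes with feature_range=(0, 1), the same numbers can be reproduced by hand with (x - min) / (max - min). The snippet below uses the three Event_Counts values from the question; the variable names are illustrative only, and the same arithmetic works on a pandas Series.

```python
import numpy as np

# Event_Counts values from the question's DataFrame
event_counts = np.array([982457, 982754, 875786], dtype=float)

# Min-max scaling: the smallest value maps to 0 and the largest to 1
value_range = event_counts.max() - event_counts.min()
scaled = (event_counts - event_counts.min()) / value_range

print(scaled)
```

The result should agree with the second column of the output above, and it cannot go negative because the minimum is subtracted before dividing.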

Source

2017-04-17 20:05:25 Grr

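Interesting! I did not know that existed, let me try it – jeangelj

Thank you - this worked! – jeangelj

I got an error on a different column: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. – jeangelj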
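
The warning quoted in that comment comes up because the scalers expect a 2-D array of shape (n_samples, n_features). A minimal sketch of the two reshapes it mentions, using NumPy only and illustrative variable names:

```python
import numpy as np

# A 1-D array, like the values of a single DataFrame column
x = np.array([982457, 982754, 875786])

as_single_feature = x.reshape(-1, 1)  # 3 samples, 1 feature
as_single_sample = x.reshape(1, -1)   # 1 sample, 3 features

print(as_single_feature.shape, as_single_sample.shape)
```

For one column of a DataFrame, the single-feature form is the one that applies; selecting with double brackets, df[['Event_Counts']], gives the same 2-D layout.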

Scaling is done by subtracting the mean and dividing by the standard deviation of each feature (column). So,

scaled_event_counts = (Event_Counts - mean(Event_Counts))/std(Event_Counts)

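The int64 to float64 warning comes from having to subtract the mean, which is a floating point number and not just an integer.

You will get negative numbers in the scaled column because the mean is normalized to zero, so every value below the mean ends up negative.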
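
To make this concrete, here is a small sketch of the formula above applied by hand to the three Event_Counts values from the question. It assumes the population standard deviation (ddof=0), which is NumPy's default and what preprocessing.scale uses; note that the pandas Series.std() method defaults to ddof=1 and would give slightly different numbers.

```python
import numpy as np

# Event_Counts values from the question's DataFrame
event_counts = np.array([982457, 982754, 875786], dtype=float)

# Standardize: subtract the mean, divide by the population std (ddof=0)
scaled = (event_counts - event_counts.mean()) / event_counts.std()

print(scaled)                      # the value below the mean is negative
print(scaled.mean(), scaled.std())
```

The scaled column has mean 0 and standard deviation 1 (up to floating point error), so it is centred on zero rather than confined to the 0 to 1 range.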

Source

2017-04-17 20:05:13 msitt

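Thanks; and is that exactly the same as what preprocessing's scale() does? – jeangelj

Yes. If you like, you can find the source code here (https://github.com/scikit-learn/scikit-learn/blob/0.18.1/sklearn/preprocessing/data.py#L80). – msitt

Thank you - reading it now – jeangelj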
