2017-08-12
0
saleprice_scaled = StandardScaler().fit_transform(df_train['SalePrice'][:, np.newaxis])

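Can anyone explain what is happening in this line? Why is newaxis used here? I know how newaxis works in general, but I can't figure out what it is doing in this particular case.

Thanks in advance.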

Answers

3

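df_train['SalePrice'] is a pandas.Series (a vector / 1D array) of shape (N,), i.e. N elements.

Since version 0.17, scikit-learn methods discourage 1D arrays (vectors) as input data; they expect 2D arrays of shape (n_samples, n_features).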

df_train['SalePrice'][:,np.newaxis] 

converts the 1D array (shape: N elements) into a 2D array (shape: N rows, 1 column).
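To make the shape change concrete, here is a minimal sketch on a plain NumPy array (the array `x` is just an illustrative stand-in, not the SalePrice data). It shows that `np.newaxis` is an alias for `None` and that `x[:, np.newaxis]` gives the same result as `x.reshape(-1, 1)`:

```python
import numpy as np

x = np.arange(5)          # 1D array, shape (5,)

col = x[:, np.newaxis]    # new axis appended: shape (5, 1), one column
row = x[np.newaxis, :]    # new axis prepended: shape (1, 5), one row

print(x.shape, col.shape, row.shape)

# np.newaxis is simply None, so these spellings are interchangeable.
same_as_none = np.array_equal(col, x[:, None])
same_as_reshape = np.array_equal(col, x.reshape(-1, 1))
print(same_as_none, same_as_reshape)
```

Which spelling to use is mostly a matter of taste; `reshape(-1, 1)` is the one the scikit-learn warning message itself suggests.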

Demo:

In [21]: df = pd.DataFrame(np.random.randint(10, size=(5, 3)), columns=list('abc')) 

In [22]: df 
Out[22]: 
   a  b  c
0  4  3  8
1  7  5  6
2  1  3  9
3  7  5  7
4  7  0  6

In [23]: from sklearn.preprocessing import StandardScaler 

In [24]: df['a'].shape 
Out[24]: (5,)  # <--- 1D array 

In [25]: df['a'][:, np.newaxis].shape 
Out[25]: (5, 1) # <--- 2D array 

There is also a pandas way to do the same thing:

In [26]: df[['a']].shape 
Out[26]: (5, 1) # <--- 2D array 

In [27]: StandardScaler().fit_transform(df[['a']]) 
Out[27]: 
array([[-0.5 ], 
     [ 0.75], 
     [-1.75], 
     [ 0.75], 
     [ 0.75]]) 
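As a cross-check, the sketch below rebuilds column `a` from the demo by hand and standardizes it with plain NumPy, assuming StandardScaler's documented behaviour of subtracting the column mean and dividing by the population standard deviation (ddof=0). The names `s`, `as_frame`, `as_column` and `scaled` are illustrative only. It also converts the Series with `to_numpy()` before indexing, because recent pandas releases (2.0 and later, as far as I know) no longer accept `series[:, np.newaxis]` directly:

```python
import numpy as np
import pandas as pd

# Same values as column 'a' in the demo above.
s = pd.Series([4, 7, 1, 7, 7], name="a")

# Ways to get an (N, 1) input from a Series.
as_frame = s.to_frame()                     # DataFrame, shape (5, 1)
as_column = s.to_numpy()[:, np.newaxis]     # ndarray, shape (5, 1)

# Standardize column by column: (x - mean) / std, with ddof=0.
scaled = (as_column - as_column.mean(axis=0)) / as_column.std(axis=0)

print(as_frame.shape, as_column.shape)
print(scaled.ravel())   # should agree with Out[27] up to rounding
```

The 2D layout is what tells the scaler that there are five samples of one feature, rather than one sample of five features.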

What happens if we pass a 1D array:

In [28]: StandardScaler().fit_transform(df['a']) 
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\utils\validation.py:429: DataConversionWarning: Data with input dtype int32 was converted to float64 by StandardScaler.
    warnings.warn(msg, _DataConversionWarning)
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\preprocessing\data.py:586: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
    warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
C:\Users\Max\Anaconda4\lib\site-packages\sklearn\preprocessing\data.py:649: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
    warnings.warn(DEPRECATION_MSG_1D, DeprecationWarning)
Out[28]: array([-0.5 , 0.75, -1.75, 0.75, 0.75])
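On scikit-learn 0.19 and later, the deprecation announced in the warning above has taken effect, so the same call should raise a ValueError instead of returning a 1D result. The following is a small sketch of that behaviour and of the `reshape(-1, 1)` fix the message recommends; the variable names are illustrative, and the exact error text may differ between versions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same values as column 'a' in the demo, as a 1D float array.
x = np.array([4, 7, 1, 7, 7], dtype=float)   # shape (5,)

try:
    StandardScaler().fit_transform(x)        # 1D input
    rejected = False                         # only on very old versions
except ValueError as exc:
    rejected = True
    print("1D input rejected:", exc)

# The recommended fix: one feature, N samples.
scaled = StandardScaler().fit_transform(x.reshape(-1, 1))
print(scaled.shape)
```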