2016-04-03 54 views

Python dataframe: normalizing a numerical column using lambda

I am trying to normalize a column in a Python dataframe with the following code:

df['x_norm'] = df.apply(lambda x: (x['X'] - x['X'].mean())/(x['X'].max() - x['X'].min()),axis=1) 

But I got the following error:

df['x_norm'] = df.apply(lambda x: (x['X'] - x['X'].mean())/(x['X'].max() - x['X'].min()),axis=1) 
AttributeError: ("'float' object has no attribute 'mean'", u'occurred at index 0') 

Does anyone know what I am missing here? Thanks!


Could you please provide a sample data set (5-7 rows) and the expected output? – MaxU

Answers

0

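I am assuming you are using pandas.

Instead of applying the function to the entire DataFrame (Documentation), you can compute the mean, maximum, and minimum beforehand. Something like this: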

avg = df['X'].mean() 
diff = df['X'].max() - df['X'].min() 
new_df = df['X'].apply(lambda x: (x-avg)/diff) 
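A likely reason for the error in the question: with axis=1, the lambda is called once per row, so x['X'] is a single value rather than the whole column, and a plain float has no mean(). The sketch below is only an illustration under assumptions: the sample values are hypothetical, and it uses ordinary column arithmetic instead of apply, which gives the same result as the precomputed version above.

```python
import pandas as pd

# Hypothetical sample data; only the column name X comes from the question.
df = pd.DataFrame({"X": [5, 5, 7, 6, 8, 2, 5, 5, 9, 5]})

# Looking up X in a single row yields one value, not the whole column.
row_value = df.iloc[0]["X"]
row_value_is_series = isinstance(row_value, pd.Series)  # False

# Column arithmetic broadcasts the scalar statistics over every row,
# so no apply or lambda is needed.
col = df["X"]
df["x_norm"] = (col - col.mean()) / (col.max() - col.min())
```

After this, the values in x_norm are centered on zero and span a range of exactly 1.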

If you are looking to normalize the entire DataFrame, check this answer:

df.apply(lambda x: (x - np.mean(x))/(np.max(x) - np.min(x))) 
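As a rough check of the line above, here is a small sketch on hypothetical data. It assumes every column is numeric. Without an axis argument, apply passes each column to the lambda as a Series, so the statistics are taken per column; the same result can also be written with plain DataFrame arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric sample frame.
df = pd.DataFrame({"a": [2, 1, 7, 1, 5],
                   "b": [1, 4, 4, 6, 5],
                   "X": [5, 5, 7, 6, 8]})

# Default axis=0: the lambda receives one column (a Series) at a time.
by_apply = df.apply(lambda x: (x - np.mean(x)) / (np.max(x) - np.min(x)))

# Equivalent arithmetic: the per-column statistics align on column labels.
vectorized = (df - df.mean()) / (df.max() - df.min())
```

Both results have the same shape and column labels as df.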
0

If you want to normalize the values in column X so that they sum to 1:

df['x_norm'] = df.X.div(df.X.sum()) 

Step by step:

In [65]: df 
Out[65]: 
   a  b  X
0  2  1  5
1  1  4  5
2  7  4  7
3  1  6  6
4  5  5  8
5  5  8  2
6  6  7  5
7  8  2  5
8  7  9  9
9  9  6  5

In [68]: df['x_norm'] = df.X.div(df.X.sum()) 

In [69]: df 
Out[69]: 
   a  b  X    x_norm
0  2  1  5  0.087719
1  1  4  5  0.087719
2  7  4  7  0.122807
3  1  6  6  0.105263
4  5  5  8  0.140351
5  5  8  2  0.035088
6  6  7  5  0.087719
7  8  2  5  0.087719
8  7  9  9  0.157895
9  9  6  5  0.087719

Check:

In [70]: df.x_norm.sum() 
Out[70]: 1.0