I think you need numpy.select - for each row it takes the first condition that is True, and any later conditions that also match are ignored:
m1 = df['feature1']==1
m2 = df['feature2']==1
m3 = df['feature3']==1
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
Sample:
import numpy as np
import pandas as pd

customer_id = [1,2,3,4,5,6,7,8,9,10]
feature1 = [0,0,1,1,0,0,1,1,0,0]
feature2 = [1,0,1,0,1,0,1,0,1,0]
feature3 = [0,0,1,0,0,0,1,0,0,0]
df = pd.DataFrame({'customer_id':customer_id,
                   'feature1':feature1,
                   'feature2':feature2,
                   'feature3':feature3})
m1 = df['feature1']==1
m2 = df['feature2']==1
m3 = df['feature3']==1
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print(df)
   customer_id  feature1  feature2  feature3 new_var
0            1         0         1         0       2
1            2         0         0         0       4
2            3         1         1         1       1
3            4         1         0         0       1
4            5         0         1         0       2
5            6         0         0         0       4
6            7         1         1         1       1
7            8         1         0         0       1
8            9         0         1         0       2
9           10         0         0         0       4
If the features contain only 1 and 0, you can convert 0 to False and 1 to True:
m1 = df['feature1'].astype(bool)
m2 = df['feature2'].astype(bool)
m3 = df['feature3'].astype(bool)
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print(df)
   customer_id  feature1  feature2  feature3 new_var
0            1         0         1         0       2
1            2         0         0         0       4
2            3         1         1         1       1
3            4         1         0         0       1
4            5         0         1         0       2
5            6         0         0         0       4
6            7         1         1         1       1
7            8         1         0         0       1
8            9         0         1         0       2
9           10         0         0         0       4
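Just to sort of answer my own question: the np.where solution I mentioned trying does work as well. It was not giving me the right result because the data type of feature1 was string, not integer. So for anyone looking at a similar problem, both the 'nested np.where' solution and the 'numpy.select' solution jezrael mentioned work. – Shraddha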
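The dtype issue described in the comment can be sketched roughly as follows. The small DataFrame here is hypothetical (it is not the asker's data), and feature1 is deliberately stored as strings to reproduce the problem. After converting it to integers, a nested np.where and np.select with the same conditions in the same order should give the same labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data: feature1 arrives as strings, the other columns as integers.
df = pd.DataFrame({'feature1': pd.Series(['0', '1', '0', '0'], dtype=object),
                   'feature2': [1, 1, 0, 0],
                   'feature3': [0, 1, 1, 0]})

# Comparing the string column with the integer 1 matches no row at all,
# so every row would silently fall through to the later conditions.
mask_before = df['feature1'] == 1
print(mask_before.any())

# Convert to integers first so the comparison behaves as intended.
df['feature1'] = df['feature1'].astype(int)

m1 = df['feature1'] == 1
m2 = df['feature2'] == 1
m3 = df['feature3'] == 1

# Nested np.where: the outermost condition has the highest priority.
nested = np.where(m1, '1', np.where(m2, '2', np.where(m3, '3', '4')))

# np.select with the same conditions in the same order.
selected = np.select([m1, m2, m3], ['1', '2', '3'], default='4')

df['new_var'] = selected
print(df)
```

Checking df.dtypes before building the masks is a cheap way to catch this kind of mismatch early.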