2017-09-22

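Creating a new column while iterating through a pandas DataFrame (multiple conditions)

I want to create a new column by applying multiple conditions to a column whose data type is float.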

Sample data: 
ID CO 
0  12.0 
1  11.0 
2   8.0 
3   6.5 
4   5.5 
5   5.7 
6   5.8 
7   6.5 
8   6.8 

for index, row in df.iterrows():
    if row['CO'] in arange(0,1.54):
        row.loc['CO_1'] = 'GOOD'
    elif row['CO'] in arange(1.54,1.70):
        row.loc['CO_1'] = 'MOD'

The above didn't work, so I tried writing a separate function:

def aqi_CO(row):
    val_1=0
    for x in row:
        if x in arange(0,0.054):
            val_1 = 'GOOD'
        elif x in arange(0.054,0.070):
            val_1 = 'MODERATE'
        elif x in arange(0.070,0.085):
            val_1 = 'UNHEALTHY_SG'
        elif x in arange(0.085,0.105):
            val_1 = 'UNHEALTHY'
        elif x in arange(0.105,0.200):
            val_1 = 'VERY_UNHEALTHY'
        elif x in arange(0.200,3):
            val_1 = 'HAZARDOUS'
        return val_1

and called it with apply:

df['aqi_CO'] = df.apply(lambda x: aqi_CO(df['CO']), axis=1) 

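This didn't work well either. I'm confused now: can someone help me with how to add a new column by iterating over the DataFrame row by row and checking 3 or 4 conditions?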

Use 'pd.cut' with labels – Wen

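"Didn't work" is unclear: did it raise an error (please show the traceback), did it produce unexpected output (please show the expected and the actual output), or did it do nothing? – Evert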

'if row['CO'] in arange(0,1.54):' probably doesn't do what you want. If you want 'if 0 … – Evert

Answers

By using pd.cut:

pd.cut(df.CO,bins=[0,2,4,6,8,9,100],labels=["GOOD","MODERATE","UNHEALTHY_SG","UNHEALTHY","VERY_UNHEALTHY","HAZARDOUS"]) 

Out[866]:
0       HAZARDOUS
1       HAZARDOUS
2       UNHEALTHY
3       UNHEALTHY
4    UNHEALTHY_SG
5    UNHEALTHY_SG
6    UNHEALTHY_SG
7       UNHEALTHY
8       UNHEALTHY
Name: CO, dtype: category

df['new']=pd.cut(df.CO,bins=[0,2,4,6,8,9,100],labels=["GOOD","MODERATE","UNHEALTHY_SG","UNHEALTHY","VERY_UNHEALTHY","HAZARDOUS"]) 
df 
Out[868]: 
   ID    CO           new
0   0  12.0     HAZARDOUS
1   1  11.0     HAZARDOUS
2   2   8.0     UNHEALTHY
3   3   6.5     UNHEALTHY
4   4   5.5  UNHEALTHY_SG
5   5   5.7  UNHEALTHY_SG
6   6   5.8  UNHEALTHY_SG
7   7   6.5     UNHEALTHY
8   8   6.8     UNHEALTHY
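
A self-contained sketch of the pd.cut approach, rebuilding the sample data from the question. It also shows the `right` parameter: by default each bin includes its right edge, so 8.0 falls in (6, 8] and is labeled UNHEALTHY as in the output above; with `right=False` the bins become [lo, hi), which is closer to the "lo <= x < hi" ranges the question's arange checks seem to intend. The bin edges and labels are copied from the answer above and are illustrative only; the column names `new_right` and `new_left` are made up for the comparison.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": range(9),
    "CO": [12.0, 11.0, 8.0, 6.5, 5.5, 5.7, 5.8, 6.5, 6.8],
})

# Bin edges and labels from the pd.cut answer; 6 labels need 7 edges
bins = [0, 2, 4, 6, 8, 9, 100]
labels = ["GOOD", "MODERATE", "UNHEALTHY_SG", "UNHEALTHY",
          "VERY_UNHEALTHY", "HAZARDOUS"]

# Default right=True: bins are (lo, hi], so 8.0 lands in (6, 8]
df["new_right"] = pd.cut(df["CO"], bins=bins, labels=labels)

# right=False: bins are [lo, hi), matching a "lo <= x < hi" check,
# so 8.0 lands in [8, 9) instead
df["new_left"] = pd.cut(df["CO"], bins=bins, labels=labels, right=False)

print(df)
```

Values outside all bins (for example a negative CO reading) come back as NaN in both columns, so it is worth checking for missing labels after the cut.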

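In the first snippet of your code, arange(0,1.54) returns array([ 0., 1.]), and none of the sample values is in it. If you want to check this way, you can widen the range and use a finer step, something like arange(0, 7, 0.1). Then, inside the for loop, write through .loc on the DataFrame with index instead of on row: df.loc[index,'CO_1'] = 'GOOD' instead of row.loc['CO_1'] = 'GOOD'.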

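from numpy import arange

for index, row in df.iterrows():
    if row['CO'] in arange(0, 7, 0.1):
        # write through df.loc[index, ...]; assigning to `row` may only change a copy
        df.loc[index,'CO_1'] = 'GOOD'
    elif row['CO'] in arange(1.54,1.70):
        df.loc[index,'CO_1'] = 'MOD'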

Result:

      CO  CO_1
ID
0   12.0   NaN
1   11.0   NaN
2    8.0   NaN
3    6.5  GOOD
4    5.5  GOOD
5    5.7  GOOD
6    5.8   NaN
7    6.5  GOOD
8    6.8   NaN
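
Note that `x in arange(...)` compares x for exact equality against a handful of generated floats; it is not a range check. The NaN for 5.8 and 6.8 in the result above is most likely floating-point rounding: the arange element meant to be 5.8 comes out as roughly 5.800000000000001 and does not equal the literal 5.8. A minimal sketch of the same loop using comparisons instead, assuming the widened 0 to 7 "GOOD" range from the answer. The MOD branch is left out here because 1.54 to 1.70 lies inside 0 to 7 and could never be reached.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with ID as the index as in the result above
df = pd.DataFrame(
    {"CO": [12.0, 11.0, 8.0, 6.5, 5.5, 5.7, 5.8, 6.5, 6.8]},
    index=pd.Index(range(9), name="ID"),
)

# With the default step of 1, arange only yields whole numbers
print(np.arange(0, 1.54))

# Pre-create the column as object dtype so string labels can be stored
df["CO_1"] = pd.Series(dtype=object)

for index, row in df.iterrows():
    # Half-open interval check instead of membership in an array
    if 0 <= row["CO"] < 7:
        df.loc[index, "CO_1"] = "GOOD"

print(df)
```

With comparisons, 5.8 and 6.8 are labeled GOOD as well, while 12.0, 11.0 and 8.0 stay NaN.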

Similarly, for the second part of your code, you can apply the lambda to the column only:

df['aqi_CO'] = df['CO'].apply(lambda x: aqi_CO(x)) 

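Now, since only the column value is passed, the function can check it without iterating (note: the range of the first case in the function has been changed so that some output is visible):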

from numpy import arange

def aqi_CO(x):
    val_1 = 0  # returned unchanged when none of the ranges match

    if x in arange(0,7, 0.1):
        val_1 = 'GOOD'
    elif x in arange(0.054,0.070):
        val_1 = 'MODERATE'
    elif x in arange(0.070,0.085):
        val_1 = 'UNHEALTHY_SG'
    elif x in arange(0.085,0.105):
        val_1 = 'UNHEALTHY'
    elif x in arange(0.105,0.200):
        val_1 = 'VERY_UNHEALTHY'
    elif x in arange(0.200,3):
        val_1 = 'HAZARDOUS'
    return val_1

Result:

      CO aqi_CO
ID
0   12.0      0
1   11.0      0
2    8.0      0
3    6.5   GOOD
4    5.5   GOOD
5    5.7   GOOD
6    5.8      0
7    6.5   GOOD
8    6.8      0
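
Because the thresholds in the question's aqi_CO are contiguous, the chain of `in arange(...)` checks can also be written as plain comparisons against each upper bound, which avoids the exact-match problem entirely. A sketch under these assumptions: each band is half-open, [previous upper, upper), starting at 0; values below 0 or at/above 3 return None; `AQI_CO_BANDS` is a hypothetical name, and the input values are hypothetical too, chosen to hit each band (the question's sample CO values all lie above 3 and would all get None with these thresholds).

```python
import pandas as pd

# Hypothetical table built from the thresholds in the question's aqi_CO:
# each band covers [previous upper, upper), starting at 0.
AQI_CO_BANDS = [
    (0.054, "GOOD"),
    (0.070, "MODERATE"),
    (0.085, "UNHEALTHY_SG"),
    (0.105, "UNHEALTHY"),
    (0.200, "VERY_UNHEALTHY"),
    (3, "HAZARDOUS"),
]


def aqi_CO(x):
    """Return the label for a single CO value, or None if it is out of range."""
    if x < 0:
        return None
    for upper, label in AQI_CO_BANDS:
        if x < upper:
            return label
    return None  # x >= 3 is not covered by the question's ranges


# Hypothetical values, one per band plus one above the last edge
df = pd.DataFrame({"CO": [0.01, 0.06, 0.08, 0.09, 0.15, 2.5, 12.0]})

# Pass the function directly; the lambda wrapper is not needed
df["aqi_CO"] = df["CO"].apply(aqi_CO)
print(df)
```

Returning None instead of 0 for unmatched values keeps the new column free of a mixed string/integer sentinel, so missing labels can be found with `isna()`.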