2017-08-05 35 views

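Pandas: change values in a column for selected rows

I am trying to create a new dataframe by first splitting the original one in two:

df1 - contains only the rows of the original frame whose selected column has a value from a given list

df2 - contains only the rows of the original frame whose selected column has any other value, with those values then changed to a new given value.

The function should return a new dataframe that is the concatenation of df1 and df2.

This works fine: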

import pandas as pd

l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})
print(df)

  cat  val
0   a    1
1   b    2
2   c    3
3   d    4
4   a    5
5   b    6

df['cat'] = df['cat'].apply(lambda x: 'other') 
print(df) 

     cat  val
0  other    1
1  other    2
2  other    3
3  other    4
4  other    5
5  other    6

However, when I define a function:

def create_df(df, select, vals, other): 
    df1 = df.loc[df[select].isin(vals)] 
    df2 = df.loc[~df[select].isin(vals)] 
    df2[select] = df2[select].apply(lambda x: other) 
    result = pd.concat([df1, df2]) 
    return result 

and call it:

df3 = create_df(df, 'cat', ['a','b'], 'xxx') 
print(df3) 

it produces exactly what I need:

   cat  val
0    a    1
1    b    2
4    a    5
5    b    6
2  xxx    3
3  xxx    4

But for some reason, in this case I get a warning:

..\usr\conda\lib\site-packages\ipykernel\__main__.py:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame. 
Try using .loc[row_indexer,col_indexer] = value instead 

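So how does this case (assigning values to a column inside a function) differ from the first one, where I assign the values outside a function?

What is the correct way to change column values?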

Possible duplicate of [How to deal with SettingWithCopyWarning in Pandas?](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) –

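Still, it is strange that in my case this behaviour differs between parts of the code: I get the warning inside the function definition but not in the main program. Why is that? – dokondr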

Answer

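Well, I guess there are many ways the code could be optimized, but to make it work you can simply take copies of the rows selected from the input dataframe and concat those: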

def create_df(df, select, vals, other):
    df1 = df[df[select].isin(vals)].copy()   # boolean indexing, then an explicit copy
    df2 = df[~df[select].isin(vals)].copy()  # rows whose value is not in vals
    df2[select] = other  # assigning the scalar is sufficient, no apply needed
    result = pd.concat([df1, df2])
    return result
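As for why the warning shows up only inside the function: as far as I can tell it has nothing to do with the function itself. In the first snippet `df['cat'] = ...` assigns to a column of a freshly built frame, while in `create_df` the assignment targets `df2`, which was obtained by slicing `df`, so pandas cannot tell whether you meant to modify `df` as well. A minimal sketch of the same idea, computing the mask once and copying only the rows that get changed (the name `split_and_relabel` is made up for illustration, and the behaviour described assumes a pandas version that still emits `SettingWithCopyWarning`):

```python
import pandas as pd

def split_and_relabel(df, select, vals, other):
    # Hypothetical helper; takes the same arguments as create_df in the question.
    mask = df[select].isin(vals)
    kept = df[mask]                # rows whose value is in vals, left untouched
    relabeled = df[~mask].copy()   # explicit copy: an independent frame, safe to modify
    relabeled[select] = other      # the scalar is broadcast to every row
    return pd.concat([kept, relabeled])

df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd', 'a', 'b'],
                   'val': [1, 2, 3, 4, 5, 6]})
out = split_and_relabel(df, 'cat', ['a', 'b'], 'xxx')
print(out)
print(df['cat'].tolist())  # the input frame keeps its original values
```

Calling the same lines outside a function behaves the same way; what matters is whether the frame being assigned to came from a slice.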

Alternative version:

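import pandas as pd

l1 = ['a','b','c','d','a','b']
l2 = [1,2,3,4,5,6]
df = pd.DataFrame({'cat':l1,'val':l2})

# define a boolean mask: True where 'cat' is 'a' or 'b'
mask = df['cat'].isin(list("ab"))

# concatenate masked rows, then the remaining rows (~ inverts a boolean mask)
df2 = pd.concat([df[mask],df[~mask]])

# change values to 'xxx' in the rows not selected by the mask
df2.loc[~mask,["cat"]] = "xxx"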

Output

   cat  val
0    a    1
1    b    2
4    a    5
5    b    6
2  xxx    3
3  xxx    4

Or as a function:

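def create_df(df, filter_, isin_, value):

    # define a boolean mask: True where the column value is in isin_
    mask = df[filter_].isin(isin_)

    # concatenate masked rows, then the remaining rows (~ inverts a boolean mask)
    df = pd.concat([df[mask],df[~mask]])

    # change values to `value` in the rows not selected by the mask
    df.loc[~mask,[filter_]] = value

    return df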

df2 = create_df(df, 'cat', ['a','b'], 'xxx') 
df2
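Another option, if you would rather not assign into a slice at all, is to build the relabeled column first and split afterwards. This is only a sketch under the same example data; it uses `Series.where`, which keeps the values where the mask is True and substitutes the replacement elsewhere, and `DataFrame.assign`, which returns a new frame:

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd', 'a', 'b'],
                   'val': [1, 2, 3, 4, 5, 6]})

# True where 'cat' is one of the values to keep
mask = df['cat'].isin(['a', 'b'])

# where() keeps 'cat' where mask is True and puts 'xxx' everywhere else;
# assign() returns a new frame, so df itself is not modified
relabeled = df.assign(cat=df['cat'].where(mask, 'xxx'))

# kept rows first, then the relabeled rows, as in the concat-based versions
result = pd.concat([relabeled[mask], relabeled[~mask]])
print(result)
```

If the row order does not matter, `relabeled` alone is already the answer and the final `concat` can be dropped.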