2017-03-02 72 views

Creating a column on a filtered pandas DataFrame

Starting from an initial DataFrame loaded from a CSV file,

df = pd.read_csv("file.csv",sep=";") 

I get a filtered copy:

df_filtered = df[df["filter_col_name"]== value] 

However, when I create a new column using the diff() method,

df_filtered["diff"] = df_filtered["feature"].diff() 

I get the following warning:

/usr/local/bin/ipython3:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame. 
Try using .loc[row_indexer,col_indexer] = value instead 

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy 
    #!/usr/bin/python3 

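I also noticed that the processing time is very long. Surprisingly (to me...), if I do the same thing on the unfiltered DataFrame, it runs fine.

How should I proceed to create the "diff" column on the filtered data?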

Answers

1

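You need copy():

If you modify values in df_filtered later, you will find that the modifications do not propagate back to the original data (df), and that is what pandas is warning you about.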

# if you need to process the filtered rows and get back only the filtered DataFrame
df_filtered = df[df["filter_col_name"]== value].copy() 

Or:

# if you need to process the filtered rows and get back the whole DataFrame
mask = df["filter_col_name"] == value
df.loc[mask, 'feature'] = df.loc[mask, 'feature'].diff()

Sample:

df = pd.DataFrame({'filter_col_name':[1,1,3], 
        'feature':[4,5,6], 
        'C':[7,8,9], 
        'D':[1,3,5], 
        'E':[5,3,6], 
        'F':[7,4,3]}) 

print (df) 
   C  D  E  F  feature  filter_col_name
0  7  1  5  7        4                1
1  8  3  3  4        5                1
2  9  5  6  3        6                3

value = 1

df_filtered = df[df["filter_col_name"]== value].copy() 
df_filtered["diff"] = df_filtered["feature"].diff() 
print (df_filtered) 
   C  D  E  F  feature  filter_col_name  diff
0  7  1  5  7        4                1   NaN
1  8  3  3  4        5                1   1.0

value = 1 

mask = df["filter_col_name"] == value
df.loc[mask, 'feature'] = df.loc[mask, 'feature'].diff()

print (df) 
   C  D  E  F  feature  filter_col_name
0  7  1  5  7      NaN                1
1  8  3  3  4      1.0                1
2  9  5  6  3      6.0                3
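
To make the difference between the two approaches concrete, here is a minimal, self-contained sketch on a reduced version of the sample data. The first option is the copy() approach from this answer. The second is a variation that is not part of the answer: it assumes you would rather store the differences in a new "diff" column of df than overwrite "feature". The name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "filter_col_name": [1, 1, 3],
    "feature": [4, 5, 6],
})
value = 1
mask = df["filter_col_name"] == value

# Option 1: work on an explicit copy; df itself is not modified.
df_filtered = df[mask].copy()
df_filtered["diff"] = df_filtered["feature"].diff()
print("diff" in df.columns)  # False

# Option 2 (variation, an assumption about what you want): write the
# differences into a new "diff" column of df and leave "feature" intact.
# Rows outside the mask end up as NaN in "diff".
df.loc[mask, "diff"] = df.loc[mask, "feature"].diff()
print(df["feature"].tolist())  # [4, 5, 6]
```

Option 1 is enough if only the filtered rows are needed afterwards. Option 2 keeps everything in one DataFrame, at the cost of NaN values in the rows that did not match the filter.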
0

Try using:

df_filtered.loc[:, "diff"] = df_filtered["feature"].diff()