Drop duplicates with less precision

-1

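I have a pandas DataFrame with a string column and a float column, and I would like to use drop_duplicates to remove duplicates. Some of the duplicates are not exactly identical because of small differences in the low decimal places. How can I remove duplicates while comparing at a lower precision?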

Example:

import pandas as pd 
df = pd.DataFrame.from_dict({'text': ['aaa','aaa','aaa','bb'], 'result': [1.000001,1.000000,2,2]}) 
df 
    result text 
0 1.000001 aaa 
1 1.000000 aaa 
2 2.000000 aaa 
3 2.000000 bb

I would like to get:

df_out = pd.DataFrame.from_dict({'text': ['aaa','aaa','bb'], 'result': [1.000001,2,2]}) 
df_out 
    result text 
0 1.000001 aaa 
1 2.000000 aaa 
2 2.000000 bb

Source

2017-05-29 Make42

Binning is an overly complex solution for this problem, but I'll share a link anyway: https://chrisalbon.com/python/pandas_binning_data.html

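You can round your DataFrame to a given precision with the round function.

DataFrame.round(decimals=0, *args, **kwargs)

Round a DataFrame to a variable number of decimal places.

For example, you can round to two decimal places like this: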

df = df.round(2)

You can also apply it to a specific column, for example:

df = df.round({'result': 2})

After rounding, you can use the drop_duplicates function.
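Putting the two steps together, here is a minimal sketch on the DataFrame from the question. The choice of two decimal places is an assumption for illustration; the question does not state a tolerance.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['aaa', 'aaa', 'aaa', 'bb'],
    'result': [1.000001, 1.000000, 2, 2],
})

# Round only the float column; 2 decimals is an assumed tolerance.
rounded = df.round({'result': 2})

# Rows 0 and 1 are now identical, so the later one (row 1) is dropped.
deduped = rounded.drop_duplicates()
print(deduped)
```

Note that the result holds the rounded values (1.00 rather than 1.000001), so it differs slightly from the output requested in the question.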

Source

2017-05-29 14:50:47

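Round them, then use the index of the deduplicated rounded frame to select the original, unrounded rows. Note that round() with no argument rounds to 0 decimal places.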

df.loc[df.round().drop_duplicates().index] 

    result text 
0 1.000001 aaa 
2 2.000000 aaa 
3 2.000000 bb

Source

2017-05-29 14:47:35

Use numpy.trunc to get the precision you are looking for. Use pandas duplicated to find which rows to keep.

import numpy as np
df[~df.assign(result=np.trunc(df.result.values * 100)).duplicated()]
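For reference, the one-liner can be unpacked into a self-contained sketch using the DataFrame from the question. The factor of 100 (two decimal places) comes from the answer; the intermediate names key and out are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'text': ['aaa', 'aaa', 'aaa', 'bb'],
    'result': [1.000001, 1.000000, 2, 2],
})

# Temporary copy whose 'result' column is truncated to whole hundredths.
key = df.assign(result=np.trunc(df['result'].to_numpy() * 100))

# duplicated() flags every repeat after the first occurrence;
# negating the mask keeps the first row of each group, with original values.
out = df[~key.duplicated()]
print(out)
```

Unlike rounding the DataFrame itself, this keeps the original values (1.000001 survives). Both truncation and rounding compare against fixed bin edges, so two values that sit very close together but on opposite sides of an edge (for example 0.999999 and 1.000001 here) would still be treated as different.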

Source

2017-05-29 15:00:13 piRSquared
