How to compare decimal numbers in columns of a pandas DataFrame?

I want to compare the decimal values stored in two columns of a pandas DataFrame. How can I do this?

I have a DataFrame:

import pandas as pd

data = {'AA': {0: '-14.35', 1: '632.0', 2: '619.5', 3: '352.35', 4: '347.7', 5: '100'},
        'BB': {0: '-14.3500', 1: '632.0000', 2: '619.5000', 3: '352.3500', 4: '347.7000', 5: '200'}}
df1 = pd.DataFrame(data)
print(df1)

The DataFrame looks like this:

       AA        BB
0  -14.35  -14.3500
1   632.0  632.0000
2   619.5  619.5000
3  352.35  352.3500
4   347.7  347.7000
5     100       200

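I want to compare columns AA and BB. As the DataFrame above shows, the values in both columns are the same except in row 5; the only difference is the trailing zeros.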
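A small sketch of why a direct comparison fails here, assuming the values are stored as strings (as in `data` above): string equality is character by character, so trailing zeros make otherwise equal numbers differ, while converting to `float` first compares the numeric values. The Series names `s1`, `s2`, `as_strings` and `as_floats` are illustrative only.

```python
import pandas as pd

# Strings compare character by character, so trailing zeros make them differ.
print('632.0' == '632.0000')  # False

# After conversion to float, the trailing zeros no longer matter.
print(float('632.0') == float('632.0000'))  # True

# The same holds column-wise in pandas: compare as strings vs. as floats.
s1 = pd.Series(['-14.35', '632.0', '100'])
s2 = pd.Series(['-14.3500', '632.0000', '200'])
as_strings = s1 == s2
as_floats = s1.astype(float) == s2.astype(float)
print(as_strings.tolist())  # [False, False, False]
print(as_floats.tolist())   # [True, True, False]
```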

If the values in AA and BB are the same, I want the result of the comparison, True or False, in a third column named Result.

Expected result:

       AA      BB  Result
0  -14.35  -14.35    True
1   632.0   632.0    True
2   619.5   619.5    True
3  352.35  352.35    True
4   347.7   347.7    True
5     100     200   False

How can I compare these decimal values?


2016-09-20 kit

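The column values are strings (type `str`), so you need to cast both columns to float with `astype` before comparing them. Then use `mask` with the boolean column Result as the condition, so that BB is replaced by AA wherever the values match: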

print(type(df1.loc[0, 'AA']))
<class 'str'>

print(type(df1.loc[0, 'BB']))
<class 'str'>

# Convert both columns to float so trailing zeros don't matter, then compare.
df1['Result'] = df1.AA.astype(float) == df1.BB.astype(float)
# Where the values are equal, replace BB with AA (this drops the trailing zeros).
df1.BB = df1.BB.mask(df1.Result, df1.AA)
print(df1)
       AA      BB  Result
0  -14.35  -14.35    True
1   632.0   632.0    True
2   619.5   619.5    True
3  352.35  352.35    True
4   347.7   347.7    True
5     100     200   False
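A variation that is not part of the answer: if exact decimal semantics matter, Python's standard `decimal.Decimal` can be used instead of `float`. `Decimal('632.0') == Decimal('632.0000')` is true, and Decimal does not round to binary floating point, so values that differ only beyond roughly 17 significant digits (which may collapse to the same float) stay distinct. This is a sketch assuming all cells are valid numeric strings; `Decimal` raises on anything else.

```python
from decimal import Decimal

import pandas as pd

data = {'AA': {0: '-14.35', 1: '632.0', 2: '619.5', 3: '352.35', 4: '347.7', 5: '100'},
        'BB': {0: '-14.3500', 1: '632.0000', 2: '619.5000', 3: '352.3500', 4: '347.7000', 5: '200'}}
df1 = pd.DataFrame(data)

# Decimal compares numeric values exactly, so trailing zeros are ignored.
df1['Result'] = df1.AA.map(Decimal) == df1.BB.map(Decimal)

# Where the values match, copy AA into BB to drop the trailing zeros, as with mask above.
df1.BB = df1.BB.mask(df1.Result, df1.AA)
print(df1)
```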

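Another solution, using `loc` (`ix` is deprecated and was removed in pandas 1.0):

# Same comparison as above.
df1['Result'] = df1.AA.astype(float) == df1.BB.astype(float)
# Assign AA into BB only on the rows where Result is True.
df1.loc[df1.Result, 'BB'] = df1.AA
print(df1)
       AA      BB  Result
0  -14.35  -14.35    True
1   632.0   632.0    True
2   619.5   619.5    True
3  352.35  352.35    True
4   347.7   347.7    True
5     100     200   False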
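A hedged variation not taken from the answer: if some cells might not be numeric, or the values come from calculations that introduce tiny floating-point errors, `pd.to_numeric(..., errors='coerce')` turns unparseable strings into NaN instead of raising, and `np.isclose` compares within a small tolerance. The default tolerances of `np.isclose` are assumed to be acceptable here; NaN is never considered close, so unparseable cells yield False.

```python
import numpy as np
import pandas as pd

data = {'AA': {0: '-14.35', 1: '632.0', 2: '619.5', 3: '352.35', 4: '347.7', 5: '100'},
        'BB': {0: '-14.3500', 1: '632.0000', 2: '619.5000', 3: '352.3500', 4: '347.7000', 5: '200'}}
df1 = pd.DataFrame(data)

# errors='coerce' converts strings that are not numbers into NaN instead of raising.
aa = pd.to_numeric(df1.AA, errors='coerce')
bb = pd.to_numeric(df1.BB, errors='coerce')

# np.isclose tolerates tiny floating-point differences; NaN is never close by default.
df1['Result'] = np.isclose(aa, bb)

# Same .loc assignment as above: copy AA into BB where the values match.
df1.loc[df1.Result, 'BB'] = df1.AA
print(df1)
```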

Timings:

#len(df) = 6k 
df1 = pd.concat([df1]*1000).reset_index(drop=True) 

In [31]: %timeit df1.ix[df1.Result, 'BB'] = df1.AA 
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 1.19 ms per loop 

In [33]: %timeit df1.BB = df1.BB.mask(df1.Result,df1.AA) 
1000 loops, best of 3: 900 µs per loop
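The timings above were taken in IPython with `%timeit` and the old `ix` indexer. A rough plain-Python sketch for rerunning the comparison with the standard `timeit` module and `loc` might look like this; the helper names `with_mask` and `with_loc` are illustrative, both include a `copy()` so each call starts from the same data, and the resulting numbers depend on the machine and pandas version.

```python
import timeit

import pandas as pd

data = {'AA': {0: '-14.35', 1: '632.0', 2: '619.5', 3: '352.35', 4: '347.7', 5: '100'},
        'BB': {0: '-14.3500', 1: '632.0000', 2: '619.5000', 3: '352.3500', 4: '347.7000', 5: '200'}}
df1 = pd.DataFrame(data)
df1['Result'] = df1.AA.astype(float) == df1.BB.astype(float)
# Repeat the 6-row frame 1000 times to get 6,000 rows, as in the timings above.
df1 = pd.concat([df1] * 1000).reset_index(drop=True)


def with_mask():
    df = df1.copy()
    df.BB = df.BB.mask(df.Result, df.AA)
    return df


def with_loc():
    df = df1.copy()
    df.loc[df.Result, 'BB'] = df.AA
    return df


# Best of 3 repeats, 100 calls each, reported per call.
t_mask = min(timeit.repeat(with_mask, number=100, repeat=3)) / 100
t_loc = min(timeit.repeat(with_loc, number=100, repeat=3)) / 100
print(f'mask: {t_mask * 1e3:.3f} ms per call')
print(f'loc:  {t_loc * 1e3:.3f} ms per call')
```

Both helpers should produce the same DataFrame, so the choice between them comes down to readability and speed on your own data.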


2016-09-20 05:37:14 jezrael

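@jezrael - Please look at my expected result. If both columns are the same, then I want column BB to be the same as AA. – kit

Sorry, give me a moment. – jezrael

Please check the edit. – jezrael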
