How do I condense a DataFrame within a price range?

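In this dataset, every trade priced between 188.415 and 188.42 should have its volume summed together, all trades at 188.43 should be summed together, and so on. I'm currently using pandas to manage the data, and I'm not sure which function would accomplish this.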

Example data:

Time|Volume|Price 
09:30:00|200|188.42 
09:30:00|500|188.41 
09:30:00|100|188.415 
09:30:00|100|188.41 
09:30:00|590|188.42 
09:30:00|100|188.415 
09:30:00|100|188.4 
09:30:00|200|188.42 
09:30:00|900|188.41 
09:30:00|249|188.42 
09:30:00|100|188.41 
09:30:00|300|188.415 
09:30:00|300|188.42 
09:30:00|100|188.43 
09:30:00|100|188.44 
09:30:00|900|188.43 
09:30:00|200|188.42 
09:30:00|100|188.43 
09:30:00|100|188.42 
09:30:00|500|188.43

2014-10-18 Samuel

You can round the Price column, store the result in a (temporary) approx column, and then perform a groupby/agg operation:

df['approx'] = df['Price'].round(2) 
df.groupby('approx')['Volume'].sum()

which yields

# approx 
# 188.40  100 
# 188.41 1600 
# 188.42 2339 
# 188.43 1600 
# 188.44  100 
# Name: Volume, dtype: int64
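
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample from the question is read from an in-memory string with `pd.read_csv` and `sep="|"`; the names `raw` and `totals` are only there for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question (pipe-delimited).
raw = """Time|Volume|Price
09:30:00|200|188.42
09:30:00|500|188.41
09:30:00|100|188.415
09:30:00|100|188.41
09:30:00|590|188.42
09:30:00|100|188.415
09:30:00|100|188.4
09:30:00|200|188.42
09:30:00|900|188.41
09:30:00|249|188.42
09:30:00|100|188.41
09:30:00|300|188.415
09:30:00|300|188.42
09:30:00|100|188.43
09:30:00|100|188.44
09:30:00|900|188.43
09:30:00|200|188.42
09:30:00|100|188.43
09:30:00|100|188.42
09:30:00|500|188.43
"""

df = pd.read_csv(io.StringIO(raw), sep="|")

# Round each price to two decimals and sum the volume per rounded price.
df["approx"] = df["Price"].round(2)
totals = df.groupby("approx")["Volume"].sum()
print(totals)
```

The result is a Series indexed by the rounded price, so no volume is lost; it is only regrouped under fewer price levels.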

Alternatively, you can skip the approx column and pass the values directly to df.groupby:

In [142]: df.groupby(df['Price'].round(2))['Volume'].sum() 
Out[142]: 
Price 
188.40  100 
188.41 1600 
188.42 2339 
188.43 1600 
188.44  100 
Name: Volume, dtype: int64
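
One caveat: `round(2)` is backed by NumPy, which sends exact halves to the nearest even digit and works on binary floats, so a price such as 188.425 may not land in the bucket you expect. If the half-step boundary matters, one possible alternative (not part of the approach above) is to bin with integer arithmetic. The sketch below assumes prices carry at most three decimal places; the sample rows and the names `thousandths`, `hundredths`, `by_hundredths`, and `totals` are made up for illustration.

```python
import pandas as pd

# A few illustrative rows; 188.415 and 188.425 sit on half-step boundaries.
df = pd.DataFrame(
    {
        "Volume": [200, 100, 300, 100, 50],
        "Price": [188.42, 188.415, 188.41, 188.43, 188.425],
    }
)

# Convert to integer thousandths so the boundary test is exact
# (assumes Price has at most three decimal places).
thousandths = (df["Price"] * 1000).round().astype("int64")

# Round half up to whole hundredths: 188415 -> 18842, 188425 -> 18843.
hundredths = (thousandths + 5) // 10

# Sum the volume per integer bucket.
by_hundredths = df.groupby(hundredths)["Volume"].sum()

# Convert the bucket labels back to price units for display.
totals = by_hundredths.copy()
totals.index = by_hundredths.index / 100
totals.index.name = "Price"
print(totals)
```

Grouping on integers keeps the bucket labels exact; the division by 100 at the end is only for presentation.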

2014-10-18 22:14:39 unutbu

Thanks! I didn't even know about groupby – Samuel 2014-10-18 22:23:37
