Reindex a MultiIndex pandas DataFrame

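I've been struggling to figure out how to do the following operation in pandas:

I have a CSV file of station timestamps like the one below:

The next thing I did was the following pivot_table with pandas: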

trips.pivot_table('bike', aggfunc='count', 
         index=['date', 'hour'], 
         columns='station_arrived').fillna(0)

It returns something like this:

[Image: the resulting pivot table]

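My question is this:

I would like to reindex the 'hour' level so that every day has hours 0 through 23, even when there are no counts for a given hour on that day.

Reindexing with a single index is easy, but things get complicated because this is a MultiIndex DataFrame.

Is there any way to do this?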
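One way to do exactly what is asked, sketched under assumptions: build every (date, hour) pair with `pd.MultiIndex.from_product` and pass it to `reindex`, which accepts a MultiIndex just as it accepts a flat index. The small `trips` frame below is hypothetical sample data standing in for the CSV, which is not shown.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV
trips = pd.DataFrame({
    'bike': [89, 89, 57, 29, 76],
    'date': pd.to_datetime(['2010-02-02', '2010-02-02', '2010-02-05',
                            '2010-02-05', '2010-02-05']),
    'hour': [14, 15, 12, 12, 13],
    'station_arrived': [56, 56, 85, 85, 85],
})

# The pivot from the question: rows exist only for observed (date, hour) pairs
table = trips.pivot_table(values='bike', aggfunc='count',
                          index=['date', 'hour'],
                          columns='station_arrived').fillna(0)

# Every date that appears in the pivot, crossed with hours 0..23
full_index = pd.MultiIndex.from_product(
    [table.index.get_level_values('date').unique(), range(24)],
    names=['date', 'hour'],
)

# Reindex on the MultiIndex; (date, hour) pairs missing from the pivot get 0
result = table.reindex(full_index, fill_value=0).astype(int)
print(result.loc[pd.Timestamp('2010-02-05')].loc[10:14])
```

Only dates that already appear in the pivot get all 24 hours; to also include days with no trips at all, the first level could come from `pd.date_range` over the whole period instead.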


2016-03-01 ghost

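I think you need to create the empty rows before creating the pivot table. That would involve checking, for each index, which hours are missing, and then generating rows with 0/null values for the missing hours of that index. Then create the pivot. – Sam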

import datetime as dt 
import pandas as pd 
from pandas import Timestamp 

df = pd.DataFrame(
    {'action': ['C', 'C', 'C', 'C', 'C', 'A', 'C'], 
    'bike': [89, 89, 57, 29, 76, 69, 17], 
    'cust_id': [6, 6, 30, 30, 30, 30, 30], 
    'date': [Timestamp('2010-02-02 00:00:00'), 
       Timestamp('2010-02-02 00:00:00'), 
       Timestamp('2010-02-05 00:00:00'), 
       Timestamp('2010-02-05 00:00:00'), 
       Timestamp('2010-02-05 00:00:00'), 
       Timestamp('2010-02-05 00:00:00'), 
       Timestamp('2010-02-05 00:00:00')], 
    'date_arrived': [Timestamp('2010-02-02 14:27:00'), 
         Timestamp('2010-02-02 15:42:00'), 
         Timestamp('2010-02-05 12:06:00'), 
         Timestamp('2010-02-05 12:07:00'), 
         Timestamp('2010-02-05 13:11:00'), 
         Timestamp('2010-02-05 13:14:00'), 
         Timestamp('2010-02-05 13:45:00')], 
    'date_removed': [Timestamp('2010-02-02 13:57:00'), 
         Timestamp('2010-02-02 15:12:00'), 
         Timestamp('2010-02-05 11:36:00'), 
         Timestamp('2010-02-05 11:37:00'), 
         Timestamp('2010-02-05 12:41:00'), 
         Timestamp('2010-02-05 12:44:00'), 
         Timestamp('2010-02-05 13:15:00')], 
    'hour': [14, 15, 12, 12, 13, 13, 13], 
    'station_arrived': [56, 56, 85, 85, 85, 85, 85], 
    'station_removed': [56, 56, 85, 85, 85, 85, 85]})

First, create an hourly index that spans the whole date range:

idx = pd.date_range(df.date.min(), df.date.max() + dt.timedelta(hours=23), freq='h')
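A quick sanity check of the hourly range, assuming the sample's first and last dates (2010-02-02 and 2010-02-05): ending the range at 23:00 on the last date, rather than at midnight of the following day, gives exactly 24 stamps per calendar date.

```python
import datetime as dt

import pandas as pd

# First and last trip dates in the answer's sample data
first_day = pd.Timestamp('2010-02-02')
last_day = pd.Timestamp('2010-02-05')

# Run from 00:00 on the first day to 23:00 on the last day, one stamp per hour
idx = pd.date_range(first_day, last_day + dt.timedelta(hours=23), freq='h')

print(len(idx))         # 4 calendar days x 24 hours
print(idx[0], idx[-1])
```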

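Now, you want a datetime index, so set the index to 'date_arrived'. Then use groupby with a time grouper to group by hour and by station_arrived (pd.TimeGrouper('H') in older pandas; TimeGrouper has since been removed, and pd.Grouper(freq='h') is the current equivalent). Count the non-null values of station_arrived, then unstack the result to get the data in pivot-table format.

Finally, use reindex to conform the result to the new hourly index idx, and fill the missing values with zeros.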

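>>> (df
...     .set_index('date_arrived')
...     .groupby([pd.Grouper(freq='h'), 'station_arrived'])
...     .station_arrived
...     .count()
...     .unstack(fill_value=0)       # fill_value keeps the counts as integers
...     .reindex(idx, fill_value=0)  # add the hours that had no trips
... )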
station_arrived      56  85
2010-02-02 00:00:00   0   0
2010-02-02 01:00:00   0   0
2010-02-02 02:00:00   0   0
2010-02-02 03:00:00   0   0
2010-02-02 04:00:00   0   0
2010-02-02 05:00:00   0   0
2010-02-02 06:00:00   0   0
2010-02-02 07:00:00   0   0
2010-02-02 08:00:00   0   0
2010-02-02 09:00:00   0   0
2010-02-02 10:00:00   0   0
2010-02-02 11:00:00   0   0
2010-02-02 12:00:00   0   0
2010-02-02 13:00:00   0   0
2010-02-02 14:00:00   1   0
2010-02-02 15:00:00   1   0
2010-02-02 16:00:00   0   0
...
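The answer's result is indexed by hourly timestamps, while the question's pivot table used a (date, hour) MultiIndex. A minimal sketch of converting between the two, assuming a reduced copy of the sample data and current pandas (pd.Grouper(freq='h') in place of the removed TimeGrouper). It counts rows with `.size()` rather than `.station_arrived.count()`, which gives the same numbers here because the column has no missing values.

```python
import pandas as pd

# Reduced copy of the answer's sample data (only the columns this sketch uses)
df = pd.DataFrame({
    'date_arrived': pd.to_datetime(['2010-02-02 14:27', '2010-02-02 15:42',
                                    '2010-02-05 12:06', '2010-02-05 12:07',
                                    '2010-02-05 13:11']),
    'station_arrived': [56, 56, 85, 85, 85],
})

# Hourly index: 00:00 on the first day through 23:00 on the last day
start = df.date_arrived.min().normalize()
end = df.date_arrived.max().normalize() + pd.Timedelta(hours=23)
idx = pd.date_range(start, end, freq='h')

# Trips per (hour, station), widened to one column per station, all hours present
hourly = (df
          .set_index('date_arrived')
          .groupby([pd.Grouper(freq='h'), 'station_arrived'])
          .size()
          .unstack(fill_value=0)
          .reindex(idx, fill_value=0))

# Split each timestamp into the (date, hour) pair used by the question's pivot
hourly.index = pd.MultiIndex.from_arrays(
    [hourly.index.normalize(), hourly.index.hour],
    names=['date', 'hour'],
)
print(hourly.loc[pd.Timestamp('2010-02-05')].loc[11:14])
```

Every stamp in idx falls on the hour, so `normalize()` recovers the date and `.hour` recovers the 0 to 23 hour, leaving exactly 24 rows per date.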


2016-03-01 01:50:02 Alexander

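A very clever and unusual approach! I had been reading this link, https://stackoverflow.com/questions/17287933/filling-in-date-gaps-in-multiindex-pandas-dataframe?rq=1, which points to unstacking as a similar approach that might be a solution, but you just solved it perfectly. Thanks! Sorry I can't upvote your solution; I don't have enough reputation to do so. – ghost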
