2017-11-10 144 views
1

Pandas: fill in missing dates in a time series

I have a DataFrame that holds aggregated data for a few days. I want to add the missing days.

I was following another post, Add missing dates to pandas dataframe, but unfortunately it overwrote my results (maybe the functionality has changed slightly?). The code is below:

import random 
import datetime as dt 
import numpy as np 
import pandas as pd 

def generate_row(year, month, day):
    while True:
        date = dt.datetime(year=year, month=month, day=day)
        data = np.random.random(size=4)
        yield [date] + list(data)

# days I have data for 
dates = [(2000, 1, 1), (2000, 1, 2), (2000, 2, 4)] 
generators = [generate_row(*date) for date in dates] 

# get 5 data points for each 
data = [next(generator) for generator in generators for _ in range(5)] 

df = pd.DataFrame(data, columns=['date'] + ['f'+str(i) for i in range(1,5)]) 

# df 
groupby_day = df.groupby(pd.PeriodIndex(data=df.date, freq='D')) 
results = groupby_day.sum() 

idx = pd.date_range(min(df.date), max(df.date)) 
results.reindex(idx, fill_value=0) 

Results before filling in the missing dates in the index:
[image]

Results after:
[image]

+1

Maybe you're looking for resample? –

+0

It looks promising, but I'm struggling to apply it from the docs – Alter

+0

I think I got it... 'df.set_index(df.date, inplace=True)' + 'df = df.resample('D').sum()'. That's handy – Alter

Answers

3

You need to use period_range rather than date_range:

In [11]: idx = pd.period_range(min(df.date), max(df.date)) 
    ...: results.reindex(idx, fill_value=0) 
    ...: 
Out[11]: 
        f1  f2  f3  f4 
2000-01-01 2.049157 1.962635 2.756154 2.224751 
2000-01-02 2.675899 2.587217 1.540823 1.606150 
2000-01-03 0.000000 0.000000 0.000000 0.000000 
2000-01-04 0.000000 0.000000 0.000000 0.000000 
2000-01-05 0.000000 0.000000 0.000000 0.000000 
2000-01-06 0.000000 0.000000 0.000000 0.000000 
2000-01-07 0.000000 0.000000 0.000000 0.000000 
2000-01-08 0.000000 0.000000 0.000000 0.000000 
2000-01-09 0.000000 0.000000 0.000000 0.000000 
2000-01-10 0.000000 0.000000 0.000000 0.000000 
2000-01-11 0.000000 0.000000 0.000000 0.000000 
2000-01-12 0.000000 0.000000 0.000000 0.000000 
2000-01-13 0.000000 0.000000 0.000000 0.000000 
2000-01-14 0.000000 0.000000 0.000000 0.000000 
2000-01-15 0.000000 0.000000 0.000000 0.000000 
2000-01-16 0.000000 0.000000 0.000000 0.000000 
2000-01-17 0.000000 0.000000 0.000000 0.000000 
2000-01-18 0.000000 0.000000 0.000000 0.000000 
2000-01-19 0.000000 0.000000 0.000000 0.000000 
2000-01-20 0.000000 0.000000 0.000000 0.000000 
2000-01-21 0.000000 0.000000 0.000000 0.000000 
2000-01-22 0.000000 0.000000 0.000000 0.000000 
2000-01-23 0.000000 0.000000 0.000000 0.000000 
2000-01-24 0.000000 0.000000 0.000000 0.000000 
2000-01-25 0.000000 0.000000 0.000000 0.000000 
2000-01-26 0.000000 0.000000 0.000000 0.000000 
2000-01-27 0.000000 0.000000 0.000000 0.000000 
2000-01-28 0.000000 0.000000 0.000000 0.000000 
2000-01-29 0.000000 0.000000 0.000000 0.000000 
2000-01-30 0.000000 0.000000 0.000000 0.000000 
2000-01-31 0.000000 0.000000 0.000000 0.000000 
2000-02-01 0.000000 0.000000 0.000000 0.000000 
2000-02-02 0.000000 0.000000 0.000000 0.000000 
2000-02-03 0.000000 0.000000 0.000000 0.000000 
2000-02-04 1.856158 2.892620 2.986166 2.793448 

This is because your groupby uses a PeriodIndex rather than datetimes:

df.groupby(pd.PeriodIndex(data=df.date, freq='D')) 
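
To see the mismatch concretely, here is a minimal sketch on a hypothetical toy frame (the three rows and the single column f1 are made up for illustration, not the asker's data). It assumes a reasonably recent pandas, where reindexing against labels of a different type simply finds no matches:

```python
import pandas as pd

# Hypothetical toy data: two days with a gap between them.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-04"]),
    "f1": [1.0, 2.0, 3.0],
})

# Grouping by a PeriodIndex gives a result whose index holds Period labels.
results = df.groupby(pd.PeriodIndex(data=df.date, freq="D"))[["f1"]].sum()

start, end = df.date.min(), df.date.max()

# date_range builds Timestamp labels; none of them equal a Period label,
# so every row counts as missing and is filled with 0.
wrong = results.reindex(pd.date_range(start, end), fill_value=0)

# period_range builds Period labels that line up with the existing index,
# so the original sums survive and only the gap is filled.
right = results.reindex(pd.period_range(start, end, freq="D"), fill_value=0)

print(wrong)
print(right)
```

The first frame should come out all zeros, which matches the "overwritten" result in the question; the second keeps the sums for the two days that had data.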

You could have used pd.Grouper instead:

df.groupby(pd.Grouper(key="date", freq='D')) 

The result of this will have a DatetimeIndex.
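
A small sketch of the pd.Grouper route, again on a hypothetical toy frame (the data and the column name f1 are assumptions for illustration). With freq='D' the grouper already creates a bin for every day between the first and last date, so the reindex with date_range mostly serves to show that the labels now line up:

```python
import pandas as pd

# Hypothetical toy data: two days with a gap between them.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-04"]),
    "f1": [1.0, 2.0, 3.0],
})

# Grouping on the "date" column with a daily frequency yields a DatetimeIndex.
daily = df.groupby(pd.Grouper(key="date", freq="D")).sum()

# The index holds Timestamps, so reindexing with date_range matches them.
idx = pd.date_range(df.date.min(), df.date.max())
filled = daily.reindex(idx, fill_value=0)

print(type(daily.index).__name__)
print(filled)
```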

2

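cᴏʟᴅsᴘᴇᴇᴅ gave the hint in the comments:


resample is a good fit here.

Resample: Convenience method for frequency conversion and resampling of time series. The object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or datetime-like values must be passed to the on or level keyword.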

import random 
import datetime as dt 
import numpy as np 
import pandas as pd 

def generate_row(year, month, day):
    while True:
        date = dt.datetime(year=year, month=month, day=day)
        data = np.random.random(size=4)
        yield [date] + list(data)

# days I have data for 
dates = [(2000, 1, 1), (2000, 1, 2), (2000, 2, 4)] 
generators = [generate_row(*date) for date in dates] 

# get 5 points for each 
data = [next(generator) for generator in generators for _ in range(5)] 

# make dataframe 
df = pd.DataFrame(data, columns=['date'] + ['f'+str(i) for i in range(1,5)]) 

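# using the resample method; passing the column name makes 'date' the index
# without keeping a copy of it as a data column that sum() would have to handle
df.set_index('date', inplace=True)
df = df.resample('D').sum().fillna(0)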

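
The quoted description also mentions the on keyword. As a rough sketch of that variant (toy data and the column name f1 are hypothetical), resample can be pointed at the date column directly, without setting the index first:

```python
import pandas as pd

# Hypothetical toy data: two days with a gap between them.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-04"]),
    "f1": [1.0, 2.0, 3.0],
})

# Resample on the "date" column; days without data become rows whose sum is 0.
daily = df.resample("D", on="date").sum()

print(daily)
```

This leaves df itself untouched, which may be preferable when the date column is still needed later on.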

+0

You've got some fancy editing skills, I didn't even know you could link to comments – Alter

+1

Thanks... I figured it would be easier to link to the comment than to my profile ;-) –