2017-09-14 111 views
3

pandas.DatetimeIndex frequency is None and can't be set

I created a DatetimeIndex from a "date" column:

sales.index = pd.DatetimeIndex(sales["date"]) 

Now the index looks as follows:

DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06', 
        '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10', 
        '2003-01-11', '2003-01-13', 
        ... 
        '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25', 
        '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29', 
        '2016-07-30', '2016-07-31'], 
        dtype='datetime64[ns]', name='date', length=4393, freq=None) 

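As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly, I get: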

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-148-30857144de81> in <module>() 
     1 #### DEBUG 
----> 2 sales_train = disentangle(df_train) 
     3 sales_holdout = disentangle(df_holdout) 
     4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"]) 

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train) 
     2  # transform sales table to disentangle sales time series 
     3  sales = df_train[["date", "store_id", "article_id", "amount_sold"]] 
----> 4  sales.index = pd.DatetimeIndex(sales["date"], freq="d") 
     5  sales = sales.pivot_table(index=["store_id", "article_id", "date"]) 
     6  return sales 

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs) 
    89     else: 
    90      kwargs[new_arg_name] = new_arg_value 
---> 91    return func(*args, **kwargs) 
    92   return wrapper 
    93  return _deprecate_kwarg 

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs) 
    399           'dates does not conform to passed ' 
    400           'frequency {1}' 
--> 401           .format(inferred, freq.freqstr)) 
    402 
    403   if freq_infer: 

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D 

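So apparently a frequency has been inferred, but it is stored in neither the freq nor the inferred_freq attribute of the DatetimeIndex; both are None. Can someone clear up the confusion?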

+0

Does `sales.index = pd.DatetimeIndex(sales["date"].asfreq(freq='D'))` work? – EdChum

+0

No. "ValueError: Length mismatch: Expected axis has 218153 elements, new values have 1 elements" – clstaudt

+1

Your data sample doesn't have a frequency per se. Judging by the info you provided, 2003-01-05 and 2003-01-12 are missing. Moreover, 2003-01-02 + 4393 days makes 2015-01-12, not 2016-07-31. – 3kt
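3kt's observation can be checked directly. The following is a minimal sketch that uses only the first ten dates shown in the question's index (the variable names are illustrative, not from the original post): it compares them against a complete daily range and asks pandas to infer a frequency.

```python
import pandas as pd

# The first ten dates from the index printed in the question
idx = pd.DatetimeIndex(
    ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06",
     "2003-01-07", "2003-01-08", "2003-01-09", "2003-01-10",
     "2003-01-11", "2003-01-13"]
)

# A gap-free daily range covering the same span
full = pd.date_range(idx.min(), idx.max(), freq="D")

# Dates that a daily frequency would require but the data lacks
missing = full.difference(idx)
print(missing)

# With such gaps pandas cannot infer a regular frequency
inferred = pd.infer_freq(idx)
print(inferred)
```

If `missing` is non-empty, passing freq="D" to the DatetimeIndex constructor cannot succeed, which matches the ValueError in the traceback.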

Answers

2

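It seems to be related to the missing dates, as 3kt notes. You might be able to "fix" it with asfreq('D') as EdChum suggests, but that gives you a continuous index with missing data values. It works fine for some sample data I made up: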

df=pd.DataFrame({ 'x':[1,2,4] }, 
    index=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])) 

df 
Out[756]: 
      x 
2003-01-02 1 
2003-01-03 2 
2003-01-06 4 

df.index 
Out[757]: DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], 
      dtype='datetime64[ns]', freq=None) 

Note that freq=None. If you apply asfreq('D'), this changes to freq='D':

df.asfreq('D')
Out[758]:
              x
2003-01-02  1.0
2003-01-03  2.0
2003-01-04  NaN
2003-01-05  NaN
2003-01-06  4.0

df.asfreq('D').index
Out[759]:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-05',
               '2003-01-06'],
              dtype='datetime64[ns]', freq='D')

More generally, and depending on exactly what you are trying to do, you might want to look at the question "Add missing dates to pandas dataframe", which covers other options such as reindex and resample.
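As a rough sketch of the reindex and resample alternatives, applied to the same made-up frame: the fill value of 0 and the choice of sum() as the aggregation are assumptions for illustration and may not suit real sales data.

```python
import pandas as pd

df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# Complete daily range spanning the data
full = pd.date_range(df.index.min(), df.index.max(), freq="D")

# reindex: insert the missing days, filling them with 0 instead of NaN
filled = df.reindex(full, fill_value=0)

# resample: aggregate per calendar day; days without rows sum to 0
daily = df.resample("D").sum()

print(filled)
print(daily.index.freq)
```

reindex leaves the existing rows untouched and only adds new ones, whereas resample aggregates, so it also collapses several rows that fall on the same day.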

1

You have a couple of options here:

  • pd.infer_freq
  • pd.tseries.frequencies.to_offset

I suspect that errors down the road are caused by the missing freq.

You are absolutely right. Here's what I often use:

import pandas as pd

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy. If `freq` is None, it is inferred.
    """

    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            # No frequency given or stored: try to infer one from the dates
            freq = pd.infer_freq(idx)
        else:
            # A frequency is already set; nothing to do
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found to `idx`. Specify'
                             ' a frequency string with `freq`.')
    return idx

An example:

idx=pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06']) # freq=None 

print(add_freq(idx)) # inferred 
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B') 

print(add_freq(idx, freq='D')) # explicit 
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D') 
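The explicit freq='D' output above reflects the pandas version in use when the answer was written. As far as I am aware, more recent pandas releases validate a frequency when it is assigned and reject one that the dates do not conform to, so the result may differ on your installation. A small sketch that makes no assumption about which behaviour you get:

```python
import pandas as pd

idx = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])

# Thursday, Friday, Monday: consistent with a business-day frequency
inferred = pd.infer_freq(idx)
print(inferred)

# Assigning a daily frequency to dates with a weekend gap may be
# accepted or rejected, depending on the installed pandas version
try:
    idx.freq = "D"
    outcome = "accepted"
except ValueError as exc:
    outcome = "rejected"
    print(exc)

print(outcome)
```

If the assignment is rejected, the dates simply do not form a daily series, and inserting the missing days (for example with asfreq) is needed before a 'D' frequency can be attached.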

Using asfreq will actually reindex (fill in) the missing dates, so be careful if that's not what you're looking for.

The primary function for changing frequencies is the asfreq function. For a DatetimeIndex, this is basically just a thin, but convenient wrapper around reindex which generates a date_range and calls reindex.
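The quoted description can be illustrated with a short comparison. This is only a sketch of the equivalence on a small made-up series, not a reproduction of the pandas internals:

```python
import pandas as pd

s = pd.Series(
    [1, 2, 4],
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# asfreq builds the daily range and reindexes in one call
via_asfreq = s.asfreq("D")

# The same thing spelled out: generate a date_range, then reindex
via_reindex = s.reindex(pd.date_range(s.index.min(), s.index.max(), freq="D"))

same = via_asfreq.equals(via_reindex)
print(same)
print(len(s), len(via_asfreq))
```

Both results are longer than the input and hold NaN for the inserted days, which is the side effect the answer warns about.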