2017-10-17 103 views
1

Creating a cross-sectional DataFrame from a time series in Python

Suppose we have a time series indexed by minute, as follows:

DF =

Time (HH:MM)  Value 
01/01/2014 00:00 1 
01/01/2014 00:01 2 
01/01/2014 00:02 3 
01/01/2014 00:03 4 
... 
01/08/2014 00:00 5000 
... 

I am looking to "group" the dataset by week, as follows:

DF2 =

Week Val1 Val2 Val3 Val4 ... 
1  1 2 3 4 ... 
2  5000 ... 
3 
4 
... 

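In other words, each 1-minute observation in week 1 (01/01/2014-01/08/2014) is represented as a column in DF2. (Each week should have 10,080 minutes/columns.)

I have tried a few functions, including groupby(), but most of them seem to aggregate the data rather than split it into the individual columns I am looking for.

Edit: It does not necessarily have to be in DataFrame format, but I am using this as input to a function for a number of weeks. It is similar to trying to create a histogram of each week's values.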

Answers

1

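You need the week of the year (weekofyear in older pandas, isocalendar().week in current versions) plus cumcount, which numbers the rows within each week to give the new column names. Then reshape with set_index and unstack: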

1. Solution if df is a DataFrame and Time (HH:MM) is a column:

print (type(df)) 
<class 'pandas.core.frame.DataFrame'> 

print (df.columns) 
Index(['Time (HH:MM)', 'Value'], dtype='object') 

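# Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week replaces it
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week.rename('Week')
# running 1-based position of each row inside its week
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN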

Another solution with pivot:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
# pd.pivot no longer accepts bare arrays, so attach the keys as columns first
df = (df.assign(Week=weeks, Col=countweeks)
        .pivot(index='Week', columns='Col', values='Value')
        .rename_axis(columns=None))
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN

If you need to replace the NaN values with 0, add the parameter fill_value=0 to unstack:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
      Val1  Val2  Val3  Val4
Week
1        1     2     3     4
2     5000     0     0     0

And in the second solution, use fillna:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.isocalendar().week
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = (df.assign(Week=weeks, Col=countweeks)
        .pivot(index='Week', columns='Col', values='Value')
        .rename_axis(columns=None)
        .fillna(0))
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   0.0   0.0   0.0
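
The two outputs above differ in dtype: fill_value=0 keeps integers, while fillna(0) leaves floats because NaN is introduced before it is replaced. A minimal sketch of that difference follows, using a small hand-built Series (hypothetical data whose MultiIndex stands in for the week and position keys computed above):

```python
import pandas as pd

# Hypothetical long-format data: (week, position within week) -> value
s = pd.Series(
    [1, 2, 3, 4, 5000],
    index=pd.MultiIndex.from_tuples(
        [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1)], names=['Week', None]),
    name='Value',
)

# Gaps are filled while reshaping, so no NaN is ever created
filled_on_reshape = s.unstack(fill_value=0)

# NaN is created first (forcing a float dtype), then replaced by 0
filled_afterwards = s.unstack().fillna(0)

print(filled_on_reshape.dtypes.unique())
print(filled_afterwards.dtypes.unique())
```

If the result feeds a function that expects integer counts, the fill_value form avoids a later cast.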

2. Solution if s is a Series and Time (HH:MM) is the index:

print (s) 

Time (HH:MM) 
01/01/2014 00:00  1 
01/01/2014 00:01  2 
01/01/2014 00:02  3 
01/01/2014 00:03  4 
01/08/2014 00:00 5000 
Name: Value, dtype: int64 

print (type(s)) 
<class 'pandas.core.series.Series'> 

print (s.index) 
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02', 
     '01/01/2014 00:03', '01/08/2014 00:00'], 
     dtype='object', name='Time (HH:MM)') 

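# DatetimeIndex.weekofyear was removed in pandas 2.0; isocalendar() replaces it
iso = pd.to_datetime(s.index).isocalendar()
# plain positional Index, so grouping does not try to align on the string index
weeks = pd.Index(iso['week'].to_numpy(dtype='int64'), name='Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN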

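And the second solution:

iso = pd.to_datetime(s.index).isocalendar()
weeks = pd.Index(iso['week'].to_numpy(dtype='int64'), name='Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = (s.to_frame().assign(Week=weeks, Col=countweeks)
        .pivot(index='Week', columns='Col', values='Value')
        .rename_axis(columns=None))
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN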

3. Solution if df is a DataFrame and Time (HH:MM) is the index:

print (df) 
        Value 
Time (HH:MM)   
01/01/2014 00:00  1 
01/01/2014 00:01  2 
01/01/2014 00:02  3 
01/01/2014 00:03  4 
01/08/2014 00:00 5000 

print (type(df)) 
<class 'pandas.core.frame.DataFrame'> 

print (df.index) 
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02', 
     '01/01/2014 00:03', '01/08/2014 00:00'], 
     dtype='object', name='Time (HH:MM)') 

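# DatetimeIndex.weekofyear was removed in pandas 2.0; isocalendar() replaces it
iso = pd.to_datetime(df.index).isocalendar()
weeks = pd.Index(iso['week'].to_numpy(dtype='int64'), name='Week')

# First solution: set_index + unstack
countweeks = df.groupby(weeks).cumcount() + 1
wide = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')

# Second solution: pivot, starting again from the original df
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
wide = (df.assign(Week=weeks, Col=countweeks)
          .pivot(index='Week', columns='Col', values='Value')
          .rename_axis(columns=None))
print (wide)

        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN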
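
For readers who want something to paste and run, the set_index + unstack approach can be sketched end to end as follows. This is only an illustration under stated assumptions: the sample frame reproduces the five rows from the question, the helper name to_weekly_wide is hypothetical, rows are assumed to be sorted by time already, and the timestamps are assumed to be month-first.

```python
import pandas as pd

# Sample data copied from the question (month/day/year timestamps assumed)
df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01',
                     '01/01/2014 00:02', '01/01/2014 00:03',
                     '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})


def to_weekly_wide(frame, time_col='Time (HH:MM)', value_col='Value'):
    """Hypothetical helper: one row per ISO week, one ValN column per row."""
    times = pd.to_datetime(frame[time_col], format='%m/%d/%Y %H:%M')
    # ISO week number of every timestamp, as plain integers
    weeks = times.dt.isocalendar().week.astype('int64').rename('Week')
    # 1-based position of each row within its week
    position = frame.groupby(weeks).cumcount() + 1
    # (Week, position) becomes the index, then position moves to the columns
    wide = frame.set_index([weeks, position])[value_col].unstack()
    return wide.add_prefix('Val')


wide = to_weekly_wide(df)
print(wide)
```

Two caveats about this sketch: the week number alone repeats every year, so data that spans more than one year would need the year as an additional key. Also, ISO weeks run Monday to Sunday, which may not match a "first seven days of the data" definition of week 1.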
+0

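One problem I ran into is that DF is not actually a DataFrame (I will edit the OP) but a time series with the time as its index. So the first part, df['Time (HH:MM)'], gives me a KeyError – Adam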

+0

I think the simplest way is to call df = df.reset_index(name='Time (HH:MM)') first – jezrael

+1

Why the downvote? I really don't understand... – jezrael

0

You can use pivot_table:

In [3192]: df['Week'] = df['Time (HH:MM)'].dt.isocalendar().week  # weekofyear before pandas 2.0

In [3193]: df['ValCount'] = 'Val' + df.groupby('Week').cumcount().add(1).astype(str)

In [3194]: df.pivot_table(index='Week', columns='ValCount', values='Value').reset_index()
Out[3194]:
ValCount  Week    Val1  Val2  Val3  Val4
0            1     1.0   2.0   3.0   4.0
1            2  5000.0   NaN   NaN   NaN

To keep Week as the index:

In [3198]: df.pivot_table(index='Week', columns='ValCount',
                          values='Value').rename_axis(columns=None)
Out[3198]:
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN

Details

In [3202]: df
Out[3202]:
         Time (HH:MM)  Value
0 2014-01-01 00:00:00      1
1 2014-01-01 00:01:00      2
2 2014-01-01 00:02:00      3
3 2014-01-01 00:03:00      4
4 2014-01-08 00:00:00   5000

In [3203]: df.dtypes
Out[3203]:
Time (HH:MM)    datetime64[ns]
Value                    int64
dtype: object
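
One practical difference between pivot_table and pivot only shows up when an index/column pair occurs more than once. With cumcount that cannot happen, so both give the same table here. The sketch below uses deliberately duplicated, made-up rows to show what each method does in that case:

```python
import pandas as pd

# Made-up rows in which (Week=1, ValCount='Val1') occurs twice
dup = pd.DataFrame({
    'Week': [1, 1, 1],
    'ValCount': ['Val1', 'Val1', 'Val2'],
    'Value': [10, 30, 5],
})

# pivot_table aggregates duplicates; the default aggregation is the mean
aggregated = dup.pivot_table(index='Week', columns='ValCount', values='Value')
print(aggregated)

# pivot refuses to reshape when index/column pairs are not unique
try:
    dup.pivot(index='Week', columns='ValCount', values='Value')
    pivot_failed = False
except ValueError:
    pivot_failed = True
print(pivot_failed)
```

When silently averaging duplicate timestamps would be a bug, pivot (or set_index + unstack) is the stricter choice because it fails loudly.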
+0

Why pivot_table? It aggregates the values, which is not necessary here – jezrael

+2

Why the downvote? – Zero