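You need weekofyear + cumcount to number the rows within each week (these counts become the names of the new columns), then reshape with set_index and unstack:
1. Solution if df is a DataFrame and Time (HH:MM) is a column: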
print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (df.columns)
Index(['Time (HH:MM)', 'Value'], dtype='object')
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
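The snippet above relies on `.dt.weekofyear`, which was deprecated in pandas 1.1 and removed in pandas 2.0. The following is a minimal self-contained sketch of the same set_index + unstack idea for newer versions, assuming `.dt.isocalendar().week` is an acceptable replacement (it returns the ISO week number). The sample rows are copied from the data shown in this answer; the names `times` and `wide` and the explicit `format=` string are assumptions added for illustration.

```python
import pandas as pd

# Sample data copied from the values shown in this answer.
df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01',
                     '01/01/2014 00:02', '01/01/2014 00:03',
                     '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

# Parse the strings as month/day/year (assumed format).
times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')

# ISO week number; astype(int) turns the nullable UInt32 result into plain ints.
weeks = times.dt.isocalendar().week.astype(int).rename('Week')

# Running position of each row inside its week, starting at 1.
countweeks = df.groupby(weeks).cumcount() + 1

# Index by (week, position), then move the position level into columns.
wide = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print(wide)
```

Weeks with fewer rows than the longest week are padded with NaN, which is why the values come back as floats.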
Another solution with pivot:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
If you need to replace NaN with 0, add the parameter fill_value=0 to unstack:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
      Val1  Val2  Val3  Val4
Week
1        1     2     3     4
2     5000     0     0     0
And in the second solution use fillna:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value']).fillna(0)
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   0.0   0.0   0.0
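In recent pandas versions `pd.pivot` expects a DataFrame as its first argument, so passing three separate Series as above may fail there. A rough sketch of the pivot variant under that assumption is shown below: the week and the column label are first stored as helper columns, then `DataFrame.pivot` is called by column name. The names `tmp`, `Col` and `wide` are made up for this example, and `.dt.isocalendar().week` again stands in for `weekofyear`.

```python
import pandas as pd

# Sample data copied from the values shown in this answer.
df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01',
                     '01/01/2014 00:02', '01/01/2014 00:03',
                     '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')

# Helper frame holding the week number next to the original columns.
tmp = df.assign(Week=times.dt.isocalendar().week.astype(int))

# Column label per row: 'Val1', 'Val2', ... counted within each week.
tmp['Col'] = 'Val' + (tmp.groupby('Week').cumcount() + 1).astype(str)

# Pivot by column name, then fill the gaps left by shorter weeks.
wide = tmp.pivot(index='Week', columns='Col', values='Value').fillna(0)
wide.columns.name = None  # drop the 'Col' label from the header
print(wide)
```

Because the labels are strings, pivot sorts them lexicographically, so with ten or more values per week 'Val10' would sort before 'Val2'; the unstack version with integer counts does not have this issue.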
2. Solution if s is a Series and Time (HH:MM) is the index:
print (s)
Time (HH:MM)
01/01/2014 00:00       1
01/01/2014 00:01       2
01/01/2014 00:02       3
01/01/2014 00:03       4
01/08/2014 00:00    5000
Name: Value, dtype: int64
print (type(s))
<class 'pandas.core.series.Series'>
print (s.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')
weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
And the second solution:
weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=s)
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
3. Solution if df is a DataFrame and Time (HH:MM) is the index:
print (df)
                  Value
Time (HH:MM)
01/01/2014 00:00      1
01/01/2014 00:01      2
01/01/2014 00:02      3
01/01/2014 00:03      4
01/08/2014 00:00   5000
print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (df.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')
# first solution: set_index + unstack
weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
# second solution: pivot (run this on the original df instead of the three lines above)
weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
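For the cases where the timestamps sit on the index, one possible shortcut is to move the index into a regular column first and then reuse the column-based approach from the first solution. The sketch below assumes a Series named Value with a string index named Time (HH:MM), as printed in solution 2, and again uses `.dt.isocalendar().week` in place of `weekofyear`; `times` and `wide` are illustrative names.

```python
import pandas as pd

# Series with the timestamps on the index, as printed in solution 2.
s = pd.Series(
    [1, 2, 3, 4, 5000],
    index=pd.Index(['01/01/2014 00:00', '01/01/2014 00:01',
                    '01/01/2014 00:02', '01/01/2014 00:03',
                    '01/08/2014 00:00'], name='Time (HH:MM)'),
    name='Value',
)

# Move the index into a column: result has 'Time (HH:MM)' and 'Value'.
df = s.reset_index()

times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')
weeks = times.dt.isocalendar().week.astype(int).rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1

# fill_value=0 pads shorter weeks and keeps the integer dtype.
wide = (df.set_index([weeks, countweeks])['Value']
          .unstack(fill_value=0)
          .add_prefix('Val'))
print(wide)
```

A DataFrame with the time on the index can be handled the same way by calling `df.reset_index()` first.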
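One problem I ran into is that df is not actually a DataFrame (will edit the OP) but a time series with the time as its index. So the first part, df['Time (HH:MM)'], gave me a KeyError – Adam
I think the simplest is to first call df = df.reset_index(name='Time (HH:MM)') – jezrael
Why the downvote? I really don't understand... – jezrael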