2017-03-09 30 views

How to get dummies and groupby

I have the following DataFrame:

Q A 
A a h 
A b i 
A c j 
B d k 
B a l 
B b m 
C c n 

I would like to get dummies and group by, producing this result:

    a    b    c    d    e    f    g
A   h    i    j  nan  nan  nan  nan
B   l    m  nan    k  nan  nan  nan
C nan  nan    n  nan  nan  nan  nan

col=df.Q

I think I need to apply get_dummies and groupby, but I cannot figure out how.

How can I get this result?

Answers


It seems you need reset_index and pivot:

df = df.reset_index().pivot(index='index', columns='Q', values='A') 
print (df) 
Q   a  b  c  d 
index       
A   h  i  j None 
B   l  m None  k 
C  None None  n None 

Then, if necessary, use reindex and replace (this answer originally used reindex_axis, which was removed in pandas 1.0):

import numpy as np

cols = list('abcdefg')
print (df.reindex(columns=cols).replace({None:np.nan}))
Q  a b c d e f g 
index         
A  h i j NaN NaN NaN NaN 
B  l m NaN k NaN NaN NaN 
C  NaN NaN n NaN NaN NaN NaN 
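
For anyone who wants to run the two steps above end to end, here is a minimal sketch. It assumes the unnamed first column in the question is the DataFrame index, and the variable name `wide` is only illustrative.

```python
import pandas as pd

# Sample data from the question; the unnamed first column is assumed to be the index.
df = pd.DataFrame(
    {"Q": list("abcdabc"), "A": list("hijklmn")},
    index=list("AAABBBC"),
)

# Move the index into a regular column (named 'index'), then spread Q across columns.
wide = df.reset_index().pivot(index="index", columns="Q", values="A")

# Add the columns that never occur in Q (e, f, g); reindex fills them with NaN.
wide = wide.reindex(columns=list("abcdefg"))
print(wide)
```

Passing the full column list to reindex is what produces the empty e, f and g columns; pivot alone only creates columns for values that actually appear in Q.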

Edit:

If the data contains duplicates, it is better to use groupby with join:

print (df) 
    Q A 
A a h 
A b i 
A c j 
B d k 
B a l 
B b m <-duplicates B b 
B b t <-duplicates B b 
C c n 


df = df.reset_index().groupby(['index','Q'])['A'].apply(','.join).unstack() 
print (df) 
Q   a  b  c  d 
index       
A   h  i  j None 
B   l m,t None  k 
C  None None  n None 
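
As a rough illustration of why the edit is needed, the sketch below rebuilds the duplicated sample data, shows that a plain pivot rejects the repeated (B, b) pair, and then applies the groupby/join approach. The names `long`, `pivot_failed` and `joined` are illustrative only.

```python
import pandas as pd

# Sample data with the duplicated (B, b) pair; the first column is assumed to be the index.
df = pd.DataFrame(
    {"Q": list("abcdabbc"), "A": list("hijklmtn")},
    index=list("AAABBBBC"),
)
long = df.reset_index()

# A plain pivot cannot place two values in the same cell and raises ValueError.
try:
    long.pivot(index="index", columns="Q", values="A")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Joining the strings per (index, Q) group keeps every value, then unstack widens Q.
joined = long.groupby(["index", "Q"])["A"].apply(",".join).unstack()
print(pivot_failed)
print(joined)
```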

Another possible solution is pivot_table:

#aggfunc='first' - keeps only the first value, other values are lost
df1 = df.reset_index().pivot_table(index='index', columns='Q', values='A', aggfunc='first') 
print (df1) 
Q   a  b  c  d 
index       
A   h  i  j None 
B   l  m None  k 
C  None None  n None

#aggfunc='sum' - summed data, no separator
df2 = df.reset_index().pivot_table(index='index', columns='Q', values='A', aggfunc='sum')
print (df2)
Q   a  b  c  d
index
A   h  i  j None 
B   l mt None  k 
C  None None  n None

#aggfunc=','.join - joined data with separator
df3 = df.reset_index().pivot_table(index='index', columns='Q', values='A', aggfunc=','.join)
print (df3)
Q   a  b  c  d
index
A   h  i  j None 
B   l m,t None  k 
C  None None  n None
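
A self-contained sketch of the pivot_table variant, under the same assumption that the first column is the index, comparing two of the aggregation functions on the duplicated cell. The variable names are illustrative.

```python
import pandas as pd

# Same duplicated sample data as above; the first column is assumed to be the index.
df = pd.DataFrame(
    {"Q": list("abcdabbc"), "A": list("hijklmtn")},
    index=list("AAABBBBC"),
)
long = df.reset_index()

# 'first' keeps one value per cell and silently drops the rest.
first = long.pivot_table(index="index", columns="Q", values="A", aggfunc="first")

# ','.join keeps every value, separated by commas.
joined = long.pivot_table(index="index", columns="Q", values="A", aggfunc=",".join)

print(first.loc["B", "b"])
print(joined.loc["B", "b"])
```

Which aggregation to choose depends on whether losing the extra values is acceptable; 'first' gives one value per cell, while the join keeps all of them in a single string.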