2017-03-16 195 views
2

Python pandas: pivot table

I have a table like this:

Name | ID | Contact_method | Contact 
sarah 1 house   h1 
sarah 1 mobile   m1 
sarah 1 email   [email protected] 
bob  2 house   h2 
bob  2 mobile   m2 
bob  2 email   [email protected] 
jones 3 house   h3 
jones 3 mobile   m3 
jones 3 email   [email protected] 
jones 4 house   h4 
jones 4 mobile   m4 
jones 4 email   [email protected] 

And I would like it to look like this:

Name | ID | house | mobile | email 
sarah 1 h1  m1  [email protected] 
bob  2 h2  m2  [email protected] 
jones 3 h3  m3  [email protected] 
jones 4 h4  m4  [email protected] 

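I can already do this, but only by looping over all the unique IDs with a very expensive pd.concat operation. Is there a simpler way? I have also tinkered with pivot() and transpose(). Note that the repeated name is there on purpose, so I cannot rely on the uniqueness of column values to perform a join.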

Answers

2

Set the index to every column except 'Contact' (that is, 'Name', 'ID' and 'Contact_method'), then unstack 'Contact_method':

df.set_index(
    ['Name', 'ID', 'Contact_method']
)['Contact'].unstack().rename_axis(None, axis=1).reset_index()

    Name ID  email house mobile 
0 bob 2  [email protected] h2  m2 
1 jones 3 [email protected] h3  m3 
2 jones 4 [email protected] h4  m4 
3 sarah 1 [email protected] h1  m1 
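For anyone who wants to try the set_index/unstack approach end to end, here is a minimal self-contained sketch. The e-mail addresses are not readable in the question, so `e1` to `e4` are placeholder values assumed purely for illustration, and `wide` is just a name chosen here.

```python
import pandas as pd

# Rebuild the question's long table; 'e1'..'e4' are placeholder e-mail values.
rows = []
for name, id_ in [("sarah", 1), ("bob", 2), ("jones", 3), ("jones", 4)]:
    for method, prefix in [("house", "h"), ("mobile", "m"), ("email", "e")]:
        rows.append({"Name": name, "ID": id_,
                     "Contact_method": method, "Contact": f"{prefix}{id_}"})
df = pd.DataFrame(rows)

wide = (
    df.set_index(["Name", "ID", "Contact_method"])["Contact"]
    .unstack()                    # Contact_method values become columns
    .rename_axis(None, axis=1)    # drop the 'Contact_method' label on the columns
    .reset_index()                # turn Name and ID back into ordinary columns
)
print(wide)
```

The two "jones" rows stay separate because 'ID' is part of the index, so the repeated name is not a problem.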
+0

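I have a desk set up in my new temporary place while we keep looking for a house. I'm working remotely now and will be travelling back and forth to Seattle twice a month. Soon I have to go back and clear everything out of the old place. Still busy, but I'm enjoying having time to answer questions. I hope you're doing well! @jezrael – piRSquared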

+0

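@jezrael Yes, I made a big push for legendary, and after that I felt I could calm down a bit. You are pretty much the best pandas person ever. My next SO goal is to pass DSM and Jeff on the list. I've never been motivated by rep itself, and I've given a lot of it away. I'll get to 100k eventually.. I do need a T-shirt. If they give you anything, you have to tell me. – piRSquared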

+0

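I'd like to get involved in http://stats.stackexchange.com/ and http://quant.stackexchange.com/. However, I'd rather pick some other tags to get gold in first. While I was going after my numpy badge I neglected the machine learning stuff. I'd like to get a gold badge in tensorflow (though I still have a lot to learn) – piRSquared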

0

One approach is to build a dictionary of contacts 'manually', keyed by ID. I'm not sure whether it is any more efficient.

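import pandas as pd

# Build one dict per ID, adding a key for each contact method found
people = dict()
for index, row in df.iterrows():
    ID = row['ID']
    if ID not in people:
        people[ID] = {'ID': ID, 'Name': row['Name']}
    people[ID][row['Contact_method']] = row['Contact']

# The keys of `people` become columns, so transpose to get one row per ID
print(pd.DataFrame(people).transpose())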

And the output is:

ID Name  email house mobile 
1 1 sarah [email protected] h1  m1 
2 2 bob  [email protected] h2  m2 
3 3 jones [email protected] h3  m3 
4 4 jones [email protected] h4  m4 
0

Or you can use pivot:

df1.set_index(['ID','Name']).pivot(columns='Contact_method').reset_index() 
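As a hedged variant of the same idea: `DataFrame.pivot` accepts a list for `index` as of pandas 1.1, and passing `values='Contact'` keeps the resulting columns flat. The sketch below assumes that pandas version or newer; the small frame and the `e1`/`e2` e-mail values are placeholders made up for illustration.

```python
import pandas as pd

# A two-person slice of the question's table with placeholder e-mail values.
df = pd.DataFrame({
    "Name": ["sarah"] * 3 + ["bob"] * 3,
    "ID": [1, 1, 1, 2, 2, 2],
    "Contact_method": ["house", "mobile", "email"] * 2,
    "Contact": ["h1", "m1", "e1", "h2", "m2", "e2"],
})

wide = (
    df.pivot(index=["ID", "Name"],      # list-valued index needs pandas >= 1.1
             columns="Contact_method",
             values="Contact")          # a single value column keeps the header flat
    .rename_axis(None, axis=1)
    .reset_index()
)
print(wide)
```

Like unstack, pivot raises a ValueError if an (ID, Name, Contact_method) combination occurs more than once.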
0

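I think piRSquared's solution is very good, but it fails when the same Name, ID and Contact_method combination appears more than once. In that case you get: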

ValueError: Index contains duplicate entries, cannot reshape

print (df) 
    Name ID Contact_method  Contact 
0 sarah 1   house   h1 
1 sarah 1   mobile   m1 
2 sarah 1   email [email protected] 
3  bob 2   house   h2 
4  bob 2   mobile   m2 
5  bob 2   email  [email protected] 
6 jones 3   house   h3 
7 jones 3   mobile   m3 
8 jones 3   email [email protected] <-for same Name,ID and Contact_method get duplicate 
9 jones 3   email  [email protected] <-for same Name,ID and Contact_method get duplicate 
10 jones 4   house   h4 
11 jones 4   mobile   m4 
12 jones 4   email [email protected] 
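Before reshaping, it can help to list the rows that cause the error. A small sketch, assuming the key columns are 'Name', 'ID' and 'Contact_method'; the values `e3a` and `e3b` stand in for the two e-mail addresses, which are not readable in the post.

```python
import pandas as pd

# Only the rows for ID 3, with two e-mail entries (placeholder values).
df = pd.DataFrame({
    "Name": ["jones"] * 4,
    "ID": [3, 3, 3, 3],
    "Contact_method": ["house", "mobile", "email", "email"],
    "Contact": ["h3", "m3", "e3a", "e3b"],
})

keys = ["Name", "ID", "Contact_method"]

# keep=False marks every member of a duplicated group, not just the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# The plain unstack fails on this data because the index is not unique.
try:
    df.set_index(keys)["Contact"].unstack()
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print(exc)
```

If `dupes` is empty, the simpler set_index/unstack route is safe; otherwise an aggregating approach is needed.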

Use pivot_table, or groupby with join as the aggregation:

cols = ['Name','ID','house','mobile','email']
# Parentheses let the method chain span several lines
df1 = (df.pivot_table(index=['ID','Name'],
                      columns='Contact_method',
                      values='Contact',
                      aggfunc=','.join)
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))   # reindex_axis was removed in pandas 1.0
print (df1)
    Name ID house mobile    email 
0 sarah 1 h1  m1   [email protected] 
1 bob 2 h2  m2    [email protected] 
2 jones 3 h3  m3 [email protected],[email protected] <- join duplicates 
3 jones 4 h4  m4   [email protected] 

# Same result via groupby: join duplicates first, then unstack
df1 = (df.groupby(['Name', 'ID', 'Contact_method'])['Contact']
         .apply(','.join)
         .unstack()
         .rename_axis(None, axis=1)
         .reset_index()
         .reindex(columns=cols))
print (df1)
    Name ID house mobile    email 
0 sarah 1 h1  m1   [email protected] 
1 bob 2 h2  m2    [email protected] 
2 jones 3 h3  m3 [email protected],[email protected] <- join duplicates 
3 jones 4 h4  m4   [email protected]
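As a quick sanity check, the sketch below runs both routes on a small frame containing one duplicated e-mail entry and compares the results. The data values (`e3a`, `e3b`, `e2`) and the names `wide_pt`, `wide_gb` and `same` are placeholders chosen here; the groupby keys are ordered ['ID', 'Name', ...] in this sketch so that both results come out in the same row order.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["jones"] * 4 + ["bob"] * 3,
    "ID": [3] * 4 + [2] * 3,
    "Contact_method": ["house", "mobile", "email", "email",
                       "house", "mobile", "email"],
    "Contact": ["h3", "m3", "e3a", "e3b", "h2", "m2", "e2"],
})
cols = ["Name", "ID", "house", "mobile", "email"]

# Route 1: pivot_table, joining duplicate contacts with a comma.
wide_pt = (
    df.pivot_table(index=["ID", "Name"], columns="Contact_method",
                   values="Contact", aggfunc=",".join)
    .rename_axis(None, axis=1)
    .reset_index()
    .reindex(columns=cols)
)

# Route 2: groupby, join, then unstack the contact method.
wide_gb = (
    df.groupby(["ID", "Name", "Contact_method"])["Contact"]
    .apply(",".join)
    .unstack()
    .rename_axis(None, axis=1)
    .reset_index()
    .reindex(columns=cols)
)

same = wide_pt.values.tolist() == wide_gb.values.tolist()
print(wide_pt)
print("both routes agree:", same)
```

Joining with a comma is only one choice of aggregation; any function that reduces a group of strings to a single value could be passed instead.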