2015-10-04 123 views
1

Pandas DataFrame: making the index column unique

I have a pandas DataFrame with two columns. For example:

index  result 
LI00066994 0.740688 
LI00066994 0.742431 
LI00066994 0.741826 
LI00066994 0.741328 
LI00066994 0.741826 
LI00066994 0.741328 
LI00073078 0.741121 
LI00073078 0.752619 
LI00073078 0.757116 
LI00073078 0.752619 
LI00073078 0.757116 
LI00073078 0.752619 

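Now I would like a DataFrame in which the index is unique while all the corresponding results are kept. They should end up in separate columns (result1, result2, result3, ...).

Desired output: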

index  result1 result2 result3 result4 result5 result6 
LI00066994 0.740688 0.742431 0.741826 0.741328 0.741826 0.741328 
LI00073078 0.741121 0.752619 0.757116 0.752619 0.757116 0.752619 

Does anyone know how to do this?

Answers

1

You can do something like this:

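import io
import pandas as pd

d = """index  result
LI00066994 0.740688
LI00066994 0.742431
LI00066994 0.741826
LI00066994 0.741328
LI00066994 0.741826
LI00066994 0.741328
LI00073078 0.741121
LI00073078 0.752619
LI00073078 0.757116
LI00073078 0.752619
LI00073078 0.757116
LI00073078 0.752619
LI00073078 0.752620"""

# Parse the whitespace-separated sample text into a DataFrame.
df = pd.read_csv(io.StringIO(d), sep=r'\s+')

# Build one single-row frame per index value (the row label is the index value),
# then stack them; shorter groups are padded with NaN.
df_out = pd.concat([pd.DataFrame({name: df_['result'].values}).T for name, df_ in df.groupby('index')])
# The integer column labels 0, 1, 2, ... become result0, result1, result2, ...
df_out = df_out.rename(columns=lambda x: 'result' + str(x))
# Move the index values back into an ordinary 'index' column.
df_out = df_out.reset_index()
print(df_out)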

Output:

 index result0 result1 result2 result3 result4 result5 result6 
0 LI00066994 0.741 0.742 0.742 0.741 0.742 0.741  NaN 
1 LI00073078 0.741 0.753 0.757 0.753 0.757 0.753 0.753 
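The same reshaping can also be sketched with `groupby(...).cumcount()` and `DataFrame.pivot`. This is a different technique from the `concat` approach above, not a restatement of it. The variable names `text`, `n`, and `wide` are made up for illustration, and the sample is a shortened subset of the question's data with groups of unequal size. Numbering from 1 gives the `result1, result2, ...` names asked for in the question.

```python
import io
import pandas as pd

# Shortened sample (assumption: same two-column layout as in the question).
text = """index result
LI00066994 0.740688
LI00066994 0.742431
LI00073078 0.741121
LI00073078 0.752619
LI00073078 0.757116"""
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Number each row within its group: 1, 2, 3, ...
df['n'] = df.groupby('index').cumcount() + 1

# One row per index value, one column per position; shorter groups get NaN.
wide = df.pivot(index='index', columns='n', values='result')
wide.columns = ['result{}'.format(n) for n in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because `pivot` aligns on the position number, groups with fewer results are padded with NaN, just as in the output above.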
0

I'm not sure how to do this with pandas alone. But if you're happy to throw numpy into the mix, give this a shot:

import numpy as np 
import pandas as pd 

index = [ 
    'LI00066994', 'LI00066994', 'LI00066994', 
    'LI00066994', 'LI00066994', 'LI00066994', 
    'LI00073078', 'LI00073078', 'LI00073078', 
    'LI00073078', 'LI00073078', 'LI00073078'] 
data = [ 
    0.740688, 0.742431, 0.741826, 0.741328, 
    0.741826, 0.741328, 0.741121, 0.752619, 
    0.757116, 0.752619, 0.757116, 0.752619] 
columns=['result'] 
df = pd.DataFrame(data=data, index=index, columns=columns) 

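# Collect the distinct index labels (np.unique also sorts them).
unique_index = np.unique(df.index)
# Stack each label's results as one row; every label must have the same number of results.
new_data = np.vstack([df.T[lookup] for lookup in unique_index])

# One row per unique label, with default integer column names 0, 1, 2, ...
new_df = pd.DataFrame(data=new_data, index=unique_index)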
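A minimal sketch of the same NumPy idea with two additions that are not part of the answer above: an explicit check that every label has the same number of results (which `np.vstack` needs), and column names `result1, result2, ...` as in the desired output. The sample is a shortened subset of the question's data, and the names `counts`, `label`, and `columns` are illustrative.

```python
import numpy as np
import pandas as pd

# Shortened sample: three results per label.
index = ['LI00066994'] * 3 + ['LI00073078'] * 3
data = [0.740688, 0.742431, 0.741826, 0.741121, 0.752619, 0.757116]
df = pd.DataFrame(data=data, index=index, columns=['result'])

# np.vstack needs rows of equal length, so check the group sizes first.
counts = df.index.value_counts()
if counts.nunique() != 1:
    raise ValueError('every index label must have the same number of results')

unique_index = np.unique(df.index)
# df.loc[[label], 'result'] always returns a Series, even for a single match.
new_data = np.vstack([df.loc[[label], 'result'].to_numpy() for label in unique_index])

# Label the columns result1, result2, ... to match the desired output.
columns = ['result{}'.format(i + 1) for i in range(new_data.shape[1])]
new_df = pd.DataFrame(data=new_data, index=unique_index, columns=columns)
print(new_df)
```

If the groups can differ in size, this check raises an error, and an approach that pads with NaN is the safer choice.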