2014-11-03
2

I have a pandas DataFrame that I want to flatten. It looks similar to the following:

import pandas as pd 
df = pd.DataFrame(data={"CommonIdentifier":[1234,1235,1236,1237], 
         "CommonValue":["type1","type2","type1","type1"], 
         "Identifier1":["a","a","b","b"], 
         "Value1":[1,1.5,2,1.3], 
         "Identifier2":["b","b","c","a"], 
         "Value2":[4,2,3.2,1]}) 
df 

CommonIdentifier CommonValue Identifier1 Identifier2 Value1 Value2 
0   1234  type1   a   b 1.0 4.0 
1   1235  type2   a   b 1.5 2.0 
2   1236  type1   b   c 2.0 3.2 
3   1237  type1   b   a 1.3 1.0 

I would like to manipulate it so that it has the following form:

pd.DataFrame(data={"CommonIdentifier":[1234,1235,1236,1237,1234,1235,1236,1237], 
        "CommonValue":["type1","type2","type1","type1","type1","type2","type1","type1"], 
        "IdentifierJoined":["a","a","b","b","b","b","c","a"], 
        "ValueJoined":[1,1.5,2,1.3,4,2,3.2,1]}) 

    CommonIdentifier CommonValue IdentifierJoined ValueJoined 
0    1234   type1    a   1.0 
1    1235   type2    a   1.5 
2    1236   type1    b   2.0 
3    1237   type1    b   1.3 
4    1234   type1    b   4.0 
5    1235   type2    b   2.0 
6    1236   type1    c   3.2 
7    1237   type1    a   1.0 

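The above is only a representative example; in reality I have 9 identifier/value pairs. I would like to know how to accomplish this concisely. I came up with the following,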

num_identifiers = 2 
m1 = pd.melt(df,id_vars=['CommonIdentifier'], 
      value_vars=['Value1','Value2']) 
m2 = pd.melt(df,id_vars=['CommonIdentifier'], 
      value_vars=['Identifier1','Identifier2']) 
m3 = pd.concat([df[["CommonIdentifier","CommonValue"]]] * num_identifiers) 
m3.index = range(len(m3)) 

pd.concat([m3[["CommonIdentifier","CommonValue"]], 
      m2[["value"]].rename(columns={"value":"IdentifierJoined"}), 
      m1[["value"]].rename(columns={"value":"ValueJoined"})], axis=1) 

which achieves the transformation but feels rather sloppy and verbose.

Answers

2

You can use df[[...]] to select the desired columns, rename(columns=...) to change the column names, and then pd.concat to stack the sub-DataFrames:

import pandas as pd 

df = pd.DataFrame(data={"CommonIdentifier":[1234,1235,1236,1237], 
         "CommonValue":["type1","type2","type1","type1"], 
         "Identifier1":["a","a","b","b"], 
         "Value1":[1,1.5,2,1.3], 
         "Identifier2":["b","b","c","a"], 
         "Value2":[4,2,3.2,1]}) 

colgroups = [['CommonIdentifier', 'CommonValue', 'Identifier%d'%i, 'Value%d'%i] 
     for i in range(1,3)] 

colmap = {'{}{}'.format(col,i):'{}Joined'.format(col) 
      for i in range(1,3) for col in ('Identifier', 'Value')} 

# ignore_index=True renumbers the stacked rows 0..7 instead of repeating 0..3
result = pd.concat([df[cols].rename(columns=colmap) for cols in colgroups], ignore_index=True) 
print(result) 

which yields

   CommonIdentifier  CommonValue  IdentifierJoined  ValueJoined
0              1234        type1                 a          1.0
1              1235        type2                 a          1.5
2              1236        type1                 b          2.0
3              1237        type1                 b          1.3
4              1234        type1                 b          4.0
5              1235        type2                 b          2.0
6              1236        type1                 c          3.2
7              1237        type1                 a          1.0
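
Since the real data has 9 identifier/value pairs, the same select/rename/concat idea can be wrapped in a small helper. This is only a sketch: the function name `stack_pairs` and its `n_pairs` and `common` parameters are made up for illustration, and it assumes the columns really are named `Identifier1..IdentifierN` and `Value1..ValueN`.

```python
import pandas as pd


def stack_pairs(df, n_pairs, common=("CommonIdentifier", "CommonValue")):
    """Hypothetical helper: stack Identifier<i>/Value<i> pairs vertically."""
    pieces = []
    for i in range(1, n_pairs + 1):
        # Select the common columns plus the i-th identifier/value pair
        cols = list(common) + [f"Identifier{i}", f"Value{i}"]
        # Give every pair the same column names so the pieces line up
        piece = df[cols].rename(columns={f"Identifier{i}": "IdentifierJoined",
                                         f"Value{i}": "ValueJoined"})
        pieces.append(piece)
    # ignore_index=True produces a fresh 0..N-1 index for the stacked rows
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame(data={"CommonIdentifier": [1234, 1235, 1236, 1237],
                        "CommonValue": ["type1", "type2", "type1", "type1"],
                        "Identifier1": ["a", "a", "b", "b"],
                        "Value1": [1, 1.5, 2, 1.3],
                        "Identifier2": ["b", "b", "c", "a"],
                        "Value2": [4, 2, 3.2, 1]})

result = stack_pairs(df, n_pairs=2)
print(result)
```

For the 9-pair case the call would presumably be `stack_pairs(df, n_pairs=9)`, provided the naming assumption holds.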

Another option, inspired by HYRY's solution, is to hide the common columns in the index and then apply HYRY's stacking trick:

import pandas as pd 
df = pd.DataFrame(data={"CommonIdentifier":[1234,1235,1236,1237], 
         "CommonValue":["type1","type2","type1","type1"], 
         "Identifier1":["a","a","b","b"], 
         "Value1":[1,1.5,2,1.3], 
         "Identifier2":["b","b","c","a"], 
         "Value2":[4,2,3.2,1]}) 

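df = df.set_index(['CommonIdentifier', 'CommonValue']) 
# Order the columns explicitly so they line up with the MultiIndex labels below
# (the column order produced by the dict constructor depends on the pandas/Python version)
df = df[["Identifier1", "Identifier2", "Value1", "Value2"]] 
df.columns = pd.MultiIndex.from_product([["Identifier", "Value"], range(2)]) 
df = df.stack()  # move the pair number (inner column level) into the row index
df.index = df.index.droplevel(-1)  # the pair number is no longer needed
print(df.reset_index()) 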
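
To see what the stacking step is doing, here is a minimal sketch on a toy frame. The two-row data and the names `wide` and `stacked` are made up for illustration; the point is only that stack() moves the innermost column level into the row index.

```python
import pandas as pd

# Toy frame with two-level columns:
# outer level = field name, inner level = pair number
wide = pd.DataFrame(
    [["a", "b", 1.0, 4.0],
     ["a", "b", 1.5, 2.0]],
    columns=pd.MultiIndex.from_product([["Identifier", "Value"], [0, 1]]),
)

# stack() pivots the innermost column level (the pair number) into the index,
# leaving a single "Identifier" column and a single "Value" column
stacked = wide.stack()
print(stacked)
```

Each original row becomes one row per pair, which is why dropping the last index level afterwards leaves the flattened table.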
1

Here is a method that uses MultiIndex, stack() and merge():

import pandas as pd 
df = pd.DataFrame(data={"CommonIdentifier":[1234,1235,1236,1237], 
         "CommonValue":["type1","type2","type1","type1"], 
         "Identifier1":["a","a","b","b"], 
         "Value1":[1,1.5,2,1.3], 
         "Identifier2":["b","b","c","a"], 
         "Value2":[4,2,3.2,1]}) 

common_columns = ["CommonIdentifier", "CommonValue"] 
identifier_columns = ["Identifier1", "Identifier2"] 
value_columns = ["Value1", "Value2"] 

common_df = df[common_columns] 
mix_df = df[identifier_columns + value_columns] 
mix_df.columns = pd.MultiIndex.from_product([["Identifier", "Value"], range(len(identifier_columns))]) 
pd.merge(common_df, mix_df.stack().reset_index(level=1, drop=True), 
     left_index=True, right_index=True).reset_index(drop=True)
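
A different technique from the ones above, offered only as a possible alternative: pandas also provides pd.wide_to_long, which reshapes columns that share a stub name followed by a numeric suffix. The sketch below assumes the `Identifier<i>`/`Value<i>` naming and that the common columns uniquely identify each row; the column name `pair` is an arbitrary choice for the suffix.

```python
import pandas as pd

df = pd.DataFrame(data={"CommonIdentifier": [1234, 1235, 1236, 1237],
                        "CommonValue": ["type1", "type2", "type1", "type1"],
                        "Identifier1": ["a", "a", "b", "b"],
                        "Value1": [1, 1.5, 2, 1.3],
                        "Identifier2": ["b", "b", "c", "a"],
                        "Value2": [4, 2, 3.2, 1]})

long_df = pd.wide_to_long(
    df,
    stubnames=["Identifier", "Value"],    # column prefixes to collapse
    i=["CommonIdentifier", "CommonValue"],  # columns that identify a row
    j="pair",                             # new column holding the numeric suffix
).reset_index()

# Sort so that all rows of pair 1 come first, then pair 2
long_df = long_df.sort_values(["pair", "CommonIdentifier"]).reset_index(drop=True)
print(long_df)
```

The result keeps an extra `pair` column recording which identifier/value pair each row came from; it can be dropped if it is not needed.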