2017-04-20

Pandas: grouping columns under a header

I have a pandas DataFrame df made up of multiple columns with similar headers,

| id | x, single room | x, double room | y, single room | y, double room | 
-------------------------------------------------------------------------- 
    ⋮   ⋮    ⋮     ⋮     ⋮ 


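Question

Grouping multiple columns

I want to group the columns starting with x, and those starting with y, under a common header in the following way,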

|    |             x             |             y             |
--------------------------------------------------------------
| id | single room | double room | single room | double room |
--------------------------------------------------------------
   ⋮        ⋮             ⋮             ⋮             ⋮

How can I do this?

This can be done (more or less) with [MultiIndexing](http://pandas.pydata.org/pandas-docs/stable/advanced.html). – languitar
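
To make the comment above concrete, here is a minimal sketch of a frame whose columns are a two-level MultiIndex built by hand. The values and the name `df_multi` are made up for illustration, and using an empty string as the top level for id is an assumption about how the ungrouped column should look.

```python
import pandas as pd

# Hypothetical data; '' is used as a placeholder top level for the id column.
columns = pd.MultiIndex.from_tuples([
    ("", "id"),
    ("x", "single room"),
    ("x", "double room"),
    ("y", "single room"),
    ("y", "double room"),
])
df_multi = pd.DataFrame([[1, 10, 20, 30, 40]], columns=columns)

# Selecting a top-level label returns the columns grouped under it.
x_block = df_multi["x"]
```

Here `x_block` is a DataFrame holding only the single room and double room columns that sit under x.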

Answers

You can use split, but the main problem is getting id into the last level:

import pandas as pd

col = ['id', 'x, single room', 'x, double room', 'y, single room', 'y, double room']
df = pd.DataFrame([[1,1,1,1,1]], columns=col)
print (df)
   id  x, single room  x, double room  y, single room  y, double room
0   1               1               1               1               1

# split each name on ', ' and take the resulting MultiIndex as an array of tuples
a = df.columns.str.split(', ', expand=True).values 
print (a) 
[('id', nan) ('x', 'single room') ('x', 'double room') ('y', 'single room') 
('y', 'double room')] 

# where the second element is NaN, move the name to the last level and use '' as the top level
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a]) 
print (df) 
       x      y    
    id single room double room single room double room 
0 1   1   1   1   1 
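
The same idea can be wrapped in a small helper. This is only a sketch: the function name `group_headers` and its `sep` parameter are hypothetical, it uses plain Python `str.partition` instead of `.str.split` so no NaN handling is needed, and it assumes every column name is a string. The row values are changed to 1..5 so the selections are easy to tell apart.

```python
import pandas as pd

def group_headers(df, sep=", "):
    """Return a copy of df whose 'group, name' columns become two header levels."""
    tuples = []
    for col in df.columns:
        group, found, name = col.partition(sep)
        # Columns without the separator (such as 'id') get an empty top level.
        tuples.append((group, name) if found else ("", col))
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples)
    return out

col = ['id', 'x, single room', 'x, double room', 'y, single room', 'y, double room']
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=col)
grouped = group_headers(df)

y_rooms = grouped['y']       # the columns under the 'y' header
ids = grouped[('', 'id')]    # the id column, addressed by its full tuple
```

Once the columns are grouped this way, a whole block can be picked by its top-level label, while id has to be addressed with the tuple `('', 'id')`.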

Old solution:

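# split the names into a two-column frame: 0 = group, 1 = name
a = pd.DataFrame(df.columns.str.rsplit(', ', expand=True).values.tolist())
# names without ', ' (such as id) have a missing value in column 1
mask = a[1].isnull()
# swap the two columns for those rows so that id ends up in the last level
a.loc[mask, [0,1]] = a.loc[mask, [1,0]].values
a[0] = a[0].fillna('')
df.columns = a.set_index([0,1]).index
df.columns.names = ('', '')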

Great! Thanks. – LucSpan

I have a better solution, give me a second. – jezrael