2017-05-05
3

Python, pandas - split a key/value column pair into multiple columns

Using pandas, how can the following DataFrame

In [1]: import pandas as pd 
In [2]: pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3], 
         'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"], 
         'value': [10, 40, 20, 30, 10, 40, 50]}) 
Out[2]: 
   month type  value
0      1   T1     10
1      1   T1     40
2      1   T4     20
3      2   T2     30
4      2   T3     10
5      3   T1     40
6      3   T3     50

be processed to produce the result below?

Out[3]: 
   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3

Answers

4

pandas
A clever use of pd.get_dummies

pd.get_dummies(df.type).mul(df.value, 0).join(df.month) 

   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
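
To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps on the question's data. The intermediate names (`dummies`, `scaled`, `result`) and the `dtype=int` argument are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Step 1: one indicator column per distinct type (1 where the row has that type).
# dtype=int is an assumption made here so the indicators are integers
# regardless of the pandas version's default.
dummies = pd.get_dummies(df['type'], dtype=int)

# Step 2: multiply every indicator row by that row's value.
# axis=0 aligns the Series with the row index instead of the column labels.
scaled = dummies.mul(df['value'], axis=0)

# Step 3: attach the month column again, aligned on the row index.
result = scaled.join(df['month'])
print(result)
```

Because each row has exactly one indicator set, the multiplication leaves the value in the matching type column and zeros everywhere else.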

numpy
Or the same idea, but supercharged

u, inv = np.unique(df.type.values, return_inverse=True) 
eye = np.eye(u.size, dtype=int) 
v = df.value.values 
m = df.month.values 
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]), 
    df.index, np.append(u, 'month') 
) 

   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
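
The NumPy version relies on indexing an identity matrix with the inverse codes returned by `np.unique`. The sketch below walks through the intermediate arrays on the question's data; the names `types`, `one_hot`, and `scaled`, and the use of `list(u) + ['month']` for the column labels, are assumptions introduced here for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

types = df['type'].to_numpy()

# u holds the sorted unique labels; inv[i] is the position of types[i] in u.
u, inv = np.unique(types, return_inverse=True)

# Picking row inv[i] of the identity matrix gives a one-hot row for each record.
eye = np.eye(u.size, dtype=int)
one_hot = eye[inv]

# Broadcasting: a (n, 1) column of values times the (n, k) one-hot matrix.
scaled = one_hot * df['value'].to_numpy()[:, None]

# Append month as the last column and rebuild a DataFrame.
result = pd.DataFrame(np.column_stack([scaled, df['month'].to_numpy()]),
                      index=df.index,
                      columns=list(u) + ['month'])
print(result)
```

This avoids pandas index alignment entirely, which is a plausible reason for the speed difference reported in the timings.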

Timing

%timeit pd.get_dummies(df.type).mul(df.value, 0).join(df.month) 
1000 loops, best of 3: 1.1 ms per loop 

%%timeit 
u, inv = np.unique(df.type.values, return_inverse=True) 
eye = np.eye(u.size, dtype=int) 
v = df.value.values 
m = df.month.values 
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]), 
    df.index, np.append(u, 'month') 
) 
10000 loops, best of 3: 189 µs per loop 

%%timeit 
(df.set_index(['type'],append=True)['value'] 
    .unstack(fill_value=0)).join(df[['month']]) 
100 loops, best of 3: 1.92 ms per loop 

%%timeit 
d1 = (df.set_index(['month','type'], append=True)['value']
        .unstack(fill_value=0)
        .reset_index(level=1))

cols = d1.columns[1:].tolist() + d1.columns[:1].tolist() 
d1 = d1.reindex_axis(cols, axis=1) 
d1 
100 loops, best of 3: 2.48 ms per loop 
+0

Looks very clever to me. – jezrael

+0

@jezrael Thank you! – piRSquared

+0

@piRSquared, this is really fast! – MaxU

3

You can use a combination of set_index and unstack to get the T1 - T4 columns, and then join the month column like this:

(df.set_index(['type'],append=True)['value'] 
    .unstack(fill_value=0)).join(df[['month']]) 
# T1 T2 T3 T4 month 
# 0 10 0 0 0  1 
# 1 40 0 0 0  1 
# 2 0 0 0 20  1 
# 3 0 30 0 0  2 
# 4 0 0 10 0  2 
# 5 40 0 0 0  3 
# 6 0 0 50 0  3 
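
A short sketch of why `append=True` matters in this approach: it keeps the original row number as the first index level, so every (row, type) pair is unique and `unstack` can pivot without hitting duplicate entries. The intermediate names `s` and `wide` are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# append=True adds 'type' as a second index level next to the row number,
# giving a Series of values indexed by (row, type).
s = df.set_index('type', append=True)['value']

# unstack moves the innermost level ('type') into columns;
# combinations that do not occur are filled with 0 instead of NaN.
wide = s.unstack(fill_value=0)

# The row index is unchanged, so month can be joined straight back on it.
result = wide.join(df[['month']])
print(result)
```

Without `append=True`, the index would contain `T1` three times, and the Series could not be unstacked on it.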
2

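You can use set_index, unstack, and reset_index. Finally, add reindex_axis to change the column order.

df = (df.set_index(['month','type'], append=True)['value']
        .unstack(fill_value=0)
        .reset_index(level=1))
#reorder columns
#(reindex_axis was removed in pandas 1.0; on newer versions use df.reindex(columns=cols))
cols = df.columns[1:].tolist() + df.columns[:1].tolist()
df = df.reindex_axis(cols, axis=1)
print (df)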
type T1 T2 T3 T4 month 
0  10 0 0 0  1 
1  40 0 0 0  1 
2  0 0 0 20  1 
3  0 30 0 0  2 
4  0 0 10 0  2 
5  40 0 0 0  3 
6  0 0 50 0  3
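
For readers on a recent pandas release, where `reindex_axis` is no longer available, the same steps can be sketched with `reindex(columns=...)`. This is an adaptation under that assumption, not the answer's original code; the names `wide` and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Index by (row, month, type), pivot 'type' into columns,
# then turn the 'month' level back into an ordinary column.
wide = (df.set_index(['month', 'type'], append=True)['value']
          .unstack(fill_value=0)
          .reset_index(level=1))

# reset_index puts 'month' first; move it to the end to match the target layout.
cols = wide.columns[1:].tolist() + wide.columns[:1].tolist()
result = wide.reindex(columns=cols)
print(result)
```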