2017-05-29 116 views

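Pivot a series of rating columns in pandas

In pandas, I have a DataFrame in which each row corresponds to a user and each column to a variable associated with that user, including how they rated certain things: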

+----------------+-------------------+----------+----------+
| name           | email             | rating_a | rating_b |
+----------------+-------------------+----------+----------+
| Someone        | [email protected] |      7.8 |      9.9 |
| Someone Else   | [email protected] |      2.4 |      9.2 |
| Another Person | [email protected] |      3.5 |      7.5 |
+----------------+-------------------+----------+----------+

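I want to pivot the table so that one column holds the rating type (a or b), another holds the rating value (7.8, 3.5, etc.), and the remaining columns stay the same as above, like this: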

+----------------+-------------------+-------------+--------------+
| name           | email             | rating_type | rating_value |
+----------------+-------------------+-------------+--------------+
| Someone        | [email protected] | a           |          7.8 |
| Someone        | [email protected] | b           |          9.9 |
| Someone Else   | [email protected] | a           |          2.4 |
| Someone Else   | [email protected] | b           |          9.2 |
| Another Person | [email protected] | a           |          3.5 |
| Another Person | [email protected] | b           |          7.5 |
+----------------+-------------------+-------------+--------------+

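The pandas melt method seems to be the right tool, but I'm not entirely sure what my id_vars and value_vars should be in this case. It also seems to drop every column that is not in one of those two categories, such as the email address, and I want to keep all of that information.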

How can I do this with pandas?

Answer


You can use melt, with str.replace to change the column names:

# Strip the 'rating_' prefix so the melted labels read 'a' and 'b'
df.columns = df.columns.str.replace('rating_','')
# Columns listed in id_vars are kept; all remaining columns are melted
df = df.melt(id_vars=['name','email'], var_name='rating_type', value_name='rating_value')
print(df)
             name              email  rating_type  rating_value
0         Someone  [email protected]            a           7.8
1    Someone Else  [email protected]            a           2.4
2  Another Person  [email protected]            a           3.5
3         Someone  [email protected]            b           9.9
4    Someone Else  [email protected]            b           9.2
5  Another Person  [email protected]            b           7.5
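To address the id_vars / value_vars question directly, here is a minimal self-contained sketch. The email addresses are placeholders (the originals are obscured in the post) and the name `long_df` is only illustrative. It passes value_vars explicitly, although leaving it out melts every column not listed in id_vars, and it shows that email survives because it is listed in id_vars.

```python
import pandas as pd

# Sample data from the question; the email addresses are placeholders
df = pd.DataFrame({
    "name": ["Someone", "Someone Else", "Another Person"],
    "email": ["someone@example.com", "else@example.com", "another@example.com"],
    "rating_a": [7.8, 2.4, 3.5],
    "rating_b": [9.9, 9.2, 7.5],
})

long_df = df.melt(
    id_vars=["name", "email"],             # columns repeated on every output row
    value_vars=["rating_a", "rating_b"],   # columns unpivoted into rows
    var_name="rating_type",
    value_name="rating_value",
)

# Turn 'rating_a' / 'rating_b' into 'a' / 'b' after melting
long_df["rating_type"] = long_df["rating_type"].str.replace("rating_", "", regex=False)
print(long_df)
```

Renaming after the melt, rather than before, leaves the column names of the original `df` untouched.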

Another solution, with set_index + stack + rename_axis + reset_index:

# Strip the 'rating_' prefix so the stacked labels read 'a' and 'b'
df.columns = df.columns.str.replace('rating_','')
# The parentheses let the method chain span several lines
df = (df.set_index(['name','email'])
        .stack()
        .rename_axis(['name','email','rating_type'])
        .reset_index(name='rating_value'))
print(df)
             name              email  rating_type  rating_value
0         Someone  [email protected]            a           7.8
1         Someone  [email protected]            b           9.9
2    Someone Else  [email protected]            a           2.4
3    Someone Else  [email protected]            b           9.2
4  Another Person  [email protected]            a           3.5
5  Another Person  [email protected]            b           7.5

A solution with melt if you need to change the row order, so that each user's ratings stay together:

df.columns = df.columns.str.replace('rating_','')
df = df.reset_index() \
       .melt(id_vars=['index','name','email'],
             var_name='rating_type',
             value_name='rating_value') \
       .sort_values(['index','rating_type']) \
       .drop('index', axis=1) \
       .reset_index(drop=True)
print(df)
             name              email  rating_type  rating_value
0         Someone  [email protected]            a           7.8
1         Someone  [email protected]            b           9.9
2    Someone Else  [email protected]            a           2.4
3    Someone Else  [email protected]            b           9.2
4  Another Person  [email protected]            a           3.5
5  Another Person  [email protected]            b           7.5
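As a possible variation on the ordered melt (not part of the original answer), the sketch below keeps the original row labels with `ignore_index=False` and then sorts on them with a stable algorithm. It assumes pandas 1.1 or later, where melt accepts `ignore_index`, and a frame whose index labels are unique. The email addresses are placeholders and `ordered` is an illustrative name.

```python
import pandas as pd

# Sample data from the question; the email addresses are placeholders
df = pd.DataFrame({
    "name": ["Someone", "Someone Else", "Another Person"],
    "email": ["someone@example.com", "else@example.com", "another@example.com"],
    "rating_a": [7.8, 2.4, 3.5],
    "rating_b": [9.9, 9.2, 7.5],
})

ordered = (
    df.melt(
        id_vars=["name", "email"],
        var_name="rating_type",
        value_name="rating_value",
        ignore_index=False,          # keep original row labels: 0, 1, 2, 0, 1, 2
    )
    .sort_index(kind="mergesort")    # stable sort: 'a' stays ahead of 'b' per user
    .reset_index(drop=True)
)

# Turn 'rating_a' / 'rating_b' into 'a' / 'b'
ordered["rating_type"] = ordered["rating_type"].str.replace("rating_", "", regex=False)
print(ordered)
```

The stable sort matters here: melt emits all rating_a rows before all rating_b rows, and a stable sort on the duplicated labels preserves that relative order within each user.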