2017-09-15 103 views
2

Reordering items in two pandas DataFrame columns

I have a pandas DataFrame with three columns; part of it is shown below:

import pandas as pd

data = {'T1': {0: 'Belarus', 1: 'Netherlands', 2: 'France', 3: 'Faroe Islands',
     4: 'Hungary'}, 'T2': {0: 'Sweden', 1: 'Bulgaria', 2: 'Luxembourg',
     3: 'Andorra', 4: 'Portugal'}, 'score': {0: -4, 1: 2, 2: 0, 3: 1, 4: -1}}
df = pd.DataFrame(data)
#              T1          T2  score
#0        Belarus      Sweden     -4
#1    Netherlands    Bulgaria      2
#2         France  Luxembourg      0
#3  Faroe Islands     Andorra      1
#4        Hungary    Portugal     -1

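For any row where the items in T1 and T2 are not in alphabetical order (for example, "Netherlands" and "Bulgaria"), I want to swap the two items and also flip the sign of score.

I managed to come up with a monstrosity: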

df.apply(lambda x: 
      pd.Series([x["T2"], x["T1"], -x["score"]]) 
      if (x["T1"] > x["T2"]) 
      else pd.Series([x["T1"], x["T2"], x["score"]]), 
     axis=1) 
#   0    1 2 
#0 Belarus   Sweden -4 
#1 Bulgaria Netherlands -2 
#2 France  Luxembourg 0 
#3 Andorra Faroe Islands -1 
#4 Hungary  Portugal -1 

Is there a better way to get the same result? (Performance is not a concern.)

Answers

3

Option 1
Boolean indexing.

m = df.T1 > df.T2 
m 

0 False 
1  True 
2 False 
3  True 
4 False 
dtype: bool 

df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1) 
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values 
df 

     T1    T2 score 
0 Belarus   Sweden  -4 
1 Bulgaria Netherlands  -2 
2 France  Luxembourg  0 
3 Andorra Faroe Islands  -1 
4 Hungary  Portugal  -1 
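
For reference, Option 1 can be wrapped in a small helper that works on a copy instead of mutating df. This is only a sketch: the function name order_pairs and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def order_pairs(df, left="T1", right="T2", score="score"):
    """Return a copy in which left <= right on every row.

    Hypothetical helper: rows that had to be swapped get their score negated.
    """
    out = df.copy()
    mask = out[left] > out[right]
    # to_numpy() drops the column labels, so the values are assigned
    # by position; without it pandas would align on names and undo the swap.
    out.loc[mask, [left, right]] = out.loc[mask, [right, left]].to_numpy()
    out.loc[mask, score] = -out.loc[mask, score]
    return out


df = pd.DataFrame({
    "T1": ["Belarus", "Netherlands", "France", "Faroe Islands", "Hungary"],
    "T2": ["Sweden", "Bulgaria", "Luxembourg", "Andorra", "Portugal"],
    "score": [-4, 2, 0, 1, -1],
})
result = order_pairs(df)
```

Because the helper copies first, the original df keeps its unsorted pairs and original scores.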

Option 2
df.eval

m = df.eval('T1 > T2') 
df.loc[m, 'score'] = df.loc[m, 'score'].mul(-1) 
df.loc[m, ['T1', 'T2']] = df.loc[m, ['T2', 'T1']].values 
df 

     T1    T2 score 
0 Belarus   Sweden  -4 
1 Bulgaria Netherlands  -2 
2 France  Luxembourg  0 
3 Andorra Faroe Islands  -1 
4 Hungary  Portugal  -1 

Option 3
df.query

idx = df.query('T1 > T2').index 
idx 
Int64Index([1, 3], dtype='int64') 

df.loc[idx, 'score'] = df.loc[idx, 'score'].mul(-1) 
df.loc[idx, ['T1', 'T2']] = df.loc[idx, ['T2', 'T1']].values 
df 

     T1    T2 score 
0 Belarus   Sweden  -4 
1 Bulgaria Netherlands  -2 
2 France  Luxembourg  0 
3 Andorra Faroe Islands  -1 
4 Hungary  Portugal  -1 
4

Not as neat as @cᴏʟᴅsᴘᴇᴇᴅ's answer, but it works

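import numpy as np
df1 = df[['T1', 'T2']].copy()
df1[['T1', 'T2']] = np.sort(df1.values, axis=1)  # sort each pair alphabetically
df1['new'] = np.where((df1 != df[['T1', 'T2']]).any(axis=1), -df.score, df.score)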

df1 
Out[102]: 
     T1    T2 new 
0 Belarus   Sweden -4 
1 Bulgaria Netherlands -2 
2 France  Luxembourg 0 
3 Andorra Faroe Islands -1 
4 Hungary  Portugal -1 
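
To see why the comparison against the original columns works, the sketch below checks that "the row changed after sorting" is the same mask as T1 > T2 on the sample data. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "T1": ["Belarus", "Netherlands", "France", "Faroe Islands", "Hungary"],
    "T2": ["Sweden", "Bulgaria", "Luxembourg", "Andorra", "Portugal"],
    "score": [-4, 2, 0, 1, -1],
})

pairs = df[["T1", "T2"]]
# Sort each row's two names alphabetically, keeping the same labels
sorted_pairs = pd.DataFrame(
    np.sort(pairs.to_numpy(dtype=object), axis=1),
    index=pairs.index,
    columns=pairs.columns,
)

# A row differs from its sorted version exactly when it was out of order
changed = (sorted_pairs != pairs).any(axis=1)
direct = df["T1"] > df["T2"]
```

Rows where T1 equals T2 are unchanged by sorting and also fail T1 > T2, so the two masks agree there as well.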
+0

You need to print df1 :) –

+0

@cᴏʟᴅsᴘᴇᴇᴅ Yes, you're right~ :) – Wen

2

Using loc

cond = df.T1 > df.T2 
df.loc[cond, 'score'] = df['score'] *-1 
df.loc[cond, ['T1', 'T2']] = df.loc[cond, ['T2', 'T1']].values 

Output

T1   T2    score 
0 Belarus  Sweden   -4 
1 Bulgaria Netherlands  -2 
2 France  Luxembourg  0 
3 Andorra  Faroe Islands -1 
4 Hungary  Portugal  -1 
+0

loc was already mentioned here: https://stackoverflow.com/a/46231172/4909087 –

+0

But... thanks to this I realized I also needed to swap the values, so never mind ;-) –

3

Here is a fun and creative way to do it using numpy tools

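import numpy as np

t = df[['T1', 'T2']].values
a = t.argsort(1)  # [0, 1] for rows already in order, [1, 0] for rows to swap

df[['T1', 'T2']] = t[np.arange(len(t))[:, None], a]
# The @ operator requires Python 3.5+ (thanks @cᴏʟᴅsᴘᴇᴇᴅ);
# on older versions use
# df['score'] *= a.dot([-1, 1])
df['score'] *= a @ [-1, 1]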

df 

     T1    T2 score 
0 Belarus   Sweden  -4 
1 Bulgaria Netherlands  -2 
2 France  Luxembourg  0 
3 Andorra Faroe Islands  -1 
4 Hungary  Portugal  -1 
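
The sign trick may be easier to follow on two rows. In this sketch the small array is made up for illustration: argsort returns [0, 1] for a pair that is already ordered and [1, 0] for one that needs swapping, and the dot product with [-1, 1] turns those into +1 and -1.

```python
import numpy as np

# One ordered pair and one out-of-order pair
t = np.array([["Belarus", "Sweden"],
              ["Netherlands", "Bulgaria"]], dtype=object)

a = t.argsort(1)  # per-row column order that sorts the pair

# [0, 1] @ [-1, 1] = 1   (row kept as is, score unchanged)
# [1, 0] @ [-1, 1] = -1  (row swapped, score negated)
sign = a @ [-1, 1]

# Fancy indexing: pick, for each row, its columns in sorted order
sorted_t = t[np.arange(len(t))[:, None], a]
```

Multiplying the score column by sign therefore flips only the rows whose names were swapped.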
+0

'@'? What syntax is that? –

+0

Python 3 array multiplication... I should have said so (-: – piRSquared

+0

Do you mean 3.6? This throws a syntax error on any Python <= 3.4 –