2017-08-26 82 views
2

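Divide certain columns by another column in pandas

I was wondering if there is a more efficient way to divide multiple columns by a certain column. For example, say I have: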

prev open close volume 
20.77 20.87 19.87 962816 
19.87 19.89 19.56 668076 
19.56 19.96 20.1 578987 
20.1 20.4 20.53 418597 

and I would like to get:

prev open close volume 
20.77 1.0048 0.9567 962816 
19.87 1.0010 0.9844 668076 
19.56 1.0204 1.0276 578987 
20.1 1.0149 1.0214 418597 

Basically, the columns 'open' and 'close' have been divided by the value in the column 'prev'.

I was able to do this with:

df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev'])) 
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev'])) 

Is there a simpler way to do this, especially if there are 10 columns that need to be divided by the same value?

+0

Why didn't I think of that... haha, thanks. I knew I was making it more complicated than it should be – user1179317

+0

'df.assign(open = df.open/df.prev,close = df.close/df.prev)'? – Abdou

Answers

2
df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0) 

Output:

prev  open  close volume 
0 20.77 1.004815 0.956668 962816 
1 19.87 1.001007 0.984399 668076 
2 19.56 1.020450 1.027607 578987 
3 20.10 1.014925 1.021393 418597 
3
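columns_to_divide = ['open', 'close'] 
# axis=0 aligns on the row index; a bare `/` would align df['prev'] against the column labels 
df[columns_to_divide] = df[columns_to_divide].div(df['prev'], axis=0) 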
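A minimal, self-contained sketch of the same idea, using the sample data from the question. The list `columns_to_divide` can hold any number of column names, which covers the "10 columns" case. The point is that `div(..., axis=0)` matches the divisor to rows rather than to column labels.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "prev": [20.77, 19.87, 19.56, 20.10],
    "open": [20.87, 19.89, 19.96, 20.40],
    "close": [19.87, 19.56, 20.10, 20.53],
    "volume": [962816, 668076, 578987, 418597],
})

# Any number of columns can be listed here
columns_to_divide = ["open", "close"]

# axis=0 aligns the divisor with the row index,
# so each row is divided by its own `prev` value
df[columns_to_divide] = df[columns_to_divide].div(df["prev"], axis=0)

print(df)
```

The `prev` and `volume` columns are left untouched because they are not in the list.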
4

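For performance, I would suggest using the underlying array data with array slicing. Since the two columns to be modified sit next to each other, slicing gives views into the array: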

a = df.values 
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None] 

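To discuss the array-slicing part in more detail: a[:,[1,2]] would force a copy and slow things down. On the dataframe side, a[:,[1,2]] is equivalent to df[['open','close']], and I am guessing that is slowing things down too. df.iloc[:,1:3] thus improves on it.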
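The view-versus-copy distinction can be checked directly with `np.shares_memory`. This is a small illustrative sketch on a made-up array (the values are arbitrary, not the question's data):

```python
import numpy as np

# Arbitrary 4x3 float array; values start at 1 to avoid dividing by zero
a = np.arange(1, 13, dtype=float).reshape(4, 3)

sliced = a[:, 1:3]     # basic slicing: a view onto a's buffer
fancy = a[:, [1, 2]]   # integer-list indexing: allocates a new array

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False

# Indexing with None keeps the first column as shape (4, 1),
# so it broadcasts across the two sliced columns
ratios = sliced / a[:, 0, None]
print(ratios.shape)  # (4, 2)
```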

Sample run -

In [64]: df 
Out[64]: 
    prev open close volume 
0 20.77 20.87 19.87 962816 
1 19.87 19.89 19.56 668076 
2 19.56 19.96 20.10 578987 
3 20.10 20.40 20.53 418597 

In [65]: a = df.values 
    ...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None] 
    ...: 

In [66]: df 
Out[66]: 
    prev  open  close volume 
0 20.77 1.004815 0.956668 962816 
1 19.87 1.001007 0.984399 668076 
2 19.56 1.020450 1.027607 578987 
3 20.10 1.014925 1.021393 418597 

Runtime test

Approaches -

def numpy_app(df): # Proposed in this post 
    a = df.values 
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None] 
    return df 

def pandas_app1(df): # @Scott Boston's soln 
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0) 
    return df 

Timings -

In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float) 
    ...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume'))) 
    ...: df2 = df1.copy() 
    ...: 

In [45]: %timeit pandas_app1(df1) 
    ...: %timeit numpy_app(df2) 
    ...: 
100 loops, best of 3: 2.68 ms per loop 
1000 loops, best of 3: 885 µs per loop
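Alongside the timings, it is worth confirming that both approaches produce the same values. A minimal check, assuming the same column layout as above; the seed and the smaller row count are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
data = rng.integers(15, 25, size=(1000, 4)).astype(float)
cols = ["prev", "open", "close", "volume"]

df_pandas = pd.DataFrame(data, columns=cols)
df_numpy = df_pandas.copy()

# pandas approach: row-wise division via div(..., axis=0)
df_pandas[["open", "close"]] = df_pandas[["open", "close"]].div(
    df_pandas["prev"].values, axis=0
)

# NumPy approach: slice the underlying array and broadcast the first column
a = df_numpy.values
df_numpy.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]

same = np.allclose(df_pandas.values, df_numpy.values)
print(same)
```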