2016-04-22 103 views
-1

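Multiplying DataFrames of different lengths

I have two DataFrames. Both have 5 columns, but the first has 100 rows and the second has just one row. I need to multiply every row of the first DataFrame by the single row of the second, then sum the column values within each row and store that result in a new sixth column, "sum of multiplications". I have seen the "np.dot" operation, but I am not sure whether it can be applied to DataFrames. I am also looking for a pythonic/pandas operation or method that could replace some rather crude code written from scratch. Thanks in advance.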

+4

Providing example code and data would help us answer faster. – tfv

Answers

1

I think you can convert the DataFrames to numpy arrays with values, multiply them, and finally take the sum of each row:

import pandas as pd 
import numpy as np 

np.random.seed(1) 
df1 = pd.DataFrame(np.random.randint(10, size=(1,5))) 
df1.columns = list('ABCDE') 
print(df1)
   A  B  C  D  E
0  5  8  9  5  0

np.random.seed(0) 
df2 = pd.DataFrame(np.random.randint(10,size=(10,5))) 
df2.columns = list('ABCDE') 
print(df2)
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0
# The (10, 5) array is broadcast against the (1, 5) array, row by row
print(df2.values * df1.values)
[[25  0 27 15  0]
 [45 24 45 10  0]
 [35 48 72 40  0]
 [30 56 63 40  0]
 [25 72 72 45  0]
 [15  0 27 25  0]
 [10 24 72  5  0]
 [15 24 63  0  0]
 [45 72  0 20  0]
 [15 16 63 10  0]]

df = pd.DataFrame(df2.values * df1.values) 
df['sum'] = df.sum(axis=1) 
print(df)
    0   1   2   3  4  sum
0  25   0  27  15  0   67
1  45  24  45  10  0  124
2  35  48  72  40  0  195
3  30  56  63  40  0  189
4  25  72  72  45  0  214
5  15   0  27  25  0   67
6  10  24  72   5  0  111
7  15  24  63   0  0  102
8  45  72   0  20  0  137
9  15  16  63  10  0  104
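
The question asks whether np.dot can be used here. Since each row sum is just a dot product between that row and the single multiplier row, it probably can. Below is a minimal sketch, assuming both frames have their columns in the same order. The names `data` and `weights` are illustrative only, and the values are hard-coded from the first rows printed above rather than regenerated from the random seeds.

```python
import numpy as np
import pandas as pd

cols = list('ABCDE')
# One-row frame of multipliers (the df1 printed above)
weights = pd.DataFrame([[5, 8, 9, 5, 0]], columns=cols)
# First three rows of the df2 printed above
data = pd.DataFrame([[5, 0, 3, 3, 7],
                     [9, 3, 5, 2, 4],
                     [7, 6, 8, 8, 1]], columns=cols)

# NumPy: a (3, 5) matrix times a length-5 vector gives one value per row
sums_np = np.dot(data.values, weights.values.ravel())

# pandas: DataFrame.dot matches data's columns against the Series index
sums_pd = data.dot(weights.iloc[0])

result = data.copy()
result['sum of multiplications'] = sums_np
print(result)
```

This skips the intermediate table of products, so it only fits when the per-row sum is all that is needed.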

Timings

In [1185]: %timeit df2.mul(df1.ix[0], axis=1) 
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 287 µs per loop 

In [1186]: %timeit pd.DataFrame(df2.values * df1.values) 
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 98 µs per loop 
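
Note that the timings above use .ix, which was deprecated and later removed in pandas 1.0. The sketch below shows the same two variants with .iloc, assuming a current pandas version. The values are hard-coded from the frames printed earlier, and the names `via_mul` and `via_values` are illustrative only.

```python
import pandas as pd

cols = list('ABCDE')
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=cols)
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4]], columns=cols)

# Label-aligned multiplication: each column of df2 is matched to the
# entry of the Series with the same label
via_mul = df2.mul(df1.iloc[0], axis=1)

# Positional multiplication on the raw arrays; labels are ignored, so
# they are passed back in explicitly when rebuilding the frame
via_values = pd.DataFrame(df2.values * df1.values,
                          columns=df2.columns, index=df2.index)

print(via_mul)
print(via_values)
```

The two agree here because both frames share the same column order. The values variant avoids the alignment overhead, which is likely why it is faster in the timings, but it would silently give wrong results if the column order differed.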
0

You may be looking for something like this:

import pandas as pd 
import numpy as np 

df1 = pd.DataFrame({ 'A' : [1.1,2.7, 3.4], 
        'B' : [-1.,-2.5, -3.9]}) 

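# Per-row sum of the raw values (computed before any multiplication)
df1['sum of multiplications'] = df1.sum(axis=1)

# One-row frame of multipliers; the factor 1. leaves the sum column unchanged
df2 = pd.DataFrame({'A': [2.],
                    'B': [3.],
                    'sum of multiplications': [1.]})

print(df1)
print(df2)

row = df2.iloc[0]  # .ix was removed in pandas 1.0; .iloc selects by position
df5 = df1.mul(row, axis=1)
df5.loc['Total'] = df5.sum()  # append a row holding the column totals
print(df5)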
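
The snippet above sums each row before multiplying, so its extra column holds the sum of the raw values rather than the sum of the products the question asks for. A minimal sketch of the other order, multiply first and then sum, using the same example values; the names `weights` and `products` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})
# Multipliers as a Series whose index matches df1's column labels
weights = pd.Series({'A': 2.0, 'B': 3.0})

# Multiply every row by the weights, aligned on column labels
products = df1.mul(weights, axis=1)

# Sum the products across each row into a new column
products['sum of multiplications'] = products.sum(axis=1)
print(products)
```

If only the final column is needed, `df1.dot(weights)` should give the same per-row values directly.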