2017-07-03

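Python scipy.optimize: how to run multiple univariate constrained regressions by group

My question is similar to (Python pandas: how to run multiple univariate regression by group). I have a set of regressions to run by group, but in my case the regression coefficients are bounded between 0 and 1, and there is a constraint that the regression coefficients must sum to 1. I tried to solve it as an optimization problem, first using the whole data frame (disregarding groups).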
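Written out, the problem is the following, where i runs over the rows of the data frame (or of a single group). It is what `SumSqDif`, `bnds` and `cons` encode in the code below:

```latex
\min_{a_0,\,a_1} \; \sum_i \left[ \left(y_{0i} - a_0 x_{0i}\right)^2 + \left(y_{1i} - a_1 x_{1i}\right)^2 \right]
\quad \text{s.t.} \quad 0 \le a_0 \le 1, \quad 0 \le a_1 \le 1, \quad a_0 + a_1 = 1
```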

import pandas as pd 
import numpy as np 

df = pd.DataFrame({ 
    'y0': np.random.randn(20), 
    'y1': np.random.randn(20), 
    'x0': np.random.randn(20), 
    'x1': np.random.randn(20), 
    'grpVar': ['a', 'b'] * 10}) 

def SumSqDif(a): 
    return np.sum((df['y0'] - a[0]*df['x0'])**2 + (df['y1'] - a[1]*df['x1'])**2) 

# Starting values (np.full avoids 1/2 == 0 under Python 2 integer division) 
startVal = np.full(2, 0.5) 

# Constraint: sum of coefficients = 1 
cons = {'type': 'eq', 'fun': lambda x: 1 - np.sum(x)} 

# Bounds on coefficients: each one in [0, 1] 
bnds = ((0, 1),) * len(startVal) 

# Solve the optimization problem using the full dataframe (disregarding groups) 
from scipy.optimize import minimize 
Result = minimize(SumSqDif, startVal , method='SLSQP' , bounds=bnds , constraints = cons) 
Result.x 
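Because the equality constraint ties the two coefficients together, the problem can also be reduced by hand, which gives a way to sanity-check `Result.x`. This is an alternative derivation, not something from the original post. Substituting a_1 = 1 - a_0 into the sum of squares (sums over the rows i of the frame, or of one group) and setting the derivative to zero:

```latex
f(a_0) = \sum_i \left(y_{0i} - a_0 x_{0i}\right)^2 + \sum_i \left(y_{1i} - (1 - a_0)\, x_{1i}\right)^2

\frac{f'(a_0)}{2} = a_0 \sum_i x_{0i}^2 - \sum_i x_{0i} y_{0i} + \sum_i x_{1i} y_{1i} - (1 - a_0) \sum_i x_{1i}^2 = 0

\hat a_0 = \frac{\sum_i x_{0i} y_{0i} - \sum_i x_{1i} y_{1i} + \sum_i x_{1i}^2}{\sum_i x_{0i}^2 + \sum_i x_{1i}^2},
\qquad a_0^\ast = \min\!\left(1, \max\!\left(0, \hat a_0\right)\right),
\qquad a_1^\ast = 1 - a_0^\ast
```

Since f''(a_0) = 2(Σ x_{0i}² + Σ x_{1i}²) ≥ 0, f is a convex quadratic in a_0, so clipping the stationary point to [0, 1] gives the bounded minimum (as long as not every x value is zero), and a_1 = 1 - a_0 then automatically lies in [0, 1] as well.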

Then I tried to do it by group, using DataFrame groupby and apply(). But I get the error:

TypeError: unhashable type: 'numpy.ndarray'.

# Try to Solve the optimization problem By group 
# Create GroupBy object 
grp_grpVar = df.groupby('grpVar') 

def RunMinimize(data): 
    ResultByGrp = minimize(SumSqDif, startVal , method='SLSQP' , bounds=bnds , constraints = cons) 
    return ResultByGrp.x 

grp_grpVar.apply(RunMinimize(df)) 
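The error comes from the last line: Python evaluates `RunMinimize(df)` first, on the whole frame, so `apply` receives the resulting coefficient array instead of a function, and pandas fails while trying to treat that array as one. There is also a second issue: `SumSqDif` reads the global `df`, so even a correctly passed function would return the same coefficients for every group. A minimal illustration of how `apply` is meant to be called, using a toy frame and a hypothetical `group_sum` function:

```python
import pandas as pd

# Toy frame with the question's grouping column (values made up for illustration)
df = pd.DataFrame({'x0': [1.0, 2.0, 3.0, 4.0], 'grpVar': ['a', 'b'] * 2})

def group_sum(data):
    # `data` is the sub-DataFrame for a single group, supplied by apply()
    return data['x0'].sum()

# Pass the function object itself (no parentheses); apply() calls it once per group
sums = df.groupby('grpVar')[['x0']].apply(group_sum)
print(sums)
```

Writing `apply(group_sum(df))` instead would compute a single number from the whole frame and hand that number to `apply`, which is the same mistake as in the question.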

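This could probably be done with a loop, but my actual data contains 70 million groups, and I thought DataFrame groupby and apply() would be more efficient. I'm new to Python. I searched this and other sites but couldn't find any example of DataFrame apply() with scipy.optimize.minimize. Any ideas would be much appreciated.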

Answer


I believe what you want is something like this:

import numpy as np 
from scipy.optimize import minimize 

# Add a df parameter to your `SumSqDif` function signature, so that when you apply 
# this function to your grouped dataframe, each group gets passed 
# as the df argument to this function 
def SumSqDif(a, df): 
    return np.sum((df['y0'] - a[0]*df['x0'])**2 + (df['y1'] - a[1]*df['x1'])**2) 

# Add startVal, bnds, and cons as additional parameters. 
# The way you wrote your function, it reads these values from the 
# global namespace, which is not good practice, 
# because you're assuming these values exist in the global scope, 
# which may not always be true 
def RunMinimize(data, startVal, bnds, cons): 
    # Pass the group through minimize's args tuple; 
    # minimize forwards it to SumSqDif as the df argument 
    ResultByGrp = minimize(SumSqDif, startVal, method='SLSQP', 
          bounds=bnds, constraints=cons, args=(data,)) 
    return ResultByGrp.x 

# Here, you pass startVal, bnds, and cons as additional keyword 
# arguments to `apply`, which forwards them to RunMinimize 
df.groupby('grpVar').apply(RunMinimize, startVal=startVal, bnds=bnds, cons=cons) 
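On the efficiency concern in the question: `groupby().apply()` still calls the Python function once per group, so with 70 million groups this is still 70 million `minimize` calls. For this particular objective, a possible alternative (not part of the original answer) is a closed-form solution: substitute a1 = 1 - a0 into the sum of squares, set the derivative with respect to a0 to zero, and clip the result to [0, 1], which is valid because the objective is a convex quadratic in a0. That needs only four per-group sums, which pandas can compute in a single vectorized `groupby().sum()`. A minimal sketch, assuming the column names from the question; `closed_form_by_group` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

def closed_form_by_group(df, group_col='grpVar'):
    """Hypothetical helper: per-group minimiser of
    sum((y0 - a0*x0)**2 + (y1 - a1*x1)**2) with a0 + a1 = 1 and 0 <= a0, a1 <= 1,
    computed from per-group sums instead of one minimize() call per group."""
    # Row-level cross-products needed by the first-order condition
    prods = pd.DataFrame({
        group_col: df[group_col],
        'x0y0': df['x0'] * df['y0'],
        'x1y1': df['x1'] * df['y1'],
        'x0x0': df['x0'] ** 2,
        'x1x1': df['x1'] ** 2,
    })
    # One vectorized pass: sum every cross-product within each group
    s = prods.groupby(group_col).sum()
    # Stationary point along the line a1 = 1 - a0 (NaN if all x in a group are zero)
    a0 = (s['x0y0'] - s['x1y1'] + s['x1x1']) / (s['x0x0'] + s['x1x1'])
    # Convex 1-D quadratic: clipping to [0, 1] gives the bounded minimiser,
    # and a1 = 1 - a0 then also stays inside [0, 1]
    a0 = a0.clip(0.0, 1.0)
    return pd.DataFrame({'a0': a0, 'a1': 1.0 - a0})

# Usage on random data shaped like the question's example
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'y0': rng.standard_normal(20),
    'y1': rng.standard_normal(20),
    'x0': rng.standard_normal(20),
    'x1': rng.standard_normal(20),
    'grpVar': ['a', 'b'] * 10})
coefs = closed_form_by_group(df)
print(coefs)
```

The result has one row per group with columns `a0` and `a1`; on small data it can be checked against the `minimize`-based version above. The trade-off is generality: this shortcut only applies to this two-coefficient objective, whereas the `minimize` approach works for any objective and constraint set.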

Great. Exactly what I needed. Thank you, Scratch'N'Purr. – Paul


No problem! Would you mind upvoting my answer? :D –


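I tried, but I couldn't. A message said that votes from users with less than 15 reputation are recorded but do not change the publicly displayed score. Sorry. – Paul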