2017-03-06 52 views

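Pandas: apply multiple custom functions

I'm new to Python and pandas. I need to do some simple parsing of a pandas DataFrame to get a new DataFrame, and this involves several functions. Here is a toy example: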

import pandas as pd

df = pd.DataFrame({'A': pd.Series(["T100", "T100", "M100", "M100"]),
                   'B': pd.Series(["520", "620", "720", "820"]),
                   'C': pd.Series(["10/50", "20/50", "30/50", "50/50"])})

>>> df
      A    B      C
0  T100  520  10/50
1  T100  620  20/50
2  M100  720  30/50
3  M100  820  50/50

Here is what I tried. Naturally it doesn't work (it raises AttributeError: 'DataFrame' object has no attribute 'agg'), but it shows the idea of what I want to do:

import re

def get_pat_ID(row):
    sample = row['A']
    patID = re.match(r"[TM](\d+)", sample).group(1)
    return patID

def get_funcB(row):
    sample, b, c = row['A'], row['B'], row['C']
    if sample == "T100":
        output = b + "_" + c
    else:
        output = "NA"
    return output

def cust(dataset, funcname):
    f = dataset.apply(funcname, axis=1)  # I want the function to be performed on each row of my dataframe
    return f

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}  # contains all the functions that I want to pass to my dataframe
funcs = {'PatID': cust(df, funcdict["pat_ID"]), 'AnotherFunc': cust(df, funcdict["funcB"])}  # creates one column for the output of each function
newdf = pd.DataFrame()
newdf = df.agg(funcs)  # this is the line that raises the AttributeError

I know my approach isn't the most efficient, because apply walks over the same rows again every time I compute another function. Can anyone help?
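A side note on the attempt itself: `funcs` already maps each new column name to a fully computed Series that shares `df`'s index, so only the final assembly step is missing, and the plain `pd.DataFrame` constructor can do that. (`DataFrame.agg` was only added in pandas 0.20.0, which is presumably why the attribute was missing here.) A minimal sketch, assuming condensed versions of the question's functions; note that it still runs one `apply` pass per function, so it does not address the efficiency concern:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

def get_pat_ID(row):
    # Same logic as the question: digits following a leading T or M.
    return re.match(r"[TM](\d+)", row["A"]).group(1)

def get_funcB(row):
    # Same logic as the question: "B_C" for T100 rows, otherwise the string "NA".
    return row["B"] + "_" + row["C"] if row["A"] == "T100" else "NA"

def cust(dataset, funcname):
    # Apply funcname to every row (axis=1) and return the resulting Series.
    return dataset.apply(funcname, axis=1)

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}
funcs = {"PatID": cust(df, funcdict["pat_ID"]),
         "AnotherFunc": cust(df, funcdict["funcB"])}

# funcs is a dict of index-aligned Series, so the constructor builds
# one column per key; no .agg() call is needed.
newdf = pd.DataFrame(funcs)
print(newdf)
```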

Answer

>>> ndf = df.apply(lambda x: pd.Series(data=[get_pat_ID(x), get_funcB(x)], index=['pat_ID','get_funcB']), axis=1) 
>>> ndf 
  pat_ID  get_funcB
0    100  520_10/50
1    100  620_20/50
2    100         NA
3    100         NA
>>> pd.concat([df, ndf], axis=1)
      A    B      C pat_ID  get_funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
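The same single-pass idea can be driven by `funcdict`, so registering another function in the dict adds another output column without editing the lambda. A minimal sketch, assuming condensed versions of the question's `get_pat_ID` and `get_funcB` and the question's `funcdict`; each row is still visited only once, with every function called on it inside that visit:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

def get_pat_ID(row):
    return re.match(r"[TM](\d+)", row["A"]).group(1)

def get_funcB(row):
    return row["B"] + "_" + row["C"] if row["A"] == "T100" else "NA"

funcdict = {"pat_ID": get_pat_ID, "funcB": get_funcB}

# One pass over the rows: for each row, call every function in funcdict and
# return a Series whose index (the dict keys) becomes the new column names.
ndf = df.apply(
    lambda row: pd.Series({name: func(row) for name, func in funcdict.items()}),
    axis=1,
)
result = pd.concat([df, ndf], axis=1)
print(result)
```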

Or even with a simple loop:

>>> ndf = df.copy()
>>> for k, v in funcdict.items():
...     ndf[k] = ndf.apply(v, axis=1)
...
>>> ndf
      A    B      C pat_ID      funcB
0  T100  520  10/50    100  520_10/50
1  T100  620  20/50    100  620_20/50
2  M100  720  30/50    100         NA
3  M100  820  50/50    100         NA
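If the per-row Python calls become a bottleneck on a large frame, a different technique (not the row-wise `apply` used above) is to work on whole columns at once with pandas string methods and `numpy.where`. A minimal sketch that reproduces the two toy functions under that assumption; it only works because both functions here are simple enough to express column-wise:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["T100", "T100", "M100", "M100"],
    "B": ["520", "620", "720", "820"],
    "C": ["10/50", "20/50", "30/50", "50/50"],
})

# Column-wise equivalent of get_pat_ID: str.extract returns the first capture
# group for every row (expand=False gives a Series for a single group).
pat_id = df["A"].str.extract(r"[TM](\d+)", expand=False)

# Column-wise equivalent of get_funcB: "B_C" where A == "T100", else "NA".
func_b = np.where(df["A"] == "T100", df["B"] + "_" + df["C"], "NA")

newdf = pd.DataFrame({"PatID": pat_id, "AnotherFunc": func_b})
print(newdf)
```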

Sorry for my late reply! Thanks for your answer! – phusion