Select DataFrame rows by passing arguments or a list comprehension (Python pandas)

I want to select rows in a DataFrame by passing a dictionary or by using a list comprehension.

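I have a DataFrame with several million rows, and I want to write a function that selects only the part of this DataFrame that matches a list of parameters. To complicate things, I have to pass both the DataFrame and the list, and the list can contain NaN values and '0'. So I have to drop those entries before selecting the right rows.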

The input list:

b = ['MUSTANG', 'Coupé', '0', np.nan, np.nan] 

    AGE       KM  Brand    Model  Liter  Bodycar  Power
0   2.0  10000.0   FORD  MUSTANG    5.0    Coupé    421
1   2.0  10000.0   FORD  MUSTANG    5.0    Coupé    421
2   5.0  10400.0   FORD  MUSTANG    5.0    Coupé    421
3   5.0  10400.0   FORD  MUSTANG    5.0    Coupé    421
4  16.0  20700.0   FORD  MUSTANG    3.7    Coupé    317
5   7.0  23300.0   FORD  MUSTANG    3.7             317
6   7.0  23300.0   FORD  MUSTANG    2.3    Coupé    301
7   7.0  23300.0   FORD  MUSTANG    5.0             421
... 

I started writing a function that removes the useless entries from the list and then tries to select the matching rows, but it fails:

    def func_mcclbp_incomp(df, mcclbp):
        ind = []

        mcclbp = [i if type(i) == str else '0' for i in mcclbp]
        ind = [i for i, x in enumerate(mcclbp) if x=='0']

        head = ['Brand','Model','Bodycar','Liter', 'Power']
        mmcclbp = {head[0]:mcclbp[0], head[1]:mcclbp[1], head[2]:mcclbp[2], \
          head[3]:mcclbp[3], head[4]:mcclbp[4]}
        for i in ind:
            del mmcclbp[head[i]]
        df = df[df[head[i]==mccblp[i]] for i in mmcclbp.key()]
        return df

I tried a list comprehension, but pandas gives me an error:

File "<ipython-input-235-6f78e45f59d4>", line 1 
df = df[df[head[i].isin(mccblp[i]) for i in mmcclbp.keys()]] 
            ^
SyntaxError: invalid syntax

When I tried to pass a dictionary instead, I got a KeyError exception.

If I use b, the desired output is:

    AGE       KM  Brand    Model  Liter  Bodycar  Power
0   2.0  10000.0   FORD  MUSTANG    5.0    Coupé    421
1   2.0  10000.0   FORD  MUSTANG    5.0    Coupé    421
2   5.0  10400.0   FORD  MUSTANG    5.0    Coupé    421
3   5.0  10400.0   FORD  MUSTANG    5.0    Coupé    421
4  16.0  20700.0   FORD  MUSTANG    3.7    Coupé    317
6   7.0  23300.0   FORD  MUSTANG    2.3    Coupé    301

If I change b to other values, for example:

b = ['FORD', 'MUSTANG', 'Coupé', '3.7', '317']

The result would be:

    AGE       KM  Brand    Model  Liter  Bodycar  Power
4  16.0  20700.0   FORD  MUSTANG    3.7    Coupé    317

Does anyone know how I can automatically select the rows that match the list?

Thanks for your answers,

Chris

Source

2017-05-29 Chris PERE

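Can you add the desired output for b = ['MUSTANG', 'Coupé', '0', np.nan, np.nan] and your sample data? – jezrael

Yes, sorry, I forgot to write the input... I edited the question to show what I need to do. –

Can you explain the first output in more detail: why do you get data back if there are '0' or NaN values? – jezrael

You can filter with a dict: compare the columns against each value, use DataFrame.all to check that every value in a row of the mask is True, and then filter by boolean indexing.
You also need to convert all values of the DataFrame to strings with astype, because all the values of the dict are strings too: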

d = {'Brand':'FORD', 'Model':'MUSTANG', 'Bodycar':'Coupé', 'Liter':'3.7', 'Power':'317'} 

print (df.astype(str)[list(d)] == pd.Series(d)) 
    Bodycar Brand Liter Model Power 
0  True True False True False 
1  True True False True False 
2  True True False True False 
3  True True False True False 
4  True True True True True 
6  True True False True False 

mask = (df.astype(str)[list(d)] == pd.Series(d)).all(axis=1) 
print (mask) 
0 False 
1 False 
2 False 
3 False 
4  True 
6 False 
dtype: bool 

df1 = df[mask] 
print (df1) 
    AGE       KM  Brand    Model  Liter  Bodycar  Power
4  16.0  20700.0   FORD  MUSTANG    3.7    Coupé    317
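
To connect this mask back to the list input from the question, here is a minimal sketch that drops the NaN and '0' placeholders before filtering. The function name select_rows, its explicit columns argument, and the small sample frame are assumptions made for illustration and are not part of the original posts. The sketch casts only the filtered columns to str rather than the whole frame, which may be cheaper on a frame with millions of rows.

```python
import numpy as np
import pandas as pd


def select_rows(df, values, columns):
    """Keep rows whose `columns` equal `values`; NaN and '0' entries are ignored."""
    # Build {column: value}, skipping placeholders (non-strings such as NaN, and '0').
    wanted = {col: val for col, val in zip(columns, values)
              if isinstance(val, str) and val != '0'}
    if not wanted:
        return df  # nothing to filter on
    # Cast only the columns being filtered to str, then compare column by column.
    subset = df[list(wanted)].astype(str)
    mask = (subset == pd.Series(wanted)).all(axis=1)
    return df[mask]


# Hypothetical sample data modelled on the question's frame.
df = pd.DataFrame({
    'AGE': [2.0, 5.0, 16.0, 7.0, 7.0],
    'KM': [10000.0, 10400.0, 20700.0, 23300.0, 23300.0],
    'Brand': ['FORD'] * 5,
    'Model': ['MUSTANG'] * 5,
    'Liter': [5.0, 5.0, 3.7, 3.7, 2.3],
    'Bodycar': ['Coupé', 'Coupé', 'Coupé', '', 'Coupé'],
    'Power': [421, 421, 317, 317, 301],
})

cols = ['Brand', 'Model', 'Bodycar', 'Liter', 'Power']
exact = select_rows(df, ['FORD', 'MUSTANG', 'Coupé', '3.7', '317'], cols)
partial = select_rows(df, ['FORD', 'MUSTANG', 'Coupé', '0', np.nan], cols)
print(exact)
print(partial)
```

With every entry given, only the row matching all five values should remain; with '0' and NaN in the last two positions, the Liter and Power columns are left out of the comparison.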

Source

2017-05-29 11:47:11 jezrael

Thank you very much for this answer! It works very well!!!!! –

Thanks again! You too :) –
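
A note on the SyntaxError in the question: Python does not accept a bare generator expression inside indexing brackets, so df[... for i in ...] is rejected before pandas ever runs. One possible way to keep the comprehension idea is to build one boolean mask per column and combine them, sketched below with np.logical_and.reduce. The sample frame and the criteria dict are assumptions for illustration, and the values are compared in their native types here rather than as strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, reduced from the question's frame.
df = pd.DataFrame({
    'Brand': ['FORD', 'FORD', 'FORD'],
    'Model': ['MUSTANG', 'MUSTANG', 'MUSTANG'],
    'Liter': [5.0, 3.7, 2.3],
    'Bodycar': ['Coupé', 'Coupé', ''],
})

criteria = {'Model': 'MUSTANG', 'Bodycar': 'Coupé'}

# Invalid: a bare generator expression inside the indexing brackets.
#   df[df[col] == val for col, val in criteria.items()]   # SyntaxError

# Valid: one boolean mask per column, then AND them together row by row.
masks = [df[col] == val for col, val in criteria.items()]
mask = np.logical_and.reduce(masks)
result = df[mask]
print(result)
```

The list comprehension now lives on its own line, and only the combined mask is passed to the indexing brackets.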
