2017-06-29 49 views
2

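Delete rows if any of a set of values is empty

I have a DataFrame with many columns, and I want to delete the rows in which a column's value is empty. I know how to do this with one column: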

df = df[df['Column'] != ''] 

I would like to do this with a set of columns, like so:

df = df['' not in [df['Column1'], df['Column2'], df['Column3']]]

However, this gives the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I do this?

Answers

3

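If the values are empty strings, select the subset of columns, compare it with '', and add all to check that every value in a row is True (or invert the condition and use any):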

df = df[(df[['Column1', 'Column2', 'Column3']] != '').all(axis=1)]

df = df[~(df[['Column1', 'Column2', 'Column3']] == '').any(axis=1)]

If the values are NaN or None, use dropna with the subset parameter:

df = df.dropna(subset=['Column1', 'Column2', 'Column3'])
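If both empty strings and NaN/None should count as empty, the two checks above can be combined into a single mask. The following is only a minimal sketch, not part of the original answer; the sample data and the names cols, is_empty, and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the original post)
df = pd.DataFrame({'A': [np.nan, '', 'p', 'f'],
                   'B': ['x', 'y', '', 'o'],
                   'C': ['a', 's', 'd', 'g']})

cols = ['A', 'B']          # hypothetical set of columns to check
sub = df[cols]

# A cell counts as empty if it is NaN/None or an empty string
is_empty = sub.isna() | (sub == '')

# Keep only the rows in which none of the checked columns is empty
result = df[~is_empty.any(axis=1)]
print(result)
```

Only the last row survives here, because every other row has either NaN or '' in column A or B.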

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[np.nan,'','p','hh','f'],
                   'B':['',np.nan,'','','o'],
                   'C':['a','s','d','f','g'],
                   'D':['f','g','h','j','k'],
                   'E':['l','i',np.nan,'u','o'],
                   'F':['','','o','i',np.nan]})

print(df)
     A    B  C  D    E    F
0  NaN       a  f    l
1       NaN  s  g    i
2    p       d  h  NaN    o
3   hh       f  j    u    i
4    f    o  g  k    o  NaN

df1 = df.dropna(subset=['A', 'B', 'F'])
print(df1)
    A  B  C  D    E  F
2   p     d  h  NaN  o
3  hh     f  j    u  i

df2 = df[(df[['A', 'B', 'F']] != '').all(axis=1)]
print(df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN

df2 = df[~(df[['A', 'B', 'F']] == '').any(axis=1)]
print(df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN

Edit:

Comparing with a string when some of the columns contain numbers raises:

TypeError: Could not compare [''] with block values

There are two solutions: compare against the NumPy array obtained with values, or convert the values to strings with astype:

df = pd.DataFrame({'A':[np.nan,7,8,8,8], 
        'B':['',np.nan,'','','o'], 
        'C':['a','s','d','f','g'], 
        'D':['f','g','h','j','k'], 
        'E':['l','i',np.nan,'u','o'], 
        'F':['','','o','i',np.nan]}) 

print(df)
     A    B  C  D    E    F
0  NaN       a  f    l
1  7.0  NaN  s  g    i
2  8.0       d  h  NaN    o
3  8.0       f  j    u    i
4  8.0    o  g  k    o  NaN

df2 = df[(df[['A', 'B', 'F']].values != '').all(axis=1)]
print(df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN

df2 = df[(df[['A', 'B', 'F']].astype(str) != '').all(axis=1)]
print(df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
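For reuse, the values-based comparison could be wrapped in a small helper. This is just a sketch: the function name drop_rows_with_empty_strings and the sample data are hypothetical. It assumes the selected columns include at least one string column, so that values yields an object array that is compared element by element. Note that NaN is not treated as empty here.

```python
import numpy as np
import pandas as pd

def drop_rows_with_empty_strings(df, columns):
    """Return the rows of df in which none of the given columns equals ''."""
    # .values gives a NumPy array; with mixed numeric and string columns it
    # has object dtype, so != '' is evaluated element by element
    not_empty = df[columns].values != ''
    return df[not_empty.all(axis=1)]

# Illustrative data mixing numbers and strings (not from the original post)
df = pd.DataFrame({'A': [np.nan, 7, 8],
                   'B': ['', 'x', 'o'],
                   'F': ['a', '', 'i']})

result = drop_rows_with_empty_strings(df, ['A', 'B', 'F'])
print(result)
```

Rows 0 and 1 are dropped because B or F holds an empty string; the NaN in column A does not by itself cause a row to be dropped.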
+0

I tried your first line, `df = df[(df[['Column1', 'Column2', 'Column1']] != '').all(axis=1)]`, and got `TypeError: Could not compare [''] with block values` – Bluefire

+0

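The problem is that you have mixed values, e.g. numbers together with strings. A simple solution is to convert the DataFrame to a NumPy array and then compare: `df = df[(df[['Column1', 'Column2', 'Column1']].values != '').all(axis=1)]` – jezrael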

+0

I edited the answer, please check it. – jezrael