Pandas - if all values of a DataFrame are NaN

How can I write an if statement that does the following?

if all values in dataframe are nan: 
    do something 
else: 
    do something else

According to this post, it is possible to check whether all values of a DataFrame are NaN. I know that one cannot simply write:

if df.isnull().all(): 
    do something

It raises the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Source

2017-04-19 Arthurim

A second all is needed, because the first all returns a Series and the second reduces that Series to a scalar:

if df.isnull().all().all(): 
    do something

Sample:

import pandas as pd

df = pd.DataFrame(index=range(5), columns=list('abcde'))
print (df) 
    a b c d e 
0 NaN NaN NaN NaN NaN 
1 NaN NaN NaN NaN NaN 
2 NaN NaN NaN NaN NaN 
3 NaN NaN NaN NaN NaN 
4 NaN NaN NaN NaN NaN 

print (df.isnull()) 
     a  b  c  d  e 
0 True True True True True 
1 True True True True True 
2 True True True True True 
3 True True True True True 
4 True True True True True 

print (df.isnull().all()) 
a True 
b True 
c True 
d True 
e True 
dtype: bool 

print (df.isnull().all().all()) 
True 

if df.isnull().all().all(): 
    print ('do something')
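
The check above can be wrapped in a small helper and used directly in an if/else branch. This is only a minimal sketch: the function name `all_values_nan` and the `mixed` frame are hypothetical and not part of pandas or the original answer. Note that an empty DataFrame also passes the check, since `all()` over no values is vacuously true.

```python
import pandas as pd


def all_values_nan(df: pd.DataFrame) -> bool:
    """Return True if every cell in df is missing (hypothetical helper)."""
    # The first all() reduces each column to one boolean (a Series);
    # the second all() reduces that Series to a single scalar.
    return bool(df.isnull().all().all())


all_nan = pd.DataFrame(index=range(5), columns=list('abcde'))
mixed = all_nan.copy()
mixed.loc[0, 'a'] = 1.0  # one real value breaks the "all NaN" condition

for name, frame in [('all_nan', all_nan), ('mixed', mixed)]:
    if all_values_nan(frame):
        print(name, '-> do something')
    else:
        print(name, '-> do something else')
```

If an empty DataFrame should be treated differently, an extra `df.empty` test before the call is one option.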

If a faster solution is needed, use numpy.isnan with numpy.all, but first convert all values to a numpy array via values:

print (np.isnan(df.values).all()) 
True
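
One caveat may be worth sketching here: `np.isnan` accepts only numeric arrays, so the NumPy variant assumes every column has a float-compatible dtype. The two frames below (`numeric` and `mixed`) are illustrative and not taken from the original answer. When `.values` is an object array, for example because one column holds strings, `np.isnan` raises a `TypeError`, whereas `isnull()` still works.

```python
import numpy as np
import pandas as pd

# Purely numeric frame: np.isnan works on the underlying float array.
numeric = pd.DataFrame(np.full((3, 2), np.nan), columns=['a', 'b'])
print(np.isnan(numeric.values).all())

# Mixed frame: .values is an object array, which np.isnan rejects.
mixed = pd.DataFrame({'a': [np.nan, np.nan], 'b': ['x', None]})
try:
    np.isnan(mixed.values)
    isnan_failed = False
except TypeError:
    isnan_failed = True
print(isnan_failed)

# isnull() works for any dtype and also treats None as missing.
print(mixed.isnull().values.all())
```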

Timings:

df = pd.DataFrame(np.full((1000,1000), np.nan)) 
print (df) 

In [232]: %timeit (np.isnan(df.values).all()) 
1000 loops, best of 3: 1.23 ms per loop 

In [233]: %timeit (df.isnull().all().all()) 
100 loops, best of 3: 10 ms per loop 

In [234]: %timeit (df.isnull().values.all()) 
1000 loops, best of 3: 1.46 ms per loop

Source

2017-04-19 07:28:08 jezrael

A faster variant of jezrael's approach would be df.isnull().values.all():

In [156]: df.isnull().values.all() 
Out[156]: True

Benchmarks

Small

In [149]: df.shape 
Out[149]: (5, 5) 

In [150]: %timeit df.isnull().values.all() 
10000 loops, best of 3: 112 µs per loop 

In [151]: %timeit df.isnull().all().all() 
1000 loops, best of 3: 271 µs per loop

Large

In [153]: df.shape 
Out[153]: (1000, 1000) 

In [154]: %timeit df.isnull().values.all() 
10 loops, best of 3: 26.6 ms per loop 

In [155]: %timeit df.isnull().all().all() 
10 loops, best of 3: 40.8 ms per loop
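
The `%timeit` lines above only run inside IPython. As a rough sketch, the same comparison can be reproduced in a plain script with the standard `timeit` module. The `candidates` and `timings` names are illustrative, and the absolute numbers depend on the machine and on the pandas and NumPy versions, so no expected output is shown.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the "large" benchmark: a 1000 x 1000 frame of NaN.
df = pd.DataFrame(np.full((1000, 1000), np.nan))

candidates = {
    'np.isnan(df.values).all()': lambda: np.isnan(df.values).all(),
    'df.isnull().values.all()': lambda: df.isnull().values.all(),
    'df.isnull().all().all()': lambda: df.isnull().all().all(),
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    timings[label] = best * 1e3
    print(f'{label:30s} {timings[label]:8.3f} ms per call')
```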

Source

2017-08-10 18:42:44 Zero
