Why does a pandas categorical column give a truth value error?

My data contains a column "Married" with the categorical values Yes and No. I converted it to a numeric coding:

train['Married']=train['Married'].astype('category') 
train['Married'].cat.categories=[0,1]

Now I fill in the missing values with the following code:

train['Married']=train['Married'].fillna(train['Married'].mode())

This gives the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone explain why?

Source

2017-07-27 ASHUTOSH CHANDRA

Can you split up the computation to see whether this error comes from `.mode()`, `.fillna()`, or the `=` assignment?

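The error indicates that you are applying one of base Python's logical operators (not, and, or) to a NumPy array or a pandas Series. Those operators need a single True/False value, and a Series with several elements does not have one.

For example: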

import pandas as pd

s = pd.Series([1, 1, 2, 2])
not pd.isnull(s.mode())  # raises ValueError: `not` needs a single True/False

gives the same error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
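A minimal sketch of the same failure, together with the explicit reductions the error message suggests. The variable names are only illustrative, and this assumes a reasonably recent pandas:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 2])
modes = s.mode()         # a Series holding both modes, 1 and 2
mask = pd.isnull(modes)  # element-wise result: a boolean Series

message = None
try:
    not mask             # Python asks the Series for one truth value
except ValueError as exc:
    message = str(exc)

# Reducing explicitly tells pandas how to collapse the Series to one bool.
any_missing = bool(mask.any())
all_missing = bool(mask.all())
```

Which reduction is right depends on the question being asked: `.any()` for "is at least one value missing", `.all()` for "is every value missing".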

If you look at the stack trace, the error comes from this line:

fillna(self, value, method, limit) 
    1465   else: 
    1466 
-> 1467    if not isnull(value) and value not in self.categories: 
    1468     raise ValueError("fill value must be in categories") 
    1469

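So it is checking whether the value you are trying to fill with is one of the categories, and that line needs the value to be a scalar for `not` and `and` to work. However, `Series.mode()` always returns a Series (there can be more than one mode), so the check fails on this line. Try extracting a single value from `mode()` and filling with that: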

train['Married']=train['Married'].fillna(train['Married'].mode().iloc[0])
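To make the Series-versus-scalar distinction concrete, here is a small sketch on made-up data (the toy `married` column is an assumption, not the asker's real data). It also guards against an all-missing column, where `mode()` would come back empty and `.iloc[0]` would raise an IndexError:

```python
import pandas as pd

# Hypothetical stand-in for train['Married'].
married = pd.Series(["Yes", "No", "Yes", None], name="Married")

mode_result = married.mode()  # always a Series, even with a single mode

if not mode_result.empty:
    fill_value = mode_result.iloc[0]  # first mode, as a plain scalar
    filled = married.fillna(fill_value)
else:
    # No non-missing values at all, so there is nothing to fill with.
    fill_value = None
    filled = married
```

When several values tie for the mode, `.iloc[0]` picks the first one in the returned Series, so the choice is deterministic but somewhat arbitrary.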

A working example:

s = pd.Series(["YES", "NO", "YES", "YES", None])  
s1 = s.astype('category') 
s1.cat.categories = [0, 1] 

s1 
#0 1.0 
#1 0.0 
#2 1.0 
#3 1.0 
#4 NaN 
#dtype: category 
#Categories (2, int64): [0, 1] 

s1.fillna(s1.mode().iloc[0]) 
#0 1 
#1 0 
#2 1 
#3 1 
#4 1 
#dtype: category 
#Categories (2, int64): [0, 1]
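The example above assigns to `.cat.categories` directly, which worked in the pandas version current when this was written. As far as I know, pandas 2.0 and later no longer allow that assignment, so on a newer install a sketch along these lines, using `rename_categories` with an explicit mapping, should give the same result:

```python
import pandas as pd

s = pd.Series(["YES", "NO", "YES", "YES", None])

# Map each label to its code explicitly instead of relying on category order.
s1 = s.astype("category").cat.rename_categories({"NO": 0, "YES": 1})

# mode() returns a Series; take its first element so fillna gets a scalar.
filled = s1.fillna(s1.mode().iloc[0])
```

Spelling out the mapping also avoids depending on the alphabetical ordering of the categories to decide which label becomes 0 and which becomes 1.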

Source

2017-07-27 03:52:53 Psidom
