As of R 3.4.0, this bug has been fixed. The answer below is now kept for historical reference only.
As I said in my comment, at the moment exclude only works for
factor(as.character(foo), exclude = "a")
rather than
factor(foo, exclude = "a")
Note that the documentation ?factor under R 3.3.1 is not satisfactory at all:
exclude: a vector of values to be excluded when forming the set of
levels. This should be of the same type as ‘x’, and will be
coerced if necessary.
The following calls give no warning or error, but they do not do anything either:
## foo is a factor with `typeof` being "integer"
factor(foo, exclude = 1L)
factor(foo, exclude = factor("a", levels = levels(foo)))
#[1] a b a c a a c c
#Levels: a b c
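The definition of foo is not shown in this answer. As a minimal sketch, it can be rebuilt from the printed output above (this reconstruction is an assumption), which also confirms how a factor is stored:

```r
# Assumption: foo is rebuilt from the printed output; its original definition is not shown.
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

class(foo)    # "factor"
typeof(foo)   # "integer": a factor stores integer codes
levels(foo)   # "a" "b" "c": the labels live in the levels attribute
unclass(foo)  # the underlying integer codes, carrying the "levels" attribute
```

Because the storage type is integer while the labels are character, anything that compares exclude against typeof(foo) is comparing against the codes, not the labels.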
In fact, the documentation seems rather self-contradictory, as it also reads:
The encoding of the vector happens as follows. First all the
values in ‘exclude’ are removed from ‘levels’.
So it looks like the developers really do expect exclude to be a "character".
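This is more likely a bug inside factor. The problem is fairly evident: the following line inside factor(x, ...) messes things up when the input vector x is of class "factor":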
exclude <- as.vector(exclude, typeof(x))
because in this case typeof(x) is "integer". If exclude is a string, NA is produced when the string is converted to an integer.
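The coercion step can be reproduced on its own. The sketch below assumes foo is rebuilt from the printed output earlier in this answer, and shows why the call silently does nothing: the coerced exclude is NA, which matches none of the levels.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))  # assumed reconstruction

# Coercing the string to the storage type of a factor yields NA (with a warning).
excl <- suppressWarnings(as.vector("a", typeof(foo)))
excl                                     # NA

# An NA exclude matches none of the levels, so every level is kept.
keep <- is.na(match(levels(foo), excl))
keep                                     # TRUE TRUE TRUE
```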
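I really have no idea why there is such a line inside factor. The two lines that follow it would do exactly the right thing if that line did not exist: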
x <- as.character(x)
levels <- levels[is.na(match(levels, exclude))]
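To see what these two lines do when exclude stays a string, here is a small sketch run outside of factor. The data are an assumption, rebuilt from the printed output of foo:

```r
x <- as.character(factor(c("a", "b", "a", "c", "a", "a", "c", "c")))  # assumed data
levels <- c("a", "b", "c")
exclude <- "a"

levels <- levels[is.na(match(levels, exclude))]  # keep levels not found in exclude
levels                                           # "b" "c"

codes <- match(x, levels)                        # values of an excluded level become NA
codes                                            # NA 1 NA 2 NA NA 2 2
```

With character labels on both sides of match, the excluded level is dropped and its values map to NA, which is the documented behaviour.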
So the remedy is simply to remove this line:
my_factor <- function (x = character(), levels, labels = levels, exclude = NA,
                       ordered = is.ordered(x), nmax = NA)
{
    if (is.null(x))
        x <- character()
    nx <- names(x)
    if (missing(levels)) {
        y <- unique(x, nmax = nmax)
        ind <- sort.list(y)
        y <- as.character(y)
        levels <- unique(y[ind])
    }
    force(ordered)
    #exclude <- as.vector(exclude, typeof(x))
    x <- as.character(x)
    levels <- levels[is.na(match(levels, exclude))]
    f <- match(x, levels)
    if (!is.null(nx))
        names(f) <- nx
    nl <- length(labels)
    nL <- length(levels)
    if (!any(nl == c(1L, nL)))
        stop(gettextf("invalid 'labels'; length %d should be 1 or %d",
            nl, nL), domain = NA)
    levels(f) <- if (nl == nL)
        as.character(labels)
    else paste0(labels, seq_along(levels))
    class(f) <- c(if (ordered) "ordered", "factor")
    f
}
Let's test it now:
my_factor(foo, exclude = "a")
#[1] <NA> b <NA> c <NA> <NA> c c
#Levels: b c
my_factor(as.character(foo), exclude = "a")
#[1] <NA> b <NA> c <NA> <NA> c c
#Levels: b c
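If redefining factor is not desirable, the as.character route mentioned at the top can be wrapped in a small helper. This is only a sketch: exclude_levels is a hypothetical name, not a base R function, and foo is again an assumed reconstruction from the printed output.

```r
# Hypothetical helper (not part of base R): drop levels by label from a factor.
exclude_levels <- function(f, exclude) {
    # Convert to character first so that exclude is compared against the labels,
    # not against the integer codes.
    factor(as.character(f), exclude = exclude)
}

foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))  # assumed reconstruction
bar <- exclude_levels(foo, "a")
levels(bar)      # "b" "c"
sum(is.na(bar))  # 4
```

Note that this sketch rebuilds the levels in sorted order and does not preserve an ordered factor, so it may need adjusting for factors with a custom level order.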
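So basically it works when the 'x' argument is a vector, but not when it is a factor? If so, is it possible to exclude values when the 'x' argument is a factor?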