2016-10-02
2

I am confused about how this code is supposed to work: the `exclude` argument of `factor()` has no effect

foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c")) 
#[1] a b a c a a c c 
#Levels: a b c 

factor(foo, exclude = "a") 
#[1] a b a c a a c c 
#Levels: a b c 

Warning message:

In as.vector(exclude, typeof(x)) : NAs introduced by coercion

Shouldn't it display the factor with every `a` replaced by NA? If not, how can I achieve that?

+0

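So basically it works when the `x` argument is a vector, but not when it is a factor? If so, is it possible to exclude values when the `x` argument is a factor?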

Answer

3

As of R 3.4.0, this bug has been fixed. The answer below is kept for historical reference only.


As I said in my comment, at the moment `exclude` only works for

factor(as.character(foo), exclude = "a") 

rather than

factor(foo, exclude = "a") 
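
On R versions before 3.4.0, the character-conversion workaround above can be wrapped in a small helper. This is only a minimal sketch: `exclude_levels` is a hypothetical name, not a base R function. Converting to character rebuilds the levels in sorted order and drops any "ordered" class, so it may not suit factors with a custom level order.

```r
# Hypothetical helper (not part of base R): exclude values from a factor
# by converting it to character first, so that `exclude` is compared with
# the labels rather than with the underlying integer codes.
exclude_levels <- function(f, values) {
  factor(as.character(f), exclude = values)
}

foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
res <- exclude_levels(foo, "a")
levels(res)   # the level "a" is no longer present
is.na(res)    # TRUE wherever foo was "a"
```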

Note that the documentation in `?factor` under R 3.3.1 is not satisfactory at all:

exclude: a vector of values to be excluded when forming the set of 
     levels. This should be of the same type as ‘x’, and will be 
     coerced if necessary. 

The following calls give no warning or error, but they do not do anything either:

## foo is a factor with `typeof` being "integer" 
factor(foo, exclude = 1L) 
factor(foo, exclude = factor("a", levels = levels(foo))) 
#[1] a b a c a a c c 
#Levels: a b c 

In fact, the documentation seems rather contradictory, because it also says:

The encoding of the vector happens as follows. First all the 
values in ‘exclude’ are removed from ‘levels’. 

So it looks like the developers really expect `exclude` to be a "character" vector.


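This is more likely a bug inside `factor`. The problem is fairly evident: the following line inside `factor(x, ...)` causes the trouble when the input vector `x` is of class "factor":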

exclude <- as.vector(exclude, typeof(x)) 

because in this case `typeof(x)` is "integer". If `exclude` is a string, coercing that string to integer produces NA, which then matches none of the levels.
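
The coercion can be reproduced outside `factor`. The snippet below is a sketch that replays the relevant step by hand; the variable names `excl`, `lv`, and `kept` are chosen here for illustration and do not come from the `factor` source.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))

# A factor is stored as integer codes plus a "levels" attribute.
typeof(foo)

# The problematic line: the string "a" is coerced to integer, giving NA.
# suppressWarnings() hides "NAs introduced by coercion".
excl <- suppressWarnings(as.vector("a", typeof(foo)))

# The later level filtering then compares the labels with NA,
# finds no match, and keeps every level.
lv <- levels(foo)
kept <- lv[is.na(match(lv, excl))]
kept
```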

I really do not know why there is such a line inside `factor`. The two lines that follow it would do the right thing if that line did not exist:

x <- as.character(x)
levels <- levels[is.na(match(levels, exclude))]
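
To see why those two lines are sufficient, they can be run by hand with `exclude` left as a character string. This is an illustrative sketch, not the `factor` source itself; `lv` and `codes` are names introduced here.

```r
foo <- factor(c("a", "b", "a", "c", "a", "a", "c", "c"))
exclude <- "a"

x <- as.character(foo)                  # work with labels, not integer codes
lv <- levels(foo)
lv <- lv[is.na(match(lv, exclude))]     # drop "a" from the set of levels
codes <- match(x, lv)                   # values with no remaining level become NA

lv
codes
```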

So the remedy is simply to remove that line:

my_factor <- function (x = character(), levels, labels = levels, exclude = NA, 
         ordered = is.ordered(x), nmax = NA) 
{ 
    if (is.null(x)) 
     x <- character() 
    nx <- names(x) 
    if (missing(levels)) { 
     y <- unique(x, nmax = nmax) 
     ind <- sort.list(y) 
     y <- as.character(y) 
     levels <- unique(y[ind]) 
    } 
    force(ordered) 
    #exclude <- as.vector(exclude, typeof(x)) 
    x <- as.character(x) 
    levels <- levels[is.na(match(levels, exclude))] 
    f <- match(x, levels) 
    if (!is.null(nx)) 
     names(f) <- nx 
    nl <- length(labels) 
    nL <- length(levels) 
    if (!any(nl == c(1L, nL))) 
     stop(gettextf("invalid 'labels'; length %d should be 1 or %d", 
      nl, nL), domain = NA) 
    levels(f) <- if (nl == nL) 
     as.character(labels) 
    else paste0(labels, seq_along(levels)) 
    class(f) <- c(if (ordered) "ordered", "factor") 
    f 
} 

Let's test it now:

my_factor(foo, exclude = "a") 
#[1] <NA> b <NA> c <NA> <NA> c c 
#Levels: b c 

my_factor(as.character(foo), exclude = "a") 
#[1] <NA> b <NA> c <NA> <NA> c c 
#Levels: b c