Joining the factor levels of two columns in R

I have two columns of data that hold the same type of data (strings).

I want to join the levels of the two columns. I.e., we have:

col1 col2 
Bob John 
Tom Bob 
Frank Jane 
Jim Bob 
Tom Bob 
... ... (and so on)

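Now col1 has 4 levels (Bob, Tom, Frank, Jim) and col2 has 3 levels (John, Jane, Bob).

But I want both columns to have all the factor levels (Bob, Tom, Frank, Jim, Jane, John), so that each "name" can later be replaced by a unique ID. The final result would be:

Bob -> 1, Tom -> 2, etc.

Any ideas? :)

Edit: Thanks for all the wonderful answers! You're all awesome as far as I'm concerned :)

Source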

2011-01-31 abcde123483

You want the factors to include all the unique names from both columns.

col1 <- factor(c("Bob", "Tom", "Frank", "Jim", "Tom")) 
col2 <- factor(c("John", "Bob", "Jane", "Bob", "Bob")) 
mynames <- unique(c(levels(col1), levels(col2))) 
fcol1 <- factor(col1, levels = mynames) 
fcol2 <- factor(col2, levels = mynames)

Edit: it is slightly nicer if you replace the third line with this:

mynames <- union(levels(col1), levels(col2))
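As a quick check of the approach above, the following sketch confirms that `union()` and `unique(c(...))` produce the same level set here, and shows the integer codes each column gets once both share it. The name `shared` and the `stopifnot()` check are illustrative additions, not part of the original answer.

```r
col1 <- factor(c("Bob", "Tom", "Frank", "Jim", "Tom"))
col2 <- factor(c("John", "Bob", "Jane", "Bob", "Bob"))

# union() drops duplicates, so it matches unique(c(...)) for the level sets
shared <- union(levels(col1), levels(col2))
stopifnot(identical(shared, unique(c(levels(col1), levels(col2)))))

fcol1 <- factor(col1, levels = shared)
fcol2 <- factor(col2, levels = shared)

# The same name now maps to the same integer code in both columns
as.integer(fcol1)
as.integer(fcol2)
```

Note that the codes follow the level order (the alphabetically sorted levels of col1, then the new names from col2), so Tom is coded 4 here rather than 2. If the IDs must follow order of first appearance, the levels have to be built from the data values instead of from `levels()`.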

Source

2011-01-31 20:29:19

I could have sworn this didn't work while I was writing the abomination below, but it does:

## self contained example: 
txt <- "col1 col2 
Bob John 
Tom Bob 
Frank Jane 
Jim Bob 
Tom Bob" 
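## stringsAsFactors = TRUE is needed on R >= 4.0, where read.table()
## no longer converts strings to factors by default
dat <- read.table(textConnection(txt), header = TRUE, stringsAsFactors = TRUE)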

Just compute the set of unique levels and coerce each colX to a factor with those levels:

> dat3 <- dat 
> lev <- as.character(unique(unlist(sapply(dat, levels)))) 
> dat3 <- within(dat3, col1 <- factor(col1, levels = lev)) 
> dat3 <- within(dat3, col2 <- factor(col2, levels = lev)) 
> str(dat3) 
'data.frame': 5 obs. of 2 variables: 
$ col1: Factor w/ 6 levels "Bob","Tom","Frank",..: 1 2 3 4 2 
$ col2: Factor w/ 6 levels "Bob","Tom","Frank",..: 5 1 6 1 1 
> data.matrix(dat3) 
    col1 col2 
[1,] 1 5 
[2,] 2 1 
[3,] 3 6 
[4,] 4 1 
[5,] 2 1
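With more than two columns, repeating `within()` for each one gets tedious. Below is a minimal sketch of the same idea applied to every column at once, assuming all columns of the data frame should share one level set. The name `dat4` is introduced here purely for illustration.

```r
txt <- "col1 col2
Bob John
Tom Bob
Frank Jane
Jim Bob
Tom Bob"
# stringsAsFactors = TRUE so that levels() works on R >= 4.0
dat <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Pool the levels of every column and drop duplicates
lev <- unique(unlist(lapply(dat, levels)))

dat4 <- dat
# Assigning into dat4[] replaces each column but keeps the data.frame structure
dat4[] <- lapply(dat4, factor, levels = lev)

str(dat4)
```

Assigning into `dat4[]` rather than `dat4` is what keeps the result a data frame. A plain `dat4 <- lapply(...)` would return a list.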

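[Original: an illustration of how stupidly complex and obfuscated R code can get when someone tries really hard] Not sure this is particularly elegant (it isn't), but...

We first unlist the data: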

tmp <- unlist(dat)

Then compute the unique levels

lev <- as.character(unique(tmp))

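and then reshape tmp (from above) back to the same dimensions as the original data, convert it to a data.frame (keeping the strings as strings), lapply over this data frame to create a factor with the levels lev computed above, and finally coerce the result to a data frame.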

dat2 <- data.frame(lapply(data.frame(matrix(tmp, ncol = ncol(dat)), 
            stringsAsFactors = FALSE), 
          FUN = factor, levels = lev))

Which gives:

> dat2 
    X1 X2 
1 Bob John 
2 Tom Bob 
3 Frank Jane 
4 Jim Bob 
5 Tom Bob 
> sapply(dat2, levels) 
    X1  X2  
[1,] "Bob" "Bob" 
[2,] "Tom" "Tom" 
[3,] "Frank" "Frank" 
[4,] "Jim" "Jim" 
[5,] "John" "John" 
[6,] "Jane" "Jane" 
> data.matrix(dat2) 
    X1 X2 
[1,] 1 5 
[2,] 2 1 
[3,] 3 6 
[4,] 4 1 
[5,] 2 1
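If all that is needed is an integer ID per name in order of first appearance (Bob -> 1, Tom -> 2, ...), a shorter route is `match()` against the vector of unique names. This is an alternative sketch rather than the method used in the answer above, and `names_seen` and `ids` are illustrative names.

```r
dat <- data.frame(
  col1 = c("Bob", "Tom", "Frank", "Jim", "Tom"),
  col2 = c("John", "Bob", "Jane", "Bob", "Bob"),
  stringsAsFactors = FALSE
)

# Unique names in order of first appearance, reading column by column
names_seen <- unique(unlist(dat, use.names = FALSE))

# match() returns the position of each name within names_seen
ids <- data.frame(lapply(dat, match, table = names_seen))
ids
```

Keeping `names_seen` around gives a lookup table back from ID to name, e.g. `names_seen[ids$col2]`.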

Source

2011-01-31 20:12:18

x <- structure(list(col1 = structure(c(1L, 4L, 2L, 3L, 4L), .Label = c("Bob", "Frank", "Jim", "Tom"), class = "factor"), col2 = structure(c(3L, 1L, 2L, 1L, 1L), .Label = c("Bob", "Jane", "John"), class = "factor")), .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, -5L))

Take a simple union of the factor levels:

both <- union(levels(x$col1), levels(x$col2))

And relevel both factors:

x$col1 <- factor(x$col1, levels=both) 
x$col2 <- factor(x$col2, levels=both)

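After edit: added an example of making numeric values from the factors.

You can simply convert the factor levels to numeric values, e.g.: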

as.numeric(x$col1)

Or, based on @Gavin Simpson's hint below, a simpler and better one-step solution:

data.matrix(x)
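To make the conversion concrete, here is a small self-contained sketch. It rebuilds `x` from plain vectors (an assumption made for brevity, in place of the `structure()` call above) and checks that `data.matrix()` returns the same codes as converting each factor separately. The name `m` is illustrative.

```r
x <- data.frame(
  col1 = c("Bob", "Tom", "Frank", "Jim", "Tom"),
  col2 = c("John", "Bob", "Jane", "Bob", "Bob"),
  stringsAsFactors = TRUE
)
both <- union(levels(x$col1), levels(x$col2))
x$col1 <- factor(x$col1, levels = both)
x$col2 <- factor(x$col2, levels = both)

# data.matrix() replaces every factor column by its internal integer codes
m <- data.matrix(x)

# Column by column, this agrees with as.numeric() on each factor
stopifnot(all(m[, "col1"] == as.numeric(x$col1)),
          all(m[, "col2"] == as.numeric(x$col2)))

# The shared level vector maps a code back to its name
both[m[, "col2"]]
```

The codes are only comparable across columns because both factors were releveled to `both` first. Calling `data.matrix()` on the original `x` would number each column by its own levels.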

Source

2011-01-31 20:12:53 daroczig

Neat, clean and quick. Nice. IMHO a better answer than the one @Gavin posted, though I'd rather go with the `data.frame(lapply(...` solution, out of sheer laziness. – aL3xa 2011-01-31 20:31:36
