2012-07-27 60 views
88

Assign multiple columns using := in data.table, by group

What is the best way to assign to multiple columns using data.table? For example:

f <- function(x) {c("hi", "hello")} 
x <- data.table(id = 1:10) 

I would like to do something like this (of course this syntax is incorrect):

x[ , (col1, col2) := f(), by = "id"] 

And to extend that, I may have the column names stored in a variable (say col_names), and I would like to do:

x[ , col_names := another_f(), by = "id", with = FALSE] 

What is the correct way to do something like this?

+1

This looks like it has already been answered: http://stackoverflow.com/questions/11308754/add-multiple-columns-to-r-data-table-in-one-function-call – Alex 2012-07-27 20:52:19

+0

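Alex, that answer is close, but it doesn't seem to work in combination with `by`, as @Christoph_J rightly says. I have added a link to your question to [FR#2120](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2120&group_id=240&atid=978) "Drop needing with=FALSE for LHS of :=", so it won't be forgotten when that is revisited. – 2012-08-08 15:29:36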

Answers

106

This now works in v1.8.3 on R-Forge. Thanks for highlighting it!

x <- data.table(a = 1:3, b = 1:6) 
f <- function(x) {list("hi", "hello")} 
x[ , c("col1", "col2") := f(), by = a][] 
# a b col1 col2 
# 1: 1 1 hi hello 
# 2: 2 2 hi hello 
# 3: 3 3 hi hello 
# 4: 1 4 hi hello 
# 5: 2 5 hi hello 
# 6: 3 6 hi hello 

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][] 
# a b col1 col2 mean sum 
# 1: 1 1 hi hello 2.5 5 
# 2: 2 2 hi hello 3.5 7 
# 3: 3 3 hi hello 4.5 9 
# 4: 1 4 hi hello 2.5 5 
# 5: 2 5 hi hello 3.5 7 
# 6: 3 6 hi hello 4.5 9 

mynames = c("Name1", "Longer%") 
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a][] 
#  a b col1 col2 mean sum Name1 Longer% 
# 1: 1 1 hi hello 2.5 5 10  15 
# 2: 2 2 hi hello 3.5 7 14  21 
# 3: 3 3 hi hello 4.5 9 18  27 
# 4: 1 4 hi hello 2.5 5 10  15 
# 5: 2 5 hi hello 3.5 7 14  21 
# 6: 3 6 hi hello 4.5 9 18  27 


x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same, older syntax (with = FALSE with := was later deprecated in favour of (mynames) :=)
# a b col1 col2 mean sum Name1 Longer% 
# 1: 1 1 hi hello 2.5 5 10  15 
# 2: 2 2 hi hello 3.5 7 14  21 
# 3: 3 3 hi hello 4.5 9 18  27 
# 4: 1 4 hi hello 2.5 5 10  15 
# 5: 2 5 hi hello 3.5 7 14  21 
# 6: 3 6 hi hello 4.5 9 18  27 

x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][] # same 
# a b col1 col2 mean sum Name1 Longer% 
# 1: 1 1 hi hello 2.5 5 10  15 
# 2: 2 2 hi hello 3.5 7 14  21 
# 3: 3 3 hi hello 4.5 9 18  27 
# 4: 1 4 hi hello 2.5 5 10  15 
# 5: 2 5 hi hello 3.5 7 14  21 
# 6: 3 6 hi hello 4.5 9 18  27 

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][] # same 
# a b col1 col2 mean sum Name1 Longer% 
# 1: 1 1 hi hello 2.5 5 10  15 
# 2: 2 2 hi hello 3.5 7 14  21 
# 3: 3 3 hi hello 4.5 9 18  27 
# 4: 1 4 hi hello 2.5 5 10  15 
# 5: 2 5 hi hello 3.5 7 14  21 
# 6: 3 6 hi hello 4.5 9 18  27 
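
For comparison, here is a minimal sketch of the functional form of `:=`, which names the new columns inline instead of passing a character vector on the left-hand side. It is not part of the answer above; the column names `b_mean` and `b_sum` are made up for illustration, and the data is rebuilt with `rep()` so the block runs on its own.

```r
library(data.table)

# Same shape of data as in the answer: 3 groups, 2 rows each.
x <- data.table(a = rep(1:3, 2), b = 1:6)

# Functional form of `:=`: each argument name becomes a new column,
# and each value is computed once per group of `a`.
# b_mean and b_sum are hypothetical names chosen for this sketch.
x[, `:=`(b_mean = mean(b), b_sum = sum(b)), by = a]

print(x)
```

This form is convenient when the column names are known up front; when they are held in a variable, the `(mynames) :=` form shown above still applies.
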
+0

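Thanks for the answer and the examples. How should I modify the following line to get two columns from the `dim` output for each objectName, rather than one column with two rows? `data.table(objectName = ls())[, c("rows", "cols") := dim(get(objectName)), by = objectName]` (I'm using `data.table` 1.8.11) – dnlbrky 2014-05-19 02:00:29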

+0

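@dnlbrky `dim` returns a vector, so casting it to type `list` should pivot it; e.g. `[, c("rows", "cols") := as.list(dim(get(objectName))), by = objectName]`. The trouble is that `as.list` has call overhead and also copies the small vector. If efficiency becomes a problem as the number of groups grows, please let us know. – 2014-05-21 11:49:48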

+0

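Thanks @Matt_Dowle. I had tried `list`, but not `as.list`. Speed isn't a concern. I just wanted a quick way to find objects in the environment with a certain number of columns or rows. This is off topic, but... what do you think about adding NCOL to `tables()`? – dnlbrky 2014-05-23 00:34:20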
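
The `as.list(dim(...))` suggestion from the comments can be sketched as follows. To keep the example self-contained, it looks objects up in a hypothetical named list `objs` rather than using `ls()` and `get()` on the calling environment; that substitution is an assumption made for this sketch, not what the comment used.

```r
library(data.table)

# Hypothetical stand-ins for objects that would otherwise be found via ls()/get().
objs <- list(
  small = data.frame(a = 1:3, b = 4:6),   # 3 rows, 2 columns
  wide  = matrix(0, nrow = 2, ncol = 5)   # 2 rows, 5 columns
)

dt <- data.table(objectName = names(objs))

# dim() returns a length-2 vector; as.list() turns it into a two-element list,
# so each element is assigned to its own column for the group.
dt[, c("rows", "cols") := as.list(dim(objs[[objectName]])), by = objectName]

print(dt)
```

Objects without a `dim` attribute (plain vectors, functions) make `dim()` return `NULL`, so a real `ls()`-based version would need to filter those out first.
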