Assign multiple columns using := in data.table, by group

What is the best way to assign to multiple columns using data.table? For example:

library(data.table)
f <- function(x) {c("hi", "hello")}
x <- data.table(id = 1:10)

I would like to do something like this (of course this syntax is incorrect):

x[ , (col1, col2) := f(), by = "id"]

and, extending that, I may have the column names in a variable (say col_names), in which case I'd like to do this:

x[ , col_names := another_f(), by = "id", with = FALSE]

What is the correct way to do something like this?


2012-07-27 Alex

This looks like it has already been answered: http://stackoverflow.com/questions/11308754/add-multiple-columns-to-r-data-table-in-one-function-call – Alex 2012-07-27 20:52:19

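Alex, that answer is close, but it doesn't seem to work in combination with 'by', as @Christoph_J rightly says. I've added a link to your question to [FR#2120](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2120&group_id=240&atid=978), about the LHS of := needing with = FALSE, so it won't be forgotten when it is revisited. – 2012-08-08 15:29:36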


This is now possible in v1.8.3 on R-Forge. Thanks for highlighting it!

library(data.table)
x <- data.table(a = 1:3, b = 1:6)
f <- function(x) {list("hi", "hello")}
x[ , c("col1", "col2") := f(), by = a][]
#    a b col1  col2
# 1: 1 1   hi hello
# 2: 2 2   hi hello
# 3: 3 3   hi hello
# 4: 1 4   hi hello
# 5: 2 5   hi hello
# 6: 3 6   hi hello

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
#    a b col1  col2 mean sum
# 1: 1 1   hi hello  2.5   5
# 2: 2 2   hi hello  3.5   7
# 3: 3 3   hi hello  4.5   9
# 4: 1 4   hi hello  2.5   5
# 5: 2 5   hi hello  3.5   7
# 6: 3 6   hi hello  4.5   9

mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

# same; note that later data.table versions deprecate with = FALSE together with := in favour of (mynames)
x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][]
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27
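The (mynames), get("mynames") and eval(mynames) variants all work because the LHS of := is evaluated rather than read as a literal column name. A minimal sketch of the difference, assuming a recent data.table; the variable nm and the column name "total" are made up for illustration:

```r
library(data.table)

x <- data.table(a = 1:3, b = 1:6)
nm <- "total"  # hypothetical variable holding the target column name

# Wrapped in parentheses, the LHS is evaluated: the new column is named "total".
x[, (nm) := sum(b), by = a]

# A bare symbol on the LHS is taken literally: this creates a column named "nm".
x[, nm := sum(b), by = a]

names(x)
```

The parentheses are what signal "look up the names in this variable"; without them the symbol itself becomes the column name.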


2012-10-06 08:48:38
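In the answer, f returns a list because := assigns one list element to each target column. A sketch of the same idea with a per-group helper; range_stats and the column names lo, hi, b_mean and c_mean are hypothetical, and the lapply(.SD, ...) form with .SDcols is a common data.table idiom rather than something taken from the answer:

```r
library(data.table)

# Hypothetical helper: returns one list element per target column.
range_stats <- function(v) list(min(v), max(v))

x <- data.table(a = 1:3, b = 1:6, c = 7:12)
x[, c("lo", "hi") := range_stats(b), by = a]

# Same pattern with the target names built in a variable: lapply(.SD, mean)
# returns a list with one element per column listed in .SDcols.
cols <- c("b", "c")
x[, (paste0(cols, "_mean")) := lapply(.SD, mean), by = a, .SDcols = cols]
x[]
```

Because lapply always returns a list, it lines up naturally with a character vector of new column names on the LHS.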

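Thanks for the answer and examples. How should I modify the following line to get two columns for each objectName from the dim output, rather than one column with two rows? 'data.table(objectName = ls())[, c("rows", "cols") := dim(get(objectName)), by = objectName]' (I'm using 'data.table' 1.8.11) – dnlbrky 2014-05-19 02:00:29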

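@dnlbrky 'dim' returns a vector, so converting it to type 'list' should rotate it; e.g. '[, c("rows", "cols") := as.list(dim(get(objectName))), by = objectName]'. The trouble is that 'as.list' has call overhead and also copies the small vector. If efficiency becomes an issue as the number of groups grows, please let us know. – 2014-05-21 11:49:48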
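A self-contained version of the as.list(dim(...)) suggestion. Instead of ls() it inspects two hypothetical matrices, m1 and m2, created just for the example, since ls() can also return objects (such as functions) whose dim() is NULL:

```r
library(data.table)

# Hypothetical objects to inspect.
m1 <- matrix(0, nrow = 2, ncol = 3)
m2 <- matrix(0, nrow = 4, ncol = 5)

objs <- data.table(objectName = c("m1", "m2"))

# dim() returns an integer vector of length 2; as.list() turns it into a
# two-element list, so each element is assigned to its own column.
objs[, c("rows", "cols") := as.list(dim(get(objectName))), by = objectName]
objs[]
```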

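Thanks @Matt_Dowle. I had tried 'list' but not 'as.list'. Speed isn't an issue; I just wanted a quick way to find objects in the environment that have a particular number of columns or rows. This is off-topic, but... what do you think about adding NCOL to 'tables()'? – dnlbrky 2014-05-23 00:34:20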
