Is my way of duplicating rows in data.table efficient?

I have monthly data in one data.table and annual data in another data.table, and I want to match each annual observation to the corresponding observations in the monthly data.

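My approach is as follows: duplicate the annual data for every month, then join the monthly and annual data. My question is about the row duplication. I know how to do it, but I am not sure whether this is the best way, so some opinions would be great.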

Here is an exemplary data.table DT for my annual data, and how I currently duplicate it:

library(data.table)
DT <- data.table(ID = paste(rep(c("a", "b"), each=3), c(1:3, 1:3), sep="_"),
                 values = 10:15,
                 startMonth = seq(from=1, by=2, length.out=6),
                 endMonth = seq(from=3, by=3, length.out=6))
DT
      ID values startMonth endMonth
[1,] a_1     10          1        3
[2,] a_2     11          3        6
[3,] a_3     12          5        9
[4,] b_1     13          7       12
[5,] b_2     14          9       15
[6,] b_3     15         11       18
# Alternative 1: expand the months per ID, then join back to DT
DT1 <- DT[, list(MONTH=startMonth:endMonth), by="ID"] 
setkey(DT, ID) 
setkey(DT1, ID) 
DT1[DT] 
ID MONTH values startMonth endMonth 
a_1  1  10   1  3 
a_1  2  10   1  3 
a_1  3  10   1  3 
a_2  3  11   3  6 
[...]

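The last join is exactly what I want. However, DT[, list(MONTH=startMonth:endMonth), by="ID"] already does everything I want except for adding the other columns of DT, so I was wondering whether I could get rid of the last three lines of my code, i.e. the setkey and join operations. It turns out you can; just do the following: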

# Alternative 2: more intuitive and just one line of code
DT[, list(MONTH=startMonth:endMonth, values, startMonth, endMonth), by="ID"] 
ID MONTH values startMonth endMonth 
a_1 1  10   1  3 
a_1 2  10   1  3 
a_1 3  10   1  3 
a_2 3  11   3  6 
...

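However, this only works because I hardcoded the column names into the list expression. In my real data I do not know the names of all columns in advance, so I was wondering whether I can tell data.table to return the column MONTH, computed as shown above, together with all the other columns of DT. I thought .SD would do the trick, but: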

DT[, list(MONTH=startMonth:endMonth, .SD), by="ID"] 
Error in `[.data.table`(DT, , list(MONTH = startMonth:endMonth, .SD), by = "ID") : 
    maxn (4) is not exact multiple of this j column's length (3)

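So, to summarize: I know how it can be done, but I was wondering whether this is the best way to do it, because I am still struggling a bit with the syntax of data.table, and I often read in posts and on the wiki that there are good and bad ways of doing things. Also, I do not understand why I get an error when using .SD. I thought it was just an easy way to tell data.table that you want all columns. What am I missing?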

Source

2011-11-04 Christoph_J

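Great question. What you tried was very reasonable. Assuming you are using v1.7.1, it is now easier to make list columns. In this case it is trying to make one list column out of .SD (3 items) alongside the MONTH column of the 2nd group (4 items). I'll raise it as a bug [EDIT: now fixed in v1.7.5], thanks.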

In the meantime, try:

DT[, cbind(MONTH=startMonth:endMonth, .SD), by="ID"] 
ID MONTH values startMonth endMonth 
a_1  1  10   1  3 
a_1  2  10   1  3 
a_1  3  10   1  3 
a_2  3  11   3  6 
...
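
As a hedged sketch of the same idea without cbind: in recent data.table versions, wrapping the computed column in list() and concatenating it with .SD via c() should also return MONTH plus every other column, with no column names hardcoded. This assumes a version that recycles length-1 items within each group, and it relies on each ID occurring exactly once, as in the example data. The name `expanded` is only for illustration.

```r
library(data.table)

DT <- data.table(
  ID = paste(rep(c("a", "b"), each = 3), c(1:3, 1:3), sep = "_"),
  values = 10:15,
  startMonth = seq(from = 1, by = 2, length.out = 6),
  endMonth = seq(from = 3, by = 3, length.out = 6)
)

# c() flattens .SD into a plain list next to MONTH;
# the length-1 columns from .SD are recycled to the length of MONTH per ID
expanded <- DT[, c(list(MONTH = startMonth:endMonth), .SD), by = "ID"]
expanded
```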

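Also, just to check: have you seen roll=TRUE? Typically you would have just one startMonth column (irregular, with gaps) and then simply roll-join to it. Your example data has overlapping month ranges, though, which complicates that.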
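
To illustrate the roll=TRUE suggestion, here is a minimal sketch on made-up, non-overlapping data. The tables `annual` and `monthly` and their values are hypothetical, and the `on=` argument assumes data.table 1.9.6 or later (with older versions, setkey on startMonth would be needed instead).

```r
library(data.table)

# hypothetical annual-style data: each value applies from startMonth onward
annual <- data.table(startMonth = c(1L, 4L, 9L), values = c(10L, 11L, 12L))
# hypothetical monthly data: one row per month
monthly <- data.table(MONTH = 1:12)

# for each MONTH, take the row of annual with the latest startMonth <= MONTH
res <- annual[monthly, on = c(startMonth = "MONTH"), roll = TRUE]
res
```

In the result, the column named startMonth carries the month values from `monthly`, and `values` is carried forward until the next startMonth.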

Source

2011-11-04 15:28:21

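Here is a function I wrote that mimics disaggregate (I needed something that handles complex data). It may be useful to you, if it is not overkill. To expand rows only, set the argument fact to c(1,12), where 12 gives 12 "month" rows for each "year" row.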

zexpand <- function(inarray, fact = 2, interp = FALSE, ...) {
  # interp and ... are accepted but not used in this version
  fact <- as.integer(round(fact))
  switch(as.character(length(fact)),
    '1' = xfact <- yfact <- fact,
    '2' = {xfact <- fact[1]; yfact <- fact[2]},
    {xfact <- fact[1]; yfact <- fact[2]; warning('fact is too long. First two values used.')})
  if (xfact < 1) { stop('fact[1] must be > 0') }
  if (yfact < 1) { stop('fact[2] must be > 0') }
  # new nonloop method, seems to work just ducky
  # repeat every element xfact times along its row (column expansion)
  bigtmp <- matrix(rep(t(inarray), each = xfact), nrow(inarray), ncol(inarray) * xfact, byrow = TRUE)
  # repeat every row yfact times (row expansion)
  bigx <- t(matrix(rep(bigtmp, each = yfact), ncol(bigtmp), nrow(bigtmp) * yfact, byrow = TRUE))
  return(invisible(bigx))
}
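
For the rows-only case, the same kind of expansion can be sketched with plain index repetition in base R. This is an illustration, not a replacement for zexpand; the matrix `yearly` and its values are made up.

```r
# hypothetical yearly matrix: 2 "year" rows, 3 columns
yearly <- matrix(1:6, nrow = 2, byrow = TRUE)

# repeat each row index 12 times so every "year" row becomes 12 "month" rows
monthly <- yearly[rep(seq_len(nrow(yearly)), each = 12), , drop = FALSE]
dim(monthly)
```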

Source

2011-11-04 14:13:50

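Looking at this, I realized that the answer only works because ID is a unique key (no duplicates). Here is another answer that handles duplicates. By the way, some NA seem to creep in. Could this be a bug? I am using v1.8.7 (commit 796).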

library(data.table) 
DT <- data.table(x=c(1,1,1,1,2,2,3),y=c(1,1,2,3,1,1,2)) 

DT[,rep:=1L][c(2,7),rep:=c(2L,3L)] # duplicate row 2 and triple row 7 
DT[,num:=1:.N]      # to group each row by itself 

DT 
    x y rep num 
1: 1 1 1 1 
2: 1 1 2 2 
3: 1 2 1 3 
4: 1 3 1 4 
5: 2 1 1 5 
6: 2 1 1 6 
7: 3 2 3 7 

DT[,cbind(.SD,dup=1:rep),by="num"] 
    num x y rep dup 
1: 1 1 1 1 1 
2: 2 1 1 1 NA  # why these NA? 
3: 2 1 1 2 NA 
4: 3 1 2 1 1 
5: 4 1 3 1 1 
6: 5 2 1 1 1 
7: 6 2 1 1 1 
8: 7 3 2 3 1 
9: 7 3 2 3 2 
10: 7 3 2 3 3

Just for completeness, a faster way is to rep the row numbers and then take the subset in one step (no grouping, and no use of cbind or .SD):

DT[rep(num,rep)] 
    x y rep num 
1: 1 1 1 1 
2: 1 1 2 2 
3: 1 1 2 2 
4: 1 2 1 3 
5: 1 3 1 4 
6: 2 1 1 5 
7: 2 1 1 6 
8: 3 2 3 7 
9: 3 2 3 7 
10: 3 2 3 7

Note that in this example data the column name rep happens to be the same as the name of the base function rep().
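
A minimal sketch of the same subsetting idea that sidesteps the name clash by calling the count column `times` (a hypothetical name, as are `expanded` and `dup`), and then numbers the copies of each original row:

```r
library(data.table)

DT <- data.table(x = c(1, 1, 1, 1, 2, 2, 3), y = c(1, 1, 2, 3, 1, 1, 2))
DT[, times := 1L][c(2, 7), times := c(2L, 3L)]  # duplicate row 2, triple row 7
DT[, num := .I]                                  # original row number

# repeat each row index `times` times and subset in one step
expanded <- DT[rep(seq_len(nrow(DT)), times)]

# number the copies within each original row
expanded[, dup := seq_len(.N), by = num]
expanded
```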

Source

2013-01-16 13:35:45 statquant

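Thanks. I ran it (v1.8.7), but I don't see the 'NA'. Which version do you have?

Thanks. I still can't see the 'NA', but now I get two identical warnings: '1: In 1:rep : numerical expression has 2 elements: only the first used'

Try the latest (796) as a first step, and then troubleshoot, please.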


The fastest and most succinct way to do it:

DT[rep(1:nrow(DT), endMonth - startMonth)]

We can also enumerate by group:

dd <- DT[rep(1:nrow(DT), endMonth - startMonth)]
dd[, nn := 1:.N, by = ID]
dd
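
Note that endMonth - startMonth repeats each row one time fewer than the inclusive range startMonth:endMonth used in the question (two rows for a_1 instead of three). A minimal sketch, assuming the inclusive count is what is wanted; `n_months` is a hypothetical helper name, and the approach relies on each ID occurring exactly once:

```r
library(data.table)

DT <- data.table(
  ID = paste(rep(c("a", "b"), each = 3), c(1:3, 1:3), sep = "_"),
  values = 10:15,
  startMonth = seq(from = 1, by = 2, length.out = 6),
  endMonth = seq(from = 3, by = 3, length.out = 6)
)

# inclusive number of months per row: startMonth, ..., endMonth
n_months <- DT[, endMonth - startMonth + 1L]

# repeat every row once per month, then fill in the month within each ID
dd <- DT[rep(seq_len(nrow(DT)), n_months)]
dd[, MONTH := startMonth + seq_len(.N) - 1L, by = ID]
dd
```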

Source

2016-12-01 23:52:23 user7238835

We can also enumerate by group: dd = DT[rep(1:nrow(DT), endMonth - startMonth)] – user7238835

We can also enumerate by group: 'code' dd = DT[rep(1:nrow(DT), endMonth - startMonth)] dd[, nn := 1:.N, by = ID] 'code' – user7238835

Please edit your answer instead of filling up the comment section. Use the _edit_ link above. – Marcs
