2010-11-16 88 views
5

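Merging several data frames on two common columns

I have seen several questions about merging CSV files into one data frame. What if the data frames are already in the workspace? I have five wide zoo objects that I converted to data frames and then melted. Here is the head of one: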

> head(df.mon.ssf.ret) 
     date variable value 
1 2009.000  AA1C NA 
2 2009.083  AA1C NA 
3 2009.167  AA1C NA 
4 2009.250  AA1C NA 
5 2009.333  AA1C NA 
6 2009.417  AA1C NA 

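I can merge these on "date" and "variable" with a series of nested merge() calls, but that seems clumsy. Is there a more programmatic way to merge them?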
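One way to avoid nesting the calls by hand is to fold merge() over a list with Reduce(). The following is only a minimal sketch in base R: the toy data, the helper make_long() and the list name long_list are made up for illustration and stand in for the melted data frames described above.

```r
# Toy key columns shared by every long data frame (hypothetical values).
keys <- expand.grid(date = c("Jan 2009", "Feb 2009"),
                    variable = c("AA1C", "AAPL1C"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: attach a value column to the shared keys.
make_long <- function(values) cbind(keys, value = values)

long_list <- list(
  oi  = make_long(c(1, 2, 3, 4)),
  oig = make_long(c(0.1, 0.2, 0.3, 0.4)),
  ret = make_long(c(-0.01, 0.02, NA, 0.04))
)

# Rename each "value" column after its list name so that the merged
# columns do not collide.
long_list <- Map(function(df, nm) {
  names(df)[names(df) == "value"] <- nm
  df
}, long_list, names(long_list))

# Fold merge() over the list; every step joins on both key columns.
merged <- Reduce(function(x, y) merge(x, y, by = c("date", "variable")),
                 long_list)
print(merged)
```

The same call works unchanged for five data frames, since Reduce() simply applies the two-argument merge from left to right.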

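If I am sure that the columns in all of the zoo objects are in the same order, can I be confident that melt will preserve that ordering and just use cbind? Thanks!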
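Rather than trusting the row order, one option is to check that the key columns really line up before using cbind(), and fall back to merge() when they do not. This is a sketch in base R with made-up data; same_keys() is a hypothetical helper, not part of any package.

```r
# Two toy long data frames with the same keys but a different row order.
a <- data.frame(date = c("Jan 2009", "Feb 2009"), variable = "AAPL1C",
                oi = c(10, 20), stringsAsFactors = FALSE)
b <- data.frame(date = c("Feb 2009", "Jan 2009"), variable = "AAPL1C",
                ret = c(0.2, 0.1), stringsAsFactors = FALSE)

# Hypothetical helper: TRUE only if both data frames have the same
# number of rows and identical key values in every row.
same_keys <- function(x, y, keys = c("date", "variable")) {
  nrow(x) == nrow(y) && all(x[keys] == y[keys])
}

if (same_keys(a, b)) {
  # Keys line up row by row, so binding the value column is safe.
  combined <- cbind(a, b["ret"])
} else {
  # Keys differ in order or content, so join on them instead.
  combined <- merge(a, b, by = c("date", "variable"))
}
print(combined)
```

Here the rows are in a different order, so the merge() branch runs and each value stays attached to its own date.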

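Update:

There is something I am missing about how melt is meant to be used. Here is what happens when I merge the three zoo objects as zoo and then melt the resulting very wide data frame: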

> temp <- merge(z.ssf.oi, z.ssf.oig, z.ssf.ret) 
> class(temp) 
[1] "zoo" 
> temp2 <- cbind(index(temp), as.data.frame(temp)) 
> class(temp2) 
[1] "data.frame" 
> names(temp2)[1] <- "date" 
> dim(temp2) 
[1] 12 1204 
> temp3 <- melt(temp2, id="date") 
Error in data.frame(ids, variable, value) : 
    arguments imply differing number of rows: 12, 14436 
> head(temp2)[, 1:5] 
      date AA1C.z.ssf.oi AAPL1C.z.ssf.oi ABT1C.z.ssf.oi ABX1C.z.ssf.oi 
Jan 2009 Jan 2009  1895.800  49191.25    NA    NA 
Feb 2009 Feb 2009  1415.579  42650.26    NA  6267.96 
Mar 2009 Mar 2009  1501.398  36712.20    NA  11581.65 
Apr 2009 Apr 2009  1752.936  74376.27    NA  12168.29 
May 2009 May 2009  1942.874  96307.30    NA  13490.60 
Jun 2009 Jun 2009   NA  79170.70    NA  16337.21 

Update 2: Thanks for the help! Here is a very manual solution:

> A <- cbind(index(z.ssf.oi), as.data.frame(z.ssf.oi)) 
> names(A)[1] <- "date" 
> B <- cbind(index(z.ssf.oig), as.data.frame(z.ssf.oig)) 
> names(B)[1] <- "date" 
> C <- cbind(index(z.ssf.ret), as.data.frame(z.ssf.ret)) 
> names(C)[1] <- "date" 
> A.melt <- melt(A, id="date") 
> head(A.melt) 
     date variable value 
1 Jan 2009  A1C NA 
2 Feb 2009  A1C NA 
3 Mar 2009  A1C NA 
4 Apr 2009  A1C NA 
5 May 2009  A1C NA 
6 Jun 2009  A1C NA 
> B.melt <- melt(B, id="date") 
> C.melt <- melt(C, id="date") 
> ans <- merge(merge(A.melt, B.melt, by=c("date", "variable")), C.melt, by=c("date", "variable")) 
> names(ans)[3:5] <- c("oi", "oig", "ret") 
> head(ans) 
     date variable  oi  oig   ret 
1 Apr 2009  A1C  NA  NA   NA 
2 Apr 2009  AA1C  NA  NA   NA 
3 Apr 2009 AAPL1C 59316.88 0.3375786 0.008600073 
4 Apr 2009 ABB1C  NA  NA   NA 
5 Apr 2009 ABT1C  NA  NA   NA 
6 Apr 2009 ABX1C  NA  NA   NA 

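(The NAs come from an incomplete data set I have at home; I need to dial in to pull a filtered set from my database.)

Update 3: Here are some dputs (I took the [1:10, 1:10] subset of each wide zoo object and converted it to a data frame):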

> dput(A) 
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), class = "factor", .Label = "oi"), date = structure(c(2009, 
2009.08333333333, 2009.16666666667, 2009.25, 2009.33333333333, 
2009.41666666667, 2009.5, 2009.58333333333, 2009.66666666667, 
2009.75), class = "yearmon"), AA1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), AAPL1C = c(49226.391, 42662.1589473684, 35354.4254545455, 
57161.6495238095, 84362.895, NA, NA, 47011.8519047619, 57852.2171428571, 
33058.0090909091), ABT1C = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
    ABX1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ACE1C = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), ACI1C = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), ACS1C = c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_), ADBE1C = c(NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
    ), ADCT1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ADI1C = c(NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_)), .Names = c("group", "date", 
"AA1C", "AAPL1C", "ABT1C", "ABX1C", "ACE1C", "ACI1C", "ACS1C", 
"ADBE1C", "ADCT1C", "ADI1C"), row.names = c("Jan 2009", "Feb 2009", 
"Mar 2009", "Apr 2009", "May 2009", "Jun 2009", "Jul 2009", "Aug 2009", 
"Sep 2009", "Oct 2009"), class = "data.frame") 
> dput(B) 
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), class = "factor", .Label = "oig"), date = structure(c(2009.08333333333, 
2009.16666666667, 2009.25, 2009.33333333333, 2009.41666666667, 
2009.5, 2009.58333333333, 2009.66666666667, 2009.75, 2009.83333333333 
), class = "yearmon"), AA1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), AAPL1C = c(-0.143117562125788, -0.187888745830302, 0.480459636485712, 
0.389244461579155, NA, NA, NA, 0.207492040517069, -0.559627909130612, 
NA), ABT1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ABX1C = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), ACE1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), ACI1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ACS1C = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), ADBE1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), ADCT1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ADI1C = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_)), .Names = c("group", "date", "AA1C", "AAPL1C", 
"ABT1C", "ABX1C", "ACE1C", "ACI1C", "ACS1C", "ADBE1C", "ADCT1C", 
"ADI1C"), row.names = c("Feb 2009", "Mar 2009", "Apr 2009", "May 2009", 
"Jun 2009", "Jul 2009", "Aug 2009", "Sep 2009", "Oct 2009", "Nov 2009" 
), class = "data.frame") 
> dput(C) 
structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), class = "factor", .Label = "ret"), date = structure(c(2009, 
2009.08333333333, 2009.16666666667, 2009.25, 2009.33333333333, 
2009.41666666667, 2009.5, 2009.58333333333, 2009.66666666667, 
2009.75), class = "yearmon"), AA1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), AAPL1C = c(-0.143117562125788, -0.187888745830302, 0.480459636485712, 
0.389244461579155, NA, NA, NA, 0.207492040517069, -0.559627909130612, 
NA), ABT1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ABX1C = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), ACE1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), ACI1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ACS1C = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), ADBE1C = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_ 
), ADCT1C = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), ADI1C = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_)), .Names = c("group", "date", "AA1C", "AAPL1C", 
"ABT1C", "ABX1C", "ACE1C", "ACI1C", "ACS1C", "ADBE1C", "ADCT1C", 
"ADI1C"), row.names = c("Feb 2009", "Mar 2009", "Apr 2009", "May 2009", 
"Jun 2009", "Jul 2009", "Aug 2009", "Sep 2009", "Oct 2009", "Nov 2009" 
), class = "data.frame") 
+2

Could you merge the wide zoo objects first (`merge.zoo` accepts more than two objects) and then reshape? – 2010-11-16 22:51:40

+1

Could you provide an example of what you want the data to look like after the reshape/merge operation is done? – 2010-11-17 04:05:26

Answers

6

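You could try this. It is untested because your example is not reproducible. If you want a better answer, please give us some dummy data for z.ssf.oi, z.ssf.oig and z.ssf.ret. You can use dput() to generate code for a reproducible data set.

library(zoo)     # index()
library(reshape) # melt(), cast()
# Label each wide object with its group; as.factor() drops the yearmon class.
A <- data.frame(Group = "oi", date = as.factor(index(z.ssf.oi)), as.data.frame(z.ssf.oi))
B <- data.frame(Group = "oig", date = as.factor(index(z.ssf.oig)), as.data.frame(z.ssf.oig))
C <- data.frame(Group = "ret", date = as.factor(index(z.ssf.ret)), as.data.frame(z.ssf.ret))
Long <- melt(rbind(A, B, C), id.vars = c("Group", "date"))
# One row per date and variable, one column per group.
cast(Long, date + variable ~ Group)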
+0

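Thanks! I need to start thinking this way! I still get the same error (I will edit a small subset into the question above): `Error in data.frame(ids, variable, value) : arguments imply differing number of rows: 30, 300` – 2010-11-17 12:28:29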

+0

I should add that the call that produces that error is `long <- melt(temp, id.vars = c("group", "date"))`, where temp is the rbind of the three data frames. – 2010-11-17 12:34:36

+0

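Your idea was right! After some tinkering, the problem turned out to be the yearmon class that I get from index(). So I wrapped it in as.factor() and it works very well! Thanks for the help! Please add as.factor() or as.character() to the solution above for posterity. – 2010-11-17 14:39:25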
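The fix described in the last comment can be sketched as follows. It assumes the zoo and reshape2 packages are installed (reshape2 is used here in place of the reshape package from the answer), and the tiny zoo object z is made up for illustration. The only point is that the yearmon index is converted with as.character() before melting.

```r
library(zoo)       # zoo(), index(), as.yearmon()
library(reshape2)  # melt(); an assumption, the thread used the reshape package

# Toy wide zoo object with a yearmon index (hypothetical values).
z <- zoo(matrix(c(1, 2, 3, 4), ncol = 2,
                dimnames = list(NULL, c("AA1C", "AAPL1C"))),
         order.by = as.yearmon(c("2009-01", "2009-02")))

# Convert the yearmon index to plain character before building the
# data frame, so melt() only sees ordinary column types.
wide <- data.frame(date = as.character(index(z)), as.data.frame(z),
                   stringsAsFactors = FALSE)

long <- melt(wide, id.vars = "date")
print(long)
```

as.factor() works the same way as as.character() here; either one drops the yearmon class from the id column.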