Subsetting a data frame in R

I have two data frames, df2 and DF.
> DF
        date tickers
1 2000-01-01       B
2 2000-01-01    GOOG
3 2000-01-01       V
4 2000-01-01    YHOO
5 2000-01-02     XOM
> df2
        date tickers quantities
1 2000-01-01      BB         11
2 2000-01-01     XOM         23
3 2000-01-01    GOOG         42
4 2000-01-01    YHOO         21
5 2000-01-01       V       2112
6 2000-01-01       B         13
7 2000-01-02     XOM         24
8 2000-01-02      BB        422
I need to select from df2 the rows whose date and ticker are both present in DF. That means I need the following output:
3 2000-01-01    GOOG         42
4 2000-01-01    YHOO         21
5 2000-01-01       V       2112
6 2000-01-01       B         13
7 2000-01-02     XOM         24
So I used the following code:
> subset(df2,df2$date %in% DF$date & df2$tickers %in% DF$tickers)
        date tickers quantities
2 2000-01-01     XOM         23
3 2000-01-01    GOOG         42
4 2000-01-01    YHOO         21
5 2000-01-01       V       2112
6 2000-01-01       B         13
7 2000-01-02     XOM         24
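But the output contains an extra row. That is because the ticker "XOM" appears on two days in df2, and the two %in% conditions are checked independently, so both XOM rows are selected. What modifications does my code need?

The dput output of both data frames follows: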
> dput(DF)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("2000-01-01",
"2000-01-02"), class = "factor"), tickers = structure(c(4L, 5L,
6L, 8L, 7L), .Label = c("A", "AA", "AAPL", "B", "GOOG", "V",
"XOM", "YHOO", "Z"), class = "factor")), .Names = c("date", "tickers"
), row.names = c(NA, -5L), class = "data.frame")
> dput(df2)
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L), .Label = c("2000-01-01", "2000-01-02"), class = "factor"),
tickers = structure(c(2L, 5L, 3L, 6L, 4L, 1L, 5L, 2L), .Label = c("B",
"BB", "GOOG", "V", "XOM", "YHOO"), class = "factor"), quantities = c(11,
23, 42, 21, 2112, 13, 24, 422)), .Names = c("date", "tickers",
"quantities"), row.names = c(NA, -8L), class = "data.frame")
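What do you want to do with the duplicate rows? Take only one, sum them, return the values as separate columns...? – Thomas 2013-05-06 12:14:22
Are you just looking for 'merge(DF,df2)'...? That gives the same answer as the 'sqldf' answer below... – 2013-05-06 12:24:24
I thought merge() only works on data frames with the same number of columns. That is why I asked this question. Thanks for your help. – 2013-05-06 12:30:37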
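For reference, here is a minimal sketch of the merge() suggestion from the comments, next to a composite-key filter that keeps the original row names of df2. It assumes the two data frames can be rebuilt with character columns instead of the factors shown in the dput output; the names merged, key_match and filtered are made up for illustration.

```r
# Rebuild the example data (character columns, an assumption made for brevity;
# the dput output in the question uses factors).
DF <- data.frame(
  date = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# Option 1: merge() joins on the columns the two data frames share
# (date and tickers); they do not need the same number of columns.
merged <- merge(DF, df2)

# Option 2: match on the (date, ticker) pair as one key, so that a ticker
# only counts when it appears in DF on that same date.
key_match <- paste(df2$date, df2$tickers) %in% paste(DF$date, DF$tickers)
filtered <- df2[key_match, ]

print(merged)
print(filtered)
```

Both options drop the XOM row for 2000-01-01, because that date and ticker pair does not occur in DF. merge() re-sorts and renumbers the rows, while the composite-key filter keeps the order and row names of df2.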