2013-05-06

Subsetting a data frame in R

I have two data frames, df2 and DF:

> DF 
     date tickers 
1 2000-01-01  B 
2 2000-01-01 GOOG 
3 2000-01-01  V 
4 2000-01-01 YHOO 
5 2000-01-02  XOM 

> df2 
     date tickers quantities 
1 2000-01-01  BB   11 
2 2000-01-01  XOM   23 
3 2000-01-01 GOOG   42 
4 2000-01-01 YHOO   21 
5 2000-01-01  V  2112 
6 2000-01-01  B   13 
7 2000-01-02  XOM   24 
8 2000-01-02  BB  422 

I need the rows of df2 whose (date, tickers) values are present in DF. That is, I need the following output:

3 2000-01-01 GOOG   42 
4 2000-01-01 YHOO   21 
5 2000-01-01  V  2112 
6 2000-01-01  B   13 
7 2000-01-02  XOM   24 

So I used the following code:

> subset(df2,df2$date %in% DF$date & df2$tickers %in% DF$tickers) 
     date tickers quantities 
2 2000-01-01  XOM   23 
3 2000-01-01 GOOG   42 
4 2000-01-01 YHOO   21 
5 2000-01-01  V  2112 
6 2000-01-01  B   13 
7 2000-01-02  XOM   24 

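But the output contains an extra row (row 2). That is because the ticker "XOM" appears on two days in df2, so both of its rows are selected. What changes does my code need?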

The dput output is as follows:

> dput(DF) 
structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("2000-01-01", 
"2000-01-02"), class = "factor"), tickers = structure(c(4L, 5L, 
6L, 8L, 7L), .Label = c("A", "AA", "AAPL", "B", "GOOG", "V", 
"XOM", "YHOO", "Z"), class = "factor")), .Names = c("date", "tickers" 
), row.names = c(NA, -5L), class = "data.frame") 
> dput(df2) 
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L), .Label = c("2000-01-01", "2000-01-02"), class = "factor"), 
    tickers = structure(c(2L, 5L, 3L, 6L, 4L, 1L, 5L, 2L), .Label = c("B", 
    "BB", "GOOG", "V", "XOM", "YHOO"), class = "factor"), quantities = c(11, 
    23, 42, 21, 2112, 13, 24, 422)), .Names = c("date", "tickers", 
"quantities"), row.names = c(NA, -8L), class = "data.frame") 

What do you want to do with the duplicate rows? Take only one, sum them, return the values as separate columns...? – Thomas 2013-05-06 12:14:22

Are you just looking for 'merge(DF,df2)'...? That gives the same result as the 'sqldf' answer below... – 2013-05-06 12:24:24

I thought merge() only worked for data frames with the same number of columns. That's why I asked this question. Thanks for your help. – 2013-05-06 12:30:37

Answers

1

This is actually not so different from my answer to this post of yours, but it needs a slight modification:

df2[duplicated(rbind(DF, df2[,1:2]))[-seq_len(nrow(DF))], ] 

#   date tickers quantities 
# 3 2000-01-01 GOOG   42 
# 4 2000-01-01 YHOO   21 
# 5 2000-01-01  V  2112 
# 6 2000-01-01  B   13 
# 7 2000-01-02  XOM   24 

Note: this returns the rows in the same order as they appear in df2.
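To make the one-liner above easier to follow, here is a step-by-step sketch of the same idea. The data frames are rebuilt with character columns rather than factors, and the intermediate names (stacked, dup, keep, result) are only illustrative.

```r
# Data mirroring the question (character columns instead of factors).
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# Stack the key columns: DF's rows first, then df2's (date, tickers) pairs.
stacked <- rbind(DF, df2[, c("date", "tickers")])

# duplicated() marks a row TRUE when an identical row occurs earlier in the stack.
dup <- duplicated(stacked)

# Drop the flags that belong to DF's own rows, leaving one flag per df2 row.
keep <- dup[-seq_len(nrow(DF))]

result <- df2[keep, ]
```

One caveat, as far as I can tell: this relies on the (date, tickers) pairs being unique within df2. If df2 repeated a pair that is not in DF, the second occurrence would also be flagged as a duplicate and kept.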


Alternatively, as noted in the comments, using merge:

merge(df2, DF, by=c("date", "tickers")) 

will give the same result as well (but not necessarily in the same order).
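If the original df2 order matters, one possible approach is to record each row's position before merging and sort by it afterwards. This is only a sketch: row_id is a hypothetical helper column, and the data is rebuilt with character columns. It also shows that merge() is happy with data frames that have different numbers of columns.

```r
# Data mirroring the question (character columns instead of factors).
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# Remember each row's original position in df2 (row_id is a made-up name).
df2$row_id <- seq_len(nrow(df2))

# Inner join on both key columns: only (date, tickers) pairs found in DF survive.
merged <- merge(df2, DF, by = c("date", "tickers"))

# merge() sorts by the key columns by default, so restore df2's order.
merged <- merged[order(merged$row_id), ]
```

After the reordering, merged$row_id can be dropped if it is no longer needed.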

3

Using the sqldf package:

require(sqldf) 

sqldf("SELECT d2.date, d2.tickers, d2.quantities FROM df2 d2 
     JOIN DF d1 ON d1.date=d2.date AND d1.tickers=d2.tickers") 

##  date tickers quantities 
## 1 2000-01-01 GOOG   42 
## 2 2000-01-01 YHOO   21 
## 3 2000-01-01  V  2112 
## 4 2000-01-01  B   13 
## 5 2000-01-02  XOM   24
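Returning to the subset() call from the question: it tests date and tickers independently, so ("2000-01-01", "XOM") passes because that date occurs somewhere in DF$date and that ticker occurs somewhere in DF$tickers. A base R sketch of a different technique, matching on a combined key, is shown below. The "|" separator and the names loose, strict, key_df2 and key_DF are assumptions for illustration; the separator must not occur in either column.

```r
# Data mirroring the question (character columns instead of factors).
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# The question's condition: each column is checked on its own,
# so row 2 ("2000-01-01", "XOM") is kept even though that pair is not in DF.
loose <- subset(df2, date %in% DF$date & tickers %in% DF$tickers)

# Build one key per row so that the (date, tickers) pair is tested as a unit.
key_df2 <- paste(df2$date, df2$tickers, sep = "|")
key_DF  <- paste(DF$date, DF$tickers, sep = "|")
strict  <- df2[key_df2 %in% key_DF, ]
```

Here loose reproduces the six-row output from the question, while strict keeps only rows 3 to 7, in df2's original order.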