Subsetting a data frame in R

I have 2 data frames, df2 and DF.

> DF 
     date tickers 
1 2000-01-01  B 
2 2000-01-01 GOOG 
3 2000-01-01  V 
4 2000-01-01 YHOO 
5 2000-01-02  XOM 

> df2 
     date tickers quantities 
1 2000-01-01  BB   11 
2 2000-01-01  XOM   23 
3 2000-01-01 GOOG   42 
4 2000-01-01 YHOO   21 
5 2000-01-01  V  2112 
6 2000-01-01  B   13 
7 2000-01-02  XOM   24 
8 2000-01-02  BB  422

I need the rows of df2 whose (date, tickers) values also exist in DF. That means I need the following output:

3 2000-01-01 GOOG   42 
4 2000-01-01 YHOO   21 
5 2000-01-01  V  2112 
6 2000-01-01  B   13 
7 2000-01-02  XOM   24

So I used the following code:

> subset(df2,df2$date %in% DF$date & df2$tickers %in% DF$tickers) 
     date tickers quantities 
2 2000-01-01  XOM   23 
3 2000-01-01 GOOG   42 
4 2000-01-01 YHOO   21 
5 2000-01-01  V  2112 
6 2000-01-01  B   13 
7 2000-01-02  XOM   24

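But the output contains an extra row (row 2, 2000-01-01 XOM 23). That is because the two %in% conditions are checked independently: 2000-01-01 occurs in DF$date and "XOM" occurs in DF$tickers (on 2000-01-02), so row 2 passes both tests even though DF never pairs that date with that ticker. Since "XOM" appears on two dates in df2, both of its rows are selected. What do I need to change in my code?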
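The cause can be reproduced in a few lines. This is a minimal base-R sketch, not one of the answers in this thread: it rebuilds the data from the dput() output as character columns (an assumption; with the original factors, paste() uses the labels, so the result should be the same), shows that the independent %in% tests keep row 2, and then matches on a single pasted key so that date and ticker must match together. The names independent, key_df2 and key_DF are hypothetical, and the "|" separator assumes that character never occurs in the values.

```r
# Rebuild the example data as plain character columns (values from the dput() output)
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# The two %in% tests are evaluated independently, so row 2 (2000-01-01, XOM)
# passes: its date is somewhere in DF$date and its ticker is somewhere in
# DF$tickers, even though DF never pairs them.
independent <- df2$date %in% DF$date & df2$tickers %in% DF$tickers
print(which(independent))

# Pasting both columns into one key (hypothetical "|" separator) forces the
# match to require the (date, tickers) pair.
key_df2 <- paste(df2$date, df2$tickers, sep = "|")
key_DF  <- paste(DF$date, DF$tickers, sep = "|")
result  <- df2[key_df2 %in% key_DF, ]
print(result)
```

The keyed version keeps df2's row order and row names, because it is still a plain logical subset of df2.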

The dput() output of both data frames is as follows:

> dput(DF) 
structure(list(date = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("2000-01-01", 
"2000-01-02"), class = "factor"), tickers = structure(c(4L, 5L, 
6L, 8L, 7L), .Label = c("A", "AA", "AAPL", "B", "GOOG", "V", 
"XOM", "YHOO", "Z"), class = "factor")), .Names = c("date", "tickers" 
), row.names = c(NA, -5L), class = "data.frame") 
> dput(df2) 
structure(list(date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L), .Label = c("2000-01-01", "2000-01-02"), class = "factor"), 
    tickers = structure(c(2L, 5L, 3L, 6L, 4L, 1L, 5L, 2L), .Label = c("B", 
    "BB", "GOOG", "V", "XOM", "YHOO"), class = "factor"), quantities = c(11, 
    23, 42, 21, 2112, 13, 24, 422)), .Names = c("date", "tickers", 
"quantities"), row.names = c(NA, -8L), class = "data.frame")


2013-05-06 Dinoop Nair

What do you want to do with the duplicate rows? Take only one, sum them, return the values as separate columns...? – Thomas 2013-05-06 12:14:22

Are you just looking for 'merge(DF, df2)'...? That gives the same result as the 'sqldf' answer below... – 2013-05-06 12:24:24

I thought merge() only works on data frames with the same number of columns. That is why I asked this question. Thanks for your help. – 2013-05-06 12:30:37

It's actually not that different from my answer to this post of yours, but it needs a slight modification:

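# Stack DF's (date, tickers) rows on top of df2's key columns. duplicated()
# flags a df2 row when its (date, tickers) pair already appeared earlier in
# the stack, i.e. in DF (or in an earlier df2 row). Dropping the first
# nrow(DF) flags leaves exactly one flag per df2 row.
df2[duplicated(rbind(DF, df2[, 1:2]))[-seq_len(nrow(DF))], ]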

#   date tickers quantities 
# 3 2000-01-01 GOOG   42 
# 4 2000-01-01 YHOO   21 
# 5 2000-01-01  V  2112 
# 6 2000-01-01  B   13 
# 7 2000-01-02  XOM   24

Note: this returns the rows in the same order as they appear in df2.

Alternatively, as this comment points out, using merge:

merge(df2, DF, by=c("date", "tickers"))

will give the same result as well (but not necessarily in the same order).
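If the df2 order matters with merge(), one possible approach is to record each row's position before joining and sort on it afterwards. This is a sketch under stated assumptions: the data is rebuilt as character columns from the dput() output, and row_id is a hypothetical helper column that is dropped again at the end. merge() sorts its result by the key columns by default, which is why the reordering step is needed.

```r
# Rebuild the example data as plain character columns (values from the dput() output)
DF <- data.frame(
  date    = c("2000-01-01", "2000-01-01", "2000-01-01", "2000-01-01", "2000-01-02"),
  tickers = c("B", "GOOG", "V", "YHOO", "XOM"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  date       = c(rep("2000-01-01", 6), rep("2000-01-02", 2)),
  tickers    = c("BB", "XOM", "GOOG", "YHOO", "V", "B", "XOM", "BB"),
  quantities = c(11, 23, 42, 21, 2112, 13, 24, 422),
  stringsAsFactors = FALSE
)

# Hypothetical helper column recording each df2 row's original position
df2$row_id <- seq_len(nrow(df2))

# Inner join on both key columns; the result is sorted by date, then tickers
merged <- merge(df2, DF, by = c("date", "tickers"))

# Restore df2's original row order and drop the helper column
merged <- merged[order(merged$row_id), c("date", "tickers", "quantities")]
print(merged)
```

Because merge() performs an inner join by default (all = FALSE), only the (date, tickers) pairs present in both data frames survive, so the two frames do not need the same number of columns.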


2013-05-06 12:26:49 Arun

Using the sqldf package:

require(sqldf) 

sqldf("SELECT d2.date, d2.tickers, d2.quantities FROM df2 d2 
     JOIN DF d1 ON d1.date=d2.date AND d1.tickers=d2.tickers") 

##  date tickers quantities 
## 1 2000-01-01 GOOG   42 
## 2 2000-01-01 YHOO   21 
## 3 2000-01-01  V  2112 
## 4 2000-01-01  B   13 
## 5 2000-01-02  XOM   24


2013-05-06 12:17:09 Nishanth
