INNER JOIN with a MAX-type condition

I have two data frames:

info
Fname Lname
Henry H
Rose  R
Jacob T
John  O
Fred  Y
Simon S
Gay   T

and

students
Fname Lname Age Height Subject  Result
Henry H     12  15     Math;Sci P
Rose  R     11  18     Math;Sci P
Jacob T     11  15     Math;Sci P
Henry H     11  14     Math;Sci P
John  O     12  13     Math;Sci P
John  O     13  16     Math;Sci F
Fred  Y     11  16     Sci      P
Simon S     12  10     Eng;Math P
Gay   T     12  11     Math;Sci F
Rose  R     15  18     Math;Sci P
Fred  Y     12  16     Math;Sci P

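I want to do a JOIN that takes every name in info and finds the matching student, with its associated metadata, in students. When several rows share the same Fname and Lname, only the row with the highest Age should be kept. My output should look like this: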

Final
Fname Lname Age Height Subject  Result
Henry H     12  15     Math;Sci P
Rose  R     15  18     Math;Sci P
Jacob T     11  15     Math;Sci P
John  O     13  16     Math;Sci F
Fred  Y     12  16     Math;Sci P
Simon S     12  10     Eng;Math P
Gay   T     12  11     Math;Sci F

I tried sqldf, but had no luck. I just can't get the identifiers right. Is there another way to get my output?

Source

2015-10-17 MaxPD

[How to join (merge) data frames (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – WoodChopper

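Here is a possibly not very elegant way, using base R.

Merge the frames on the names (although in this example it isn't really necessary, since info is just a list of names that are already in the students frame).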

merged_df <- merge(students,info,by=c("Fname","Lname"))

Finally, aggregate, here by the names only. You can add any categorical or factor variables to the grouping.

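## take the maximum Age per name; aggregating Height as well would mix
## values from different rows whenever the oldest record is not also the tallest
merged_df_max <- aggregate(
       merged_df['Age'],
       by=list(Fname = merged_df$Fname,
         Lname = merged_df$Lname),
       FUN=max, na.rm=TRUE)

## add back details to the merged df
result <- merge(merged_df_max,students,by=c("Fname","Lname","Age"))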

The data.frames used above are created like this (inline text stands in for a file):

## load data
lines <- "
Fname,Lname,Age,Height,Subject,Result
Henry,H,12,15,Math;Sci,P
Rose,R,11,18,Math;Sci,P
Jacob,T,11,15,Math;Sci,P
Henry,H,11,14,Math;Sci,P
John,O,12,13,Math;Sci,P
John,O,13,16,Math;Sci,F
Fred,Y,11,16,Sci,P
Simon,S,12,10,Eng;Math,P
Gay,T,12,11,Math;Sci,F
Rose,R,15,18,Math;Sci,P
Fred,Y,12,16,Math;Sci,P
"

lines2 <- "
Fname,Lname
Henry,H
Rose,R
Jacob,T
John,O
Fred,Y
Simon,S
Gay,T
"

## strip.white = TRUE drops stray padding around fields so the join keys match
con <- textConnection(lines)
students <- read.csv(con, sep=',', strip.white=TRUE)
con2 <- textConnection(lines2)
info <- read.csv(con2, sep=',', strip.white=TRUE)
close(con)
close(con2)
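The merge-then-aggregate steps of this answer can also be sketched as a single row filter. This is only an illustrative alternative, not the answerer's method: it assumes a small, abbreviated subset of the question's data, and the names key, group_max and oldest are made up for the example. ave() is used to compute, for each row, the maximum Age within that row's name group.

```r
# Abbreviated, assumed subset of the question's data
students <- data.frame(
  Fname  = c("Henry", "Henry", "John", "John"),
  Lname  = c("H", "H", "O", "O"),
  Age    = c(12, 11, 12, 13),
  Height = c(15, 14, 13, 16),
  Result = c("P", "P", "P", "F"),
  stringsAsFactors = FALSE
)
info <- data.frame(
  Fname = c("Henry", "John"),
  Lname = c("H", "O"),
  stringsAsFactors = FALSE
)

# Inner join on the name columns
merged_df <- merge(students, info, by = c("Fname", "Lname"))

# One grouping key per person; a single key avoids empty factor combinations
key <- paste(merged_df$Fname, merged_df$Lname, sep = "\r")

# For every row, the maximum Age within that row's group
group_max <- ave(merged_df$Age, key, FUN = max)

# Keep whole rows whose Age equals the group maximum
oldest <- merged_df[merged_df$Age == group_max, ]
print(oldest)
```

Because whole rows are kept, Height and Result always come from the same record as the maximum Age. If two records for one person tie on Age, both rows are returned.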

Source

2015-10-17 06:31:48 ako

Please don't use 'attach'. – 2015-10-17 07:15:18

@Pascal Sorry! It won't happen again. – ako

Thanks for your help. – MaxPD

Using dplyr:

library(dplyr) 

# join keys are given explicitly rather than left to be inferred
info %>% left_join(students, by = c("Fname", "Lname")) %>% 
    group_by(Fname, Lname) %>% 
    filter(Age == max(Age))
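As a variation on the pipeline above, here is a minimal self-contained sketch that returns exactly one row per name. It is not part of the original answer: the two toy data frames are an assumed, abbreviated subset of the question's data, and it relies on slice_max(), which is available in dplyr 1.0.0 and later.

```r
library(dplyr)

# Abbreviated, assumed subset of the question's data
info <- data.frame(
  Fname = c("Henry", "Rose"),
  Lname = c("H", "R"),
  stringsAsFactors = FALSE
)
students <- data.frame(
  Fname  = c("Henry", "Henry", "Rose", "Rose"),
  Lname  = c("H", "H", "R", "R"),
  Age    = c(12, 11, 11, 15),
  Height = c(15, 14, 18, 18),
  stringsAsFactors = FALSE
)

final <- info %>%
  left_join(students, by = c("Fname", "Lname")) %>%
  group_by(Fname, Lname) %>%
  # keep the single row with the largest Age in each group
  slice_max(Age, n = 1, with_ties = FALSE) %>%
  ungroup()

print(final)
```

The difference from filter(Age == max(Age)) only shows up when two records for the same person share the maximum Age: the filter keeps every tied row, while slice_max() with with_ties = FALSE keeps just one.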

Source

2015-10-17 08:11:59

Try this:

library(sqldf) 
sqldf("select Fname, Lname, max(Age) Age, Height, Subject, Result 
     from info left join students using (Fname, Lname) 
     group by Fname, Lname")

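We use a left join in case there are students in info who have no data in students. In this question the students in info and in students are the same, so we could omit the word left from the query and still get the same result. Also note that, because the same set of students appears in both info and students, we don't need info at all. The following is the same as the previous query except for the from line, and it gives the same answer for the data provided: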

sqldf("select Fname, Lname, max(Age) Age, Height, Subject, Result 
     from students 
     group by Fname, Lname")
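To make the left-versus-inner distinction concrete, here is a small base R sketch using merge() as an analogue of the SQL joins. It is an illustration only: the student "Zed Z" is hypothetical and does not appear in the question's data, and the variable names inner and left are made up for the example.

```r
# info lists one name ("Zed Z", hypothetical) that has no row in students
info <- data.frame(
  Fname = c("Henry", "Zed"),
  Lname = c("H", "Z"),
  stringsAsFactors = FALSE
)
students <- data.frame(
  Fname = "Henry",
  Lname = "H",
  Age   = 12,
  stringsAsFactors = FALSE
)

# Inner join: names without a match in students are dropped
inner <- merge(info, students, by = c("Fname", "Lname"))

# Left join: every name in info is kept; missing student data becomes NA
left <- merge(info, students, by = c("Fname", "Lname"), all.x = TRUE)

print(inner)
print(left)
```

With the question's actual data every name in info has at least one row in students, so both joins return the same rows.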

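Note: for reproducibility, the info and students data frames are constructed below. Please provide this yourself when asking questions on SO.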

Lines_info <- " 
Fname Lname 
Henry  H 
Rose  R 
Jacob  T 
John  O 
Fred  Y 
Simon  S 
    Gay  T 
" 
Lines_students <- " 
Fname Lname Age Height Subject Result 
Henry  H 12  15 Math;Sci  P 
Rose  R 11  18 Math;Sci  P 
Jacob  T 11  15 Math;Sci  P 
Henry  H 11  14 Math;Sci  P 
John  O 12  13 Math;Sci  P 
John  O 13  16 Math;Sci  F 
Fred  Y 11  16  Sci  P 
Simon  S 12  10 Eng;Math  P 
    Gay  T 12  11 Math;Sci  F 
Rose  R 15  18 Math;Sci  P 
Fred  Y 12  16 Math;Sci  P 
" 

info <- read.table(text = Lines_info, header = TRUE) 
students <- read.table(text = Lines_students, header = TRUE)

Source

2015-10-17 11:12:46

Thank you very much. I will keep your advice in mind. – MaxPD
