2014-10-17

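Selecting and grouping dual-category data from a data frame

I really need help working out how to approach a problem. I have a data set that looks like this: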

Name Sex Total  
Anna F  10 
Jamie M  2 
Jamie F  7 
Mike M  13 
Sam F  6 
Sam M  3 

structure(list(Name = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"), 
Sex = c("F", "M", "F", "M", "F", "M"), Total = c(10L, 2L, 7L, 13L, 6L, 3L)), 
.Names = c("Name", "Sex", "Total"), class = "data.frame", row.names = c(NA, -6L)) 

What I want to do is keep only the names that appear as both male and female, so the result looks like this:

Name Sex Total 
Jamie M 2 
Jamie F 7 
Sam M 3 
Sam F 6 

But I am really struggling with how to approach it.


A similar but slightly more involved task is [here](http://stackoverflow.com/questions/26347343/group-androgynous-names-and-sum-amount-for-each-year-in-a-data-frame-in-r). – ilir 2014-10-17 20:24:46

Answers


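You can use ave to count the number of distinct sexes for each name, then keep only the rows whose name has both sexes. For example, with the sample data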

dd<-structure(list(Name = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"), 
Sex = c("F", "M", "F", "M", "F", "M"), Total = c(10L, 2L, 7L, 13L, 6L, 3L)), 
.Names = c("Name", "Sex", "Total"), class = "data.frame", row.names = c(NA, -6L)) 

you can run

both<-with(dd, ave(Sex, Name, FUN=function(x) length(unique(x))))=="2" 
dd[both, ] 

to get

Name Sex Total 
2 Jamie M  2 
3 Jamie F  7 
5 Sam F  6 
6 Sam M  3 

which is the desired result.
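To make the ave step easier to follow, here is a minimal sketch of the intermediate vector it produces. The name n_sex is only illustrative and is not part of the answer above. Because Sex is a character column, ave returns its per-group counts as character strings, which is why the answer compares against "2".

```r
# Sample data from the question, rebuilt with data.frame()
dd <- data.frame(
  Name  = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  Total = c(10L, 2L, 7L, 13L, 6L, 3L),
  stringsAsFactors = FALSE
)

# One value per row: the number of distinct sexes seen for that row's name.
# n_sex is a hypothetical helper name used only for this illustration.
n_sex <- with(dd, ave(Sex, Name, FUN = function(x) length(unique(x))))
n_sex
# "1" "2" "2" "1" "2" "2"

# Convert to integer before comparing if you prefer a numeric test
both <- as.integer(n_sex) == 2
dd[both, ]
```

Converting with as.integer is optional here, since R would coerce the comparison anyway, but it makes the intent explicit.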


Here is how I would approach it, assuming the data are stored in d:

# get a vector (set) of names that are used by both M and F 
dual.names <- intersect(d$Name[d$Sex=='M'], d$Name[d$Sex=='F']) 

# use set of dual names to filter data 
d[d$Name %in% dual.names, ] 
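If you would rather not hard-code the "M" and "F" labels, the same set logic can be sketched with a contingency table. This is an alternative illustration under the assumption that every category present in the Sex column must occur for a name to qualify. The variable counts is a name introduced here, and d is rebuilt from the question's sample data.

```r
# Sample data from the question
d <- data.frame(
  Name  = c("Anna", "Jamie", "Jamie", "Mike", "Sam", "Sam"),
  Sex   = c("F", "M", "F", "M", "F", "M"),
  Total = c(10L, 2L, 7L, 13L, 6L, 3L),
  stringsAsFactors = FALSE
)

# Rows of the table are names, columns are the categories found in Sex
counts <- table(d$Name, d$Sex)

# Keep names that have at least one row in every category
dual.names <- rownames(counts)[rowSums(counts > 0) == ncol(counts)]

# Filter the data with that set of names, as in the answer above
result <- d[d$Name %in% dual.names, ]
result
```

With the sample data this selects Jamie and Sam, the same names the intersect version finds.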

Obligatory Hadleyverse (dplyr & tidyr) answer:

library(tidyr) 
library(dplyr) 

dat %>% 
    spread(Sex, Total) %>% 
    filter(!is.na(M), !is.na(F)) %>% 
    gather(Sex, Total, M, F) %>% 
    arrange(Name) 

## Name Sex Total 
## 1 Jamie M  2 
## 2 Jamie F  7 
## 3 Sam M  3 
## 4 Sam F  6 

EDIT: a MUCH better dplyr approach, from @konvas's comment:

dat %>% group_by(Name) %>% filter(length(unique(Sex)) == 2) 

EDIT: and refined further following @David's comment:

dat %>% group_by(Name) %>% filter(n_distinct(Sex) == 2) 

(Can I transfer points to @konvas & @David? :-)


I think you can avoid spread and gather like this: `d %>% group_by(Name) %>% filter(length(unique(Sex)) == 2)` – konvas 2014-10-17 20:18:33


@konvas This is `dplyr`; spreading and gathering are half the fun. – ilir 2014-10-17 20:20:00


@ilir Haha, fair enough :) .. but if the OP's data set is large, there is a significant speed gain. – konvas 2014-10-17 20:20:45


A bit late to the party, but here is a data.table approach:

library(data.table) 
setDT(df)[ , .SD[length(unique(Sex)) == 2], by = Name] 
##  Name Sex Total 
## 1: Jamie M  2 
## 2: Jamie F  7 
## 3: Sam F  6 
## 4: Sam M  3 

Or, if you have no duplicate Name/Sex rows, here is a faster solution:

setDT(df)[ , .SD[.N == 2], by = Name]
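The "no duplicates" condition matters because counting rows per name is not the same as counting distinct sexes per name. The base R sketch below illustrates the difference on a made-up variant of the sample data: the extra "Mike M 4" row is an assumption added purely for this example, and n_rows and n_sexes are illustrative names.

```r
# Sample data plus one hypothetical duplicate Name/Sex row for Mike
df <- data.frame(
  Name  = c("Anna", "Jamie", "Jamie", "Mike", "Mike", "Sam", "Sam"),
  Sex   = c("F", "M", "F", "M", "M", "F", "M"),
  Total = c(10L, 2L, 7L, 13L, 4L, 6L, 3L),
  stringsAsFactors = FALSE
)

# Rows per name (what a row-count test such as .N == 2 relies on)
n_rows <- ave(seq_along(df$Name), df$Name, FUN = length)

# Distinct sexes per name (what length(unique(Sex)) == 2 checks)
n_sexes <- ave(as.integer(factor(df$Sex)), df$Name,
               FUN = function(x) length(unique(x)))

by_rows  <- unique(df$Name[n_rows == 2])   # wrongly includes Mike
by_sexes <- unique(df$Name[n_sexes == 2])  # only names with both sexes
by_rows
by_sexes
```

If duplicate rows are possible, the distinct-count test is the safer choice; the row-count shortcut is only worth it when the data are known to have one row per Name/Sex pair.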