2017-02-28

I want to subset data conditioned on two columns at the same time, that is, subset by the combination of the two columns.

This is similar to: subsetting data using multiple variables in R

For example:

Say I have this dataset called Gamedat:

       Games   People Hoursplayed
   goldeneye  Michael           5
   goldeneye Thatcher           8
   goldeneye   Dexter          12
   goldeneye   Dexter          15
      pacman   Dexter           2
      tetris    Clint           5
      tetris   Dexter           8
   goldeneye Thatcher          12
      pacman Thatcher          15
   goldeneye    Clint           2
      pacman  Michael           5
      pacman  Michael           8
      pacman    Clint          12
      tetris     John          15
      tetris    Clint           2
ageofempires    Clint           5
      pacman   Dexter           8
ageofempires Thatcher          12
ageofempires     John          15
   goldeneye   Dexter           2
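To make the examples reproducible, here is one way to rebuild `Gamedat` in R. This is a sketch: the column types (character strings for the names, numbers for the hours) are assumed, since the question only shows the printed table.

```r
# Rebuild the example data shown above (column types are assumed)
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires", "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  # the hours column cycles through 5, 8, 12, 15, 2 four times
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)

str(Gamedat)
```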
    goldeneye Dexter   2 

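Say I want to look at a game like goldeneye. I want to see whether any player has played another game for the same number of hours as they played goldeneye (this is more useful in my real dataset).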

So I do this:

Gameofinterest <- Gamedat[grep("goldeneye", Gamedat[, 1]), ]

Then I do this:

subset(Gamedat, Gamedat[ ,2] %in% Gameofinterest[ ,2] & 
    Gamedat[ ,3] %in% Gameofinterest[ ,3]) 

But this gives me:

       Games   People Hoursplayed
   goldeneye  Michael           5
   goldeneye Thatcher           8
   goldeneye   Dexter          12
   goldeneye   Dexter          15
      pacman   Dexter           2
      tetris    Clint           5
      tetris   Dexter           8
   goldeneye Thatcher          12
      pacman Thatcher          15
   goldeneye    Clint           2
      pacman  Michael           5
      pacman  Michael           8
      pacman    Clint          12
      tetris    Clint           2
ageofempires    Clint           5
      pacman   Dexter           8
ageofempires Thatcher          12
   goldeneye   Dexter           2
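Why this over-matches can be seen by evaluating the two `%in%` tests separately. Each one looks at a single column, so a row passes as soon as its person appears somewhere among the goldeneye rows and its hours value appears somewhere among them, not necessarily in the same row. In this data every Hoursplayed value occurs in some goldeneye row, so the hours test filters nothing and only John's rows are dropped. The sketch below uses `==` instead of `grep()` for an exact match on the game name.

```r
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires", "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)
Gameofinterest <- Gamedat[Gamedat$Games == "goldeneye", ]

# Each %in% test looks at one column on its own
people_ok <- Gamedat$People %in% Gameofinterest$People
hours_ok  <- Gamedat$Hoursplayed %in% Gameofinterest$Hoursplayed

# Row 6 (tetris, Clint, 5) passes both tests even though
# Clint never played goldeneye for 5 hours
Gamedat[6, ]
people_ok[6] && hours_ok[6]  # TRUE

sum(people_ok & hours_ok)    # 18 rows, as in the output above
```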

when what I actually want is this:

       Games   People Hoursplayed
   goldeneye  Michael           5
   goldeneye Thatcher           8
   goldeneye   Dexter          12
   goldeneye   Dexter          15
      pacman   Dexter           2
   goldeneye Thatcher          12
   goldeneye    Clint           2
      pacman  Michael           5
      tetris    Clint           2
ageofempires Thatcher          12
   goldeneye   Dexter           2

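In short, I want to find the rows that match on the combination "People & Hoursplayed" from the goldeneye rows, rather than on "People" and "Hoursplayed" separately... does that make sense?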

I know I can do this:

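Gamedat$PHpaste <- paste(Gamedat$People, Gamedat$Hoursplayed, sep = "")
# Re-extract the goldeneye rows so they also carry the new PHpaste column
Gameofinterest <- Gamedat[grep("goldeneye", Gamedat[, 1]), ]
Gamedat[Gamedat[, 4] %in% Gameofinterest[, 4], ]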

and get:

       Games   People Hoursplayed    PHpaste
   goldeneye  Michael           5   Michael5
   goldeneye Thatcher           8  Thatcher8
   goldeneye   Dexter          12   Dexter12
   goldeneye   Dexter          15   Dexter15
      pacman   Dexter           2    Dexter2
   goldeneye Thatcher          12 Thatcher12
   goldeneye    Clint           2     Clint2
      pacman  Michael           5   Michael5
      tetris    Clint           2     Clint2
ageofempires Thatcher          12 Thatcher12
   goldeneye   Dexter           2    Dexter2
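The pasted-key approach works, with two small caveats worth sketching. With `sep = ""`, two different pairs can in principle produce the same key (a hypothetical player named "Dexter1" with 2 hours and Dexter with 12 hours would both give "Dexter12"), so a separator that cannot occur in a name is safer. The key can also be built on the fly, so no helper column has to be added to either data frame. `pair_key` is a hypothetical helper, and the tab separator is assumed never to appear in a player's name.

```r
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires", "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)

# With sep = "", two different pairs can collapse to the same key
paste("Dexter1", 2, sep = "") == paste("Dexter", 12, sep = "")  # TRUE

# Hypothetical helper: one key per (People, Hoursplayed) pair, joined by a tab
pair_key <- function(df) paste(df$People, df$Hoursplayed, sep = "\t")

goldeneye_rows <- Gamedat[Gamedat$Games == "goldeneye", ]

# Keep rows whose pair occurs among the goldeneye rows; no helper column needed
matched <- Gamedat[pair_key(Gamedat) %in% pair_key(goldeneye_rows), ]
matched
```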

Is there something more elegant?
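One base R option, offered as a sketch rather than the canonical answer, is an inner join with `merge()` on both columns at once. Taking `unique()` of the goldeneye pairs first keeps a repeated pair from duplicating rows. Note that `merge()` sorts by the key columns by default, so the rows come back in a different order than in `Gamedat`.

```r
Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires", "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)

# Distinct (People, Hoursplayed) pairs that occur in goldeneye rows
goldeneye_pairs <- unique(Gamedat[Gamedat$Games == "goldeneye",
                                  c("People", "Hoursplayed")])

# Inner join: keep every Gamedat row whose pair appears in goldeneye_pairs
matched <- merge(goldeneye_pairs, Gamedat, by = c("People", "Hoursplayed"))

# Put the columns back in the original order
matched <- matched[, c("Games", "People", "Hoursplayed")]
matched
```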


Is your expected result correct? Dexter played pacman for 2 hours but goldeneye for 29 hours... Is it because 2 of those 29 hours form a separate record? – shayaa


The last row shows that Dexter played goldeneye for 2 hours, so that is a correct match. – StatGenGeek

Answer


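I think this can be done with dplyr. First, use filter to get the rows where the game is goldeneye. Then use inner_join to join those rows back to the original data on People and Hoursplayed. Optionally, select the columns you want and arrange by person.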

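library(dplyr)
Gamedat %>%
    # keep only the goldeneye rows
    filter(Games == "goldeneye") %>%
    # join back to the full data on the (People, Hoursplayed) pair
    inner_join(Gamedat, by = c("People", "Hoursplayed")) %>%
    # Games.y is the game column coming from the full data
    select(Games = Games.y, People, Hoursplayed) %>%
    arrange(People)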

Result:

          Games   People Hoursplayed
1     goldeneye    Clint           2
2        tetris    Clint           2
3     goldeneye   Dexter          12
4     goldeneye   Dexter          15
5        pacman   Dexter           2
6     goldeneye   Dexter           2
7     goldeneye  Michael           5
8        pacman  Michael           5
9     goldeneye Thatcher           8
10    goldeneye Thatcher          12
11 ageofempires Thatcher          12
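A possible variant of this answer, assuming dplyr is installed, uses the filtering join `semi_join()` instead of `inner_join()`. It keeps each `Gamedat` row at most once (an inner join would duplicate a row if the same (People, Hoursplayed) pair appeared twice among the goldeneye rows), needs no `Games.y` rename, and keeps `Gamedat`'s original row order, so the result lines up with the desired table in the question.

```r
library(dplyr)

Gamedat <- data.frame(
  Games = c("goldeneye", "goldeneye", "goldeneye", "goldeneye", "pacman",
            "tetris", "tetris", "goldeneye", "pacman", "goldeneye",
            "pacman", "pacman", "pacman", "tetris", "tetris",
            "ageofempires", "pacman", "ageofempires", "ageofempires", "goldeneye"),
  People = c("Michael", "Thatcher", "Dexter", "Dexter", "Dexter",
             "Clint", "Dexter", "Thatcher", "Thatcher", "Clint",
             "Michael", "Michael", "Clint", "John", "Clint",
             "Clint", "Dexter", "Thatcher", "John", "Dexter"),
  Hoursplayed = rep(c(5, 8, 12, 15, 2), times = 4),
  stringsAsFactors = FALSE
)

# Keep the Gamedat rows whose (People, Hoursplayed) pair also occurs
# in a goldeneye row; rows are neither duplicated nor reordered
matched <- Gamedat %>%
  semi_join(filter(Gamedat, Games == "goldeneye"),
            by = c("People", "Hoursplayed"))
matched
```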

Beautiful, thank you. – StatGenGeek