2017-02-12

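Conditional intersection in R

I want to filter out the animals that appear in both tables (intersection condition 1) and that also share the same size across tables within the same type (intersection condition 2). Does anyone know an efficient way to code this, for example with dplyr?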

library(dplyr) 
animal1 <- data.frame(type = c("cat", "dog", "dog","bird", "elephant"), 
         size = c("small","large","small", "medium", "large"), tableName = rep("animal1",5), stringsAsFactors = FALSE) 
     #  type size tableName 
     # 1  cat small animal1 
     # 2  dog large animal1 
     # 3  dog small animal1 
     # 4  bird medium animal1 
     # 5 elephant large animal1 

animal2 <- data.frame(type = c("elephant", "dog", "dog", "elephant", "elephant"), 
         size = c("medium","large","large", "small", "large"), 
         tableName = rep("animal2",5), stringsAsFactors = FALSE) 
     #  type size tableName 
     # 1 elephant medium animal2 
     # 2  dog large animal2 
     # 3  dog large animal2 
     # 4 elephant small animal2 
     # 5 elephant large animal2 


rbindAnimal <- rbind(animal1, animal2) 
     #  type size tableName 
     # 1  cat small animal1 
     # 2  dog large animal1 
     # 3  dog small animal1 
     # 4  bird medium animal1 
     # 5 elephant large animal1 
     # 6 elephant medium animal2 
     # 7  dog large animal2 
     # 8  dog large animal2 
     # 9 elephant small animal2 
     # 10 elephant large animal2 

# Intersection across both tables 
intersectType <- intersect(rbindAnimal %>% filter(tableName == "animal1") %>% select(type), 
              rbindAnimal %>% filter(tableName == "animal2") %>% select(type)) 
     #  type 
     # 1 elephant 
     # 2  dog 

rbindAnimal <- rbindAnimal[which(rbindAnimal$type %in% intersectType$type),] 

     #  type size tableName 
     # 2  dog large animal1 
     # 3  dog small animal1 
     # 5 elephant large animal1 
     # 6 elephant medium animal2 
     # 7  dog large animal2 
     # 8  dog large animal2 
     # 9 elephant small animal2 
     # 10 elephant large animal2 

# Needs to return row numbers! Here: 2,5,7,8, and 10 
#  type size tableName 
# 2  dog large animal1 
# 5 elephant large animal1 
# 7  dog large animal2 
# 8  dog large animal2 
# 10 elephant large animal2 

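The desired output is unclear. Are you trying to merge on type and size, or are you trying to keep only the observations whose type-size combination is not present in both data.frames? – lmo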


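Good point! I think merging by type and size is what I'm after. The last lines show the desired output; the row index values can then be used to reverse the filter. – eyeOfTheStorm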

Answers


"Needs to return row numbers!"

This is straightforward with .I from data.table, which stores row numbers:

library(data.table) 
setDT(rbindAnimal) 

w <- rbindAnimal[, if (uniqueN(tableName) > 1L) .I, by=.(type, size)]$V1 
# [1] 2 7 8 5 10 
rbindAnimal[-w] 
#  type size tableName 
# 1:  cat small animal1 
# 2:  dog small animal1 
# 3:  bird medium animal1 
# 4: elephant medium animal2 
# 5: elephant small animal2 

Instead of an anti-join (as in the OP's answer), we simply exclude the rows by number.

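How it works

  • uniqueN counts the number of unique values. The OP's condition is (paraphrased):

    The type-size combination shows up in both tables.

    which translates to

    uniqueN(tableName) > 1L, evaluated on the rows grouped with by=.(type, size).

  • if (cond) x returns x if the condition holds; otherwise it returns NULL and the group is dropped.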
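
The grouping logic described above can also be sketched in base R, without data.table. This is only an illustration and not part of the original answer: ave() stands in for the grouped uniqueN() call, and nTables is a made-up name.

```r
# Rebuild the example data from the question
animal1 <- data.frame(type = c("cat", "dog", "dog", "bird", "elephant"),
                      size = c("small", "large", "small", "medium", "large"),
                      tableName = "animal1", stringsAsFactors = FALSE)
animal2 <- data.frame(type = c("elephant", "dog", "dog", "elephant", "elephant"),
                      size = c("medium", "large", "large", "small", "large"),
                      tableName = "animal2", stringsAsFactors = FALSE)
rbindAnimal <- rbind(animal1, animal2)

# For every row, count how many distinct tables its type-size group spans
# (nTables is a hypothetical helper name)
nTables <- ave(seq_len(nrow(rbindAnimal)),
               rbindAnimal$type, rbindAnimal$size,
               FUN = function(i) length(unique(rbindAnimal$tableName[i])))

# Rows whose type-size combination appears in both tables
w <- which(nTables > 1L)
w
# [1]  2  5  7  8 10

# Rows that are NOT shared across tables
unshared <- rbindAnimal[-w, ]
```

Here w comes back in row order (2, 5, 7, 8, 10) rather than in group order as with data.table, but it selects the same rows.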


dplyr variant

It works analogously in dplyr (though I don't know how to get the row numbers there):

rbindAnimal %>% group_by(type, size) %>% filter(n_distinct(tableName) == 1L) 
#  type size tableName 
#  <chr> <chr>  <chr> 
# 1  cat small animal1 
# 2  dog small animal1 
# 3  bird medium animal1 
# 4 elephant medium animal2 
# 5 elephant small animal2 
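
One possible way to recover the row numbers in dplyr is to record them in a column before grouping. This is a minimal sketch, not part of the original answer; it assumes a dplyr version that provides pull(), and the column name row is made up for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
animal1 <- data.frame(type = c("cat", "dog", "dog", "bird", "elephant"),
                      size = c("small", "large", "small", "medium", "large"),
                      tableName = "animal1", stringsAsFactors = FALSE)
animal2 <- data.frame(type = c("elephant", "dog", "dog", "elephant", "elephant"),
                      size = c("medium", "large", "large", "small", "large"),
                      tableName = "animal2", stringsAsFactors = FALSE)
rbindAnimal <- rbind(animal1, animal2)

w <- rbindAnimal %>%
  mutate(row = row_number()) %>%            # remember each row's original position
  group_by(type, size) %>%
  filter(n_distinct(tableName) > 1L) %>%    # keep groups present in both tables
  ungroup() %>%
  pull(row)
w
# [1]  2  5  7  8 10

shared   <- rbindAnimal[w, ]    # rows matching across tables
unshared <- rbindAnimal[-w, ]   # everything else
```

Storing the position before group_by() matters, because row_number() inside a grouped pipeline would count within each group instead of over the whole table.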

Nice work, Frank! – eyeOfTheStorm


Solution (thanks for the merge tip, @lmo!): use merge / semi_join / anti_join

library(dplyr) 
animal1 <- data.frame(type = c("cat", "dog", "dog","bird", "elephant"), 
         size = c("small","large","small", "medium", "large"), tableName = rep("animal1",5), stringsAsFactors = FALSE) 
     #  type size tableName 
     # 1  cat small animal1 
     # 2  dog large animal1 
     # 3  dog small animal1 
     # 4  bird medium animal1 
     # 5 elephant large animal1 

animal2 <- data.frame(type = c("elephant", "dog", "dog", "elephant", "elephant"), 
         size = c("medium","large","large", "small", "large"), 
         tableName = rep("animal2",5), stringsAsFactors = FALSE) 
     #  type size tableName 
     # 1 elephant medium animal2 
     # 2  dog large animal2 
     # 3  dog large animal2 
     # 4 elephant small animal2 
     # 5 elephant large animal2 

rbindAnimal <- rbind(animal1, animal2) 
mergedAnimals <- merge(animal1, animal2, by = c("type","size"), all = TRUE) 
sharedTypeSize <- mergedAnimals[complete.cases(mergedAnimals),] %>% select(type,size) %>% unique 
sharedTypeSize <- merge(rbindAnimal, sharedTypeSize) 

semi_join(rbindAnimal, sharedTypeSize) 
     #  type size tableName 
     # 1  dog large animal1 
     # 2  dog large animal2 
     # 3  dog large animal2 
     # 4 elephant large animal1 
     # 5 elephant large animal2 

anti_join(rbindAnimal, sharedTypeSize) 

     #  type size tableName 
     # 1  cat small animal1 
     # 2  dog small animal1 
     # 3  bird medium animal1 
     # 4 elephant medium animal2 
     # 5 elephant small animal2
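
The merge() / complete.cases() step can arguably be shortened by joining the distinct type-size pairs of the two tables directly and passing an explicit by to the joins. The following is a sketch under that assumption, not the author's code; keys is a made-up helper name.

```r
library(dplyr)

# Rebuild the example data from the question
animal1 <- data.frame(type = c("cat", "dog", "dog", "bird", "elephant"),
                      size = c("small", "large", "small", "medium", "large"),
                      tableName = "animal1", stringsAsFactors = FALSE)
animal2 <- data.frame(type = c("elephant", "dog", "dog", "elephant", "elephant"),
                      size = c("medium", "large", "large", "small", "large"),
                      tableName = "animal2", stringsAsFactors = FALSE)
rbindAnimal <- rbind(animal1, animal2)

keys <- c("type", "size")   # hypothetical helper: the columns that define a match

# type-size pairs that occur in both tables
sharedTypeSize <- inner_join(distinct(animal1, type, size),
                             distinct(animal2, type, size),
                             by = keys)

shared   <- semi_join(rbindAnimal, sharedTypeSize, by = keys)  # rows with a shared pair
unshared <- anti_join(rbindAnimal, sharedTypeSize, by = keys)  # rows without one
```

Giving by explicitly avoids relying on the joins guessing the key columns from the shared column names, which in the version above includes tableName.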