Conditional subset of a data frame in R

Let the data frame be:

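set.seed(123)
# The column lengths differ (260, 520, 10); data.frame() recycles the
# shorter vectors, so df ends up with 520 rows.
df <- data.frame(name = sample(LETTERS, 260, replace = TRUE),
                 hobby = rep(c("outdoor", "indoor"), 260),
                 chess = rnorm(1:10))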

And the condition I will use to extract rows from df is:

library(dplyr)

df_cond <- df %>% group_by(name, hobby) %>%
    summarize(count = n()) %>%
    mutate(sum.var = sum(count), sum.name = length(name)) %>%
    filter(sum.name == 2) %>%
    mutate(min.var = min(count)) %>%
    mutate(use = ifelse(min.var == count, "yes", "no")) %>%
    filter(grepl("yes", use))

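I want to randomly extract rows from df (with the rest of df) that correspond to the (name, hobby, count) combinations in df_cond. I ran into some problems combining %in% and sample. Thanks for any pointers!

Edit: for example: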

head(df_cond)
    name   hobby count sum.var sum.name min.var   use
  <fctr>  <fctr> <int>   <int>    <int>   <int> <chr>
1      A  indoor     2       6        2       2   yes
2      B  indoor     8      16        2       8   yes
3      B outdoor     8      16        2       8   yes
4      C outdoor     6      14        2       6   yes
5      D  indoor    10      24        2      10   yes
6      E outdoor     8      18        2       8   yes

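Using the data frame above, I want to randomly extract from df 2 rows (= count) with the combination A + indoor (row 1), 8 rows with the combination B + indoor (row 2), and so on.

Combining @denrous's and @Jacob's answers got me what I needed. Like this: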

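library(dplyr)
library(purrr)
library(tidyr)

# For each (name, hobby) combination in df_cond, pull the matching rows of df
m2 <- df_cond %>%
    mutate(data = map2(name, hobby, function(x, y) {df %>% filter(name == x, hobby == y)})) %>%
    ungroup() %>%
    select(data) %>%
    unnest(data)   # name the list-column explicitly; bare unnest() is deprecated in recent tidyr

# Names that still have more than one hobby level in m2
test <- m2 %>%
    group_by(name, hobby) %>%
    summarize(num.levels = length(unique(hobby))) %>%
    ungroup() %>%
    group_by(name) %>%
    summarize(total_levels = sum(num.levels)) %>%
    filter(total_levels > 1)

# Keep only the rows of m2 whose name appears in test
fin <- semi_join(m2, test, by = "name")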

Source

2016-11-29 thisisrg

If I understand correctly, you can use purrr to get what you want:

library(dplyr)
library(purrr)

df_cond %>%
    mutate(data = map2(name, hobby, function(x, y) {filter(df, name == x, hobby == y)})) %>%
    mutate(data = map2(data, count, function(x, y) sample_n(x, size = y)))

If you want the result in the same format as df:

library(tidyr)

df_cond %>%
    mutate(data = map2(name, hobby, function(x, y) {df %>% filter(name == x, hobby == y)})) %>%
    mutate(data = map2(data, count, function(x, y) sample_n(x, size = y))) %>%
    ungroup() %>%
    select(data) %>%
    unnest(data)   # name the list-column explicitly; bare unnest() is deprecated in recent tidyr
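
To check the nested-sampling pattern above by hand, here is a minimal sketch on a tiny example. The `toy` and `toy_cond` data frames are made up for illustration and are not from the question. It assumes dplyr, purrr and tidyr are installed.

```r
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1)

# Hypothetical toy data (not from the question):
# 4 rows of A/indoor and 3 rows of B/outdoor
toy <- data.frame(
  name  = c("A", "A", "A", "A", "B", "B", "B"),
  hobby = c("indoor", "indoor", "indoor", "indoor",
            "outdoor", "outdoor", "outdoor"),
  chess = 1:7
)

# Hypothetical condition table: how many rows to draw per combination
toy_cond <- data.frame(
  name  = c("A", "B"),
  hobby = c("indoor", "outdoor"),
  count = c(2, 1)
)

sampled <- toy_cond %>%
  # Step 1: attach the matching rows of toy as a list-column
  mutate(data = map2(name, hobby,
                     function(x, y) filter(toy, name == x, hobby == y))) %>%
  # Step 2: draw `count` rows at random from each nested data frame
  mutate(data = map2(data, count,
                     function(x, y) sample_n(x, size = y))) %>%
  select(data) %>%
  unnest(data)

# Rows drawn per combination; these should match toy_cond$count
sampled %>% count(name, hobby)
```

Which rows are drawn depends on the seed, but the number of rows per (name, hobby) combination should always equal the `count` given in `toy_cond`.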

Source

2016-11-29 20:05:20 denrou

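Great, this is what I needed! – thisisrg

This doesn't quite give me what I need, but it's close enough. Thanks! I'll post the final solution later. – thisisrg

It's not clear exactly what you want, but based on the OP's clarification you may be looking for left_join: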

df %>% 
    left_join(df_cond, by = "name")

Source

2016-11-29 19:10:04 Anand

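I don't want a join. I want to sample randomly from df (the number of rows defined by 'count' in 'df_cond', for each name + hobby combination). I'll add an example to clarify the question. – thisisrg

Edit.

There must be a better way, but I would use a loop: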

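library(dplyr)

master_df <- data.frame()

for (i in seq_len(nrow(df_cond))) {
    # Use variable names that differ from df's columns: inside filter(),
    # `name == name` compares the column with itself and keeps every row.
    cond_name  <- as.character(df_cond$name[i])
    cond_hobby <- as.character(df_cond$hobby[i])
    n          <- df_cond$count[i]

    # Rows of df for this combination, then a random draw of n of them
    temp_df <- df %>% filter(name == cond_name, hobby == cond_hobby)
    temp_df <- sample_n(temp_df, n)
    master_df <- rbind(master_df, temp_df)
}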
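
One thing to watch in a loop like this is how dplyr resolves names inside `filter()`: a name that matches a column of the data frame refers to that column, even when a variable with the same name exists outside. The sketch below shows the effect on a made-up `toy` data frame (hypothetical, not from the question).

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  name  = c("A", "A", "B"),
  hobby = c("indoor", "outdoor", "indoor")
)

name <- "A"
# Both sides of `name == name` resolve to the column, so the test is always TRUE
all_rows <- filter(toy, name == name)

target_name <- "A"
# `target_name` is not a column, so it is looked up in the calling environment
a_rows <- filter(toy, name == target_name)

nrow(all_rows)  # 3: nothing was filtered out
nrow(a_rows)    # 2: only the "A" rows
```

Giving the loop variables names that do not clash with column names is the simplest way to avoid this.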

Source

2016-11-29 19:12:56 Jacob

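Thanks... but this is not quite what I'm after. I have tried to clarify the question. – thisisrg

I can see the principle of how this would work, but it doesn't give the correct output. I've combined your answer with @denrous's to get what I needed. – thisisrg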
