Split a data frame into two groups

I simulate this data.frame:

library(plyr); library(ggplot2) 
count <- rev(seq(0, 500, 20)) 
tide <- seq(0, 5, length.out = length(count)) 
df <- data.frame(count, tide) 

count_sim <- unlist(llply(count, function(x) rnorm(20, x, 50))) 
count_sim_df <- data.frame(tide=rep(tide,each=20), count_sim)

It can be plotted like this:

ggplot(df, aes(tide, count)) + geom_jitter(data = count_sim_df, aes(tide, count_sim), position = position_jitter(width = 0.09)) + geom_line(color = "red")

[Image: the plot produced by the code above]

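I now want to split count_sim_df into two groups: high and low. When I plot the split count_sim_df, it should look like the image below (everything in green and blue is photoshopped). The part I find tricky is getting an overlap between high and low around the middle values of tide.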

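This is how I want to split count_sim_df into high and low:

Assign half of count_sim_df to high and half of count_sim_df to low
Reassign values of count to create an overlap between high and low around the middle values of tide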

[Image: mock-up of the desired plot, with the high and low groups drawn in green and blue]

Source

2015-07-11 luciano

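Here is one way of generating the sample data set and the groups using relatively little code and only base R (ggplot2 is used just for the plot):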

library(ggplot2) 
count <- rev(seq(0, 500, 20)) 
tide <- seq(0, 5, length.out = length(count)) 
df <- data.frame(count, tide) 

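# 20 simulated counts per tide value, centred on the matching value of count
count_sim_df <- data.frame(tide = rep(tide, each = 20), 
                           count = rnorm(20 * nrow(df), rep(count, each = 20), 50)) 
margin <- 0.3  # share of rows, in quantile terms, that forms the overlap zone
# Rows at or above the upper quantile are always "High"; between the two
# quantiles a coin flip per row decides; everything else is "Low"
count_sim_df$`tide level` <- 
    with(count_sim_df, 
         factor(tide >= quantile(tide, 0.5 + margin/2) | 
                (tide >= quantile(tide, 0.5 - margin/2) & sample(0:1, length(tide), TRUE)), 
                labels = c("Low", "High"))) 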
ggplot(df, aes(x = tide, y = count)) + 
    geom_line(colour = "red") + 
    geom_point(aes(colour = `tide level`), count_sim_df, position = "jitter") + 
    scale_colour_manual(values = c(High = "green", Low = "blue"))
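
To check where the overlap actually falls, the labelling can be cross-tabulated against tide. The following is only a sketch: it rebuilds the simulated data so that it runs on its own, and the names sim, lower_cut, upper_cut, coin and is_high, as well as the fixed seed, are assumptions introduced for illustration rather than part of the answer above.

```r
set.seed(1)  # assumption: fixed seed only so the check is reproducible

count <- rev(seq(0, 500, 20))
tide <- seq(0, 5, length.out = length(count))
sim <- data.frame(tide = rep(tide, each = 20),
                  count = rnorm(20 * length(count), rep(count, each = 20), 50))

margin <- 0.3
lower_cut <- quantile(sim$tide, 0.5 - margin / 2)  # start of the overlap zone
upper_cut <- quantile(sim$tide, 0.5 + margin / 2)  # from here on always "High"

# One coin flip per row; it only matters inside the overlap zone
coin <- sample(c(FALSE, TRUE), nrow(sim), replace = TRUE)
is_high <- sim$tide >= upper_cut | (sim$tide >= lower_cut & coin)
sim$level <- factor(is_high, levels = c(FALSE, TRUE), labels = c("Low", "High"))

# Rows per tide value and group; tide values with both groups form the overlap
tab <- table(tide = round(sim$tide, 1), level = sim$level)
print(tab)
mixed <- rownames(tab)[tab[, "Low"] > 0 & tab[, "High"] > 0]
print(mixed)
```

Because margin is applied to quantiles of tide, it describes a share of the rows rather than a width on the tide axis, so the size of the overlap zone depends on how the tide values are distributed.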

Source

2015-07-11 13:58:13

Here is my revised suggestion. I hope it helps.

# Uses count_sim_df from the question (columns tide and count_sim)
middle_tide <- mean(count_sim_df$tide) 
hilo_margin <- 0.3 
lower_bound <- middle_tide * (1 - hilo_margin) 
upper_bound <- middle_tide * (1 + hilo_margin) 
# Transition zone around the middle tide; bounds are inclusive so no row is dropped
middle_df <- subset(count_sim_df, tide >= lower_bound & tide <= upper_bound) 
upper_df <- count_sim_df[count_sim_df$tide > upper_bound, ] 
lower_df <- count_sim_df[count_sim_df$tide < lower_bound, ] 
# Randomly send each transition-zone row to group 1 (high) or group 2 (low)
idx <- sample(2, nrow(middle_df), replace = TRUE) 
count_sim_high <- rbind(middle_df[idx == 1, ], upper_df) 
count_sim_low <- rbind(middle_df[idx == 2, ], lower_df) 
p <- ggplot(df, aes(tide, count)) + 
    geom_jitter(data = count_sim_high, aes(tide, count_sim), position = position_jitter(width = 0.09), alpha = 0.4, col = 3, size = 3) + 
    geom_jitter(data = count_sim_low, aes(tide, count_sim), position = position_jitter(width = 0.09), alpha = 0.4, col = 4, size = 3) + 
    geom_line(color = "red")
print(p)  # display the plot

[Image: the resulting plot]

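I may still not have fully understood your procedure for splitting into high and low, in particular what you mean by "reassigning the values of count". In this case I have defined an overlap region of 30% around the middle value of tide, and randomly assigned each point within that transition region to either the "high" or the "low" group with equal probability, so roughly half end up in each.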
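
If an exact half-and-half split inside the transition region is wanted, rather than a coin flip per point, one option is to shuffle a balanced vector of labels. This is a minimal sketch under that assumption, not the method used above: it rebuilds the simulated data so that it runs on its own, stores the result in a single group column instead of two data frames, and the names lo, hi, in_middle, labels_mid and group are made up for illustration.

```r
set.seed(42)  # assumption: fixed seed only for reproducibility

count <- rev(seq(0, 500, 20))
tide <- seq(0, 5, length.out = length(count))
count_sim_df <- data.frame(tide = rep(tide, each = 20),
                           count_sim = rnorm(20 * length(count),
                                             rep(count, each = 20), 50))

middle_tide <- mean(count_sim_df$tide)
hilo_margin <- 0.3
lo <- middle_tide * (1 - hilo_margin)  # lower edge of the transition region
hi <- middle_tide * (1 + hilo_margin)  # upper edge of the transition region
in_middle <- count_sim_df$tide >= lo & count_sim_df$tide <= hi

# Outside the transition region the group is fixed by the tide value
group <- ifelse(count_sim_df$tide > hi, "high", "low")

# Inside it, shuffle a balanced label vector; if the number of rows is odd,
# "high" receives the one extra row
labels_mid <- rep(c("high", "low"), length.out = sum(in_middle))
group[in_middle] <- sample(labels_mid)

count_sim_df$group <- factor(group, levels = c("low", "high"))
print(table(count_sim_df$group[in_middle]))
```

With a single group column, both groups can be drawn in one layer by mapping colour to group instead of adding a separate geom_jitter call per data frame.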

Source

2015-07-11 08:27:33 RHertel

Question edited to make it clearer how the overlap should be created – luciano
