自動計算彙總統計的數據幀和創建新表

我有以下數據框：自動計算彙總統計的數據幀和創建新表

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov", 
      "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi") 
col2 <- c("low","med","high","high","low","low","med","med","med","high", 
      "low","low","high","high","med","med","low","low","med") 
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0) 

test_data <- cbind(col1, col2, col3) 
test_data <- as.data.frame(test_data)

，我想是這樣的表落得（值是隨機的）：

Species Pop.density %Resistance CI_low CI_high Total samples 
avi  low   2.0   1.2  2.2  30 
avi  med   0   0  0.5  20 
avi  high   3.5   2.9  4.2  10 
chi  low   0.5   0.3  0.7  20 
chi  med   2.0   1.9  2.1  150 
chi  high   6.5   6.2  6.6  175

％阻力欄基於上面的col3，其中1 =耐，0 =不耐。我曾嘗試以下：

library(dplyr) 
test_data<-test_data %>% 
    count(col1,col2,col3) %>% 
    group_by(col1, col2) %>% 
    mutate(perc_res = prop.table(n)*100)

我想這一點，它似乎幾乎做的伎倆，因爲我得到的總的1和0的COL3的百分比，在col1和2每一個值，但總樣本是錯誤的，因爲我指望所有的三列，當正確的計數將是唯一的col1和2

對於置信區間我會用以下內容：

binom.test(resistant samples,total samples)$conf.int*100

但是我不知道怎麼樣與其他人一起實施。有沒有簡單快捷的方法來做到這一點？

來源

2017-09-15 Haakonkas

我建議使用group_by然後使用匯總功能。 – Jul

使用'data.frame（col1，col2，col3）'，而不是'cbind'，這會迫使每列在這裏串起來。 – Frank

您的示例數據沒有（「avi」，「high」）對。您是否希望該行反正出現（使用NAs和零採樣數）？ – Frank

這應該這樣做。

library(tidyverse) 
library(broom) 

test_data %>% 
    mutate(col3 = ifelse(col3 == 0, "NonResistant", "Resistant")) %>% 
    count(col1, col2, col3) %>% 
    spread(col3, n, fill = 0) %>% 
    mutate(PercentResistant = Resistant/(NonResistant + Resistant)) %>% 
    mutate(test = map2(Resistant, NonResistant, ~ binom.test(.x, .x + .y) %>% tidy())) %>% 
    unnest() %>% 
    transmute(Species = col1, Pop.density = col2, PercentResistant, CI_low = conf.low * 100, CI_high = conf.high * 100, TotalSamples = Resistant + NonResistant)

變異的0/1阻力列，以便它有可讀價值。
計算每個存儲桶中的值。
將col3/n分爲兩列，即Resistant/NonResistant，並將計數（n）放入這些列中。現在每行都有它做測試所需的一切。
計算百分比阻力
對每個存儲桶執行測試並將結果放入一個名爲test的嵌套幀中。
Unnest測試數據框，以便您可以使用測試結果。
清理，給一切美好的名字。

結果

來源

2017-09-15 15:02:55

偉大的解決方案！我可以問'conf.low'來自'unnest（）'後面我只看到'估計值'和'統計量'嗎？ – PoGibas

@PoGibas：conf.low來自'tidy（）'，然後'unid'。如果你看到估計它應該在那裏。瘋狂的猜測，你的窗口不是那麼寬，在結果下面有「... X多變量」的東西？ –

aaa，它是'tbl_df'，'％>％data.frame（）'顯示它。不能習慣'tibble' – PoGibas

我會做...

library(data.table) 
setDT(DT) 

DT[, { 
    bt <- binom.test(sum(resists), .N)$conf.int*100 
    .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N) 
}, keyby=.(species, popdens)] 

    species popdens res_rate res_lo res_hi n 
1:  avi  low 0.00000 0.000000 70.75982 3 
2:  avi  med 0.00000 0.000000 97.50000 1 
3:  bov  low 100.00000 15.811388 100.00000 2 
4:  bov  med 50.00000 1.257912 98.74209 2 
5:  bov high 100.00000 15.811388 100.00000 2 
6:  chi  low 0.00000 0.000000 97.50000 1 
7:  chi  med 50.00000 1.257912 98.74209 2 
8:  chi high 66.66667 9.429932 99.15962 3 
9:  fox  low 0.00000 0.000000 97.50000 1 
10:  fox  med 50.00000 1.257912 98.74209 2

要包括所有級別（物種和種羣密度的組合）...

DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), { 
    bt <- 
    if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100 
    else NA_real_ 
    .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)  
}, by=.EACHI] 

    species popdens res_rate res_lo res_hi n 
1:  avi  low 0.00000 0.000000 70.75982 3 
2:  avi  med 0.00000 0.000000 97.50000 1 
3:  avi high  NA  NA  NA 0 
4:  bov  low 100.00000 15.811388 100.00000 2 
5:  bov  med 50.00000 1.257912 98.74209 2 
6:  bov high 100.00000 15.811388 100.00000 2 
7:  chi  low 0.00000 0.000000 97.50000 1 
8:  chi  med 50.00000 1.257912 98.74209 2 
9:  chi high 66.66667 9.429932 99.15962 3 
10:  fox  low 0.00000 0.000000 97.50000 1 
11:  fox  med 50.00000 1.257912 98.74209 2 
12:  fox high  NA  NA  NA 0

它如何w獸人

語法DT[i, j, by=]其中...

i確定行的子集，有時用輔助參數，on=或roll=。
by=確定子集表內的組，如果排序則切換到keyby=。
j是代碼在每個組上的代碼。

j應該評估的名單，.()是一個快捷方式list()。有關詳細信息，請參閱?data.table。

數據使用

（改名爲列，重新格式化二元變量回0/1或假/真，集人口密度按照正確的順序水平）：

DT = data.frame(
    species = col1, 
    popdens = factor(col2, levels=c("low", "med", "high")), 
    resists = col3 
)

來源

2017-09-15 15:33:56 Frank

自動計算彙總統計的數據幀和創建新表

回答

相關問題