Problems with levels in a xtab in R

For a sample data frame:

df <- structure(list(area = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
             2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
             4L, 4L, 4L), .Label = c("a1", "a2", "a3", "a4"), class = "factor"), 
        result = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
           1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L), 
        weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 
           1, 0.1, 6, 2.3, 1.6, 1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 
           1.6, 1.8)), .Names = c("area", "result", "weight"), class = "data.frame", row.names = c(NA, 
                                 -26L))

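I am trying to find the areas with the highest and lowest percentages, then produce a weighted cross-tabulation, which I will then use to calculate the risk difference.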

library(data.table)  # needed for setDT() and the data.table syntax below
df.summary <- setDT(df)[, .(.N, freq.1 = sum(result == 1),
                            result = weighted.mean(result == 1, w = weight) * 100), by = area]

#Include only regions with highest or lowest percentage 
df.summary <- data.table(df.summary) 
incl <- df.summary[c(which.min(result), which.max(result)),area] 
df.new <- df[df$area %in% incl,] 
incl

`incl` contains the two areas I want, but it still has four levels:

[1] a2 a3 
Levels: a1 a2 a3 a4

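How do I get rid of the unused levels? For the subsequent analysis I want only the two levels that correspond to the two selected areas. Any ideas?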

Source

2016-02-26 KT_1

I found this elsewhere online (e.g. Problems with levels in a xtab in R):

df.new$area <- factor(df.new$area)

It works!

Hope it is useful to others.
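As a minimal sketch of why this works: subsetting a factor keeps every original level, and calling `factor()` on it again (or base R's `droplevels()`) rebuilds it with only the levels that actually occur. The data frame `toy` below is a small made-up stand-in for the question's `df`, not the original data, and the `xtabs()` call is just one possible way to get a weighted table afterwards.

```r
# Toy data standing in for the question's df (values are illustrative only)
toy <- data.frame(
  area   = factor(c("a1", "a2", "a2", "a3", "a3", "a4")),
  result = c(0L, 1L, 0L, 1L, 1L, 0L),
  weight = c(0.5, 1.6, 2.3, 1.4, 1.2, 0.4)
)

# Subsetting rows keeps all four levels, even though only two are present
toy.new <- toy[toy$area %in% c("a2", "a3"), ]
levels(toy.new$area)   # "a1" "a2" "a3" "a4"

# droplevels() has the same effect here as factor(toy.new$area)
toy.new$area <- droplevels(toy.new$area)
levels(toy.new$area)   # "a2" "a3"

# Weighted cross-tabulation: xtabs() sums weight within each area/result cell
tab <- xtabs(weight ~ area + result, data = toy.new)
tab
```

With the unused levels gone, the table has one row per selected area instead of carrying empty rows for `a1` and `a4`.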

Source

2016-02-26 10:47:31

But since it is a data.table, `df.new[, area := factor(area)]` is more idiomatic and avoids repeating the variable name `df.new`.
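The by-reference form from this comment can be sketched as follows, assuming the data.table package is installed. The table `dt` is a made-up example rather than the question's data, and `droplevels(area)` is used here as an alternative that should behave the same as `factor(area)` for this purpose.

```r
library(data.table)

# Made-up example table; the column names mirror the question's data
dt <- data.table(
  area   = factor(c("a1", "a2", "a2", "a3", "a3", "a4")),
  result = c(0L, 1L, 0L, 1L, 1L, 0L)
)

# Row subsetting returns a new data.table that still carries all four levels
dt.new <- dt[area %in% c("a2", "a3")]

# := updates the column in place, so dt.new is not reassigned
dt.new[, area := droplevels(area)]   # or: dt.new[, area := factor(area)]
levels(dt.new$area)                  # "a2" "a3"
```

Because the row subset is a separate table, the original `dt` keeps its four levels.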
