2016-02-26

Problems with levels in an xtab in R

For the sample data frame:

df <- structure(list(area = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
             2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 
             4L, 4L, 4L), .Label = c("a1", "a2", "a3", "a4"), class = "factor"), 
        result = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
           1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L), 
        weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 
           1, 0.1, 6, 2.3, 1.6, 1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 
           1.6, 1.8)), .Names = c("area", "result", "weight"), class = "data.frame", row.names = c(NA, 
                                 -26L)) 

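I am trying to find the areas with the highest and lowest weighted percentage of result == 1, then produce a weighted cross-table from them that I will use to calculate a risk difference.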

library(data.table)

df.summary <- setDT(df)[, .(.N, freq.1 = sum(result == 1),
                            result = weighted.mean(result == 1, w = weight) * 100),
                        by = area]
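For readers not using data.table, the same per-area weighted percentage can be sketched in base R with split() and sapply(). This is an alternative technique, not the asker's code. The block rebuilds the sample data compactly (same values as the dput() output above, with result stored as double rather than integer), and the names pct and incl only mirror the question's variables.

```r
# Sample data, rebuilt compactly (same values as the dput() output)
df <- data.frame(
  area = factor(rep(c("a1", "a2", "a3", "a4"), times = c(7, 9, 5, 5))),
  result = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1),
  weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 1, 0.1, 6, 2.3, 1.6,
             1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 1.6, 1.8)
)

# Weighted percentage of result == 1 within each area (named by area level)
pct <- sapply(split(df, df$area), function(d) {
  weighted.mean(d$result == 1, w = d$weight) * 100
})
print(pct)

# Areas with the lowest and highest weighted percentage, as plain strings
incl <- names(pct)[c(which.min(pct), which.max(pct))]
print(incl)
```

Because incl here is a character vector rather than a factor, it carries no levels at all.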

#Include only regions with highest or lowest percentage 
df.summary <- data.table(df.summary) 
incl <- df.summary[c(which.min(result), which.max(result)),area] 
df.new <- df[df$area %in% incl,] 
incl 

'incl' contains the two areas I want, but it still has four levels:

[1] a2 a3 
Levels: a1 a2 a3 a4 
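This happens because subsetting a factor keeps its full set of levels, even levels that no longer occur in the result. A minimal illustration on a toy factor (the names area, sub and sub2 are only for illustration):

```r
area <- factor(c("a1", "a2", "a3", "a4"))

# Ordinary subsetting: values a2 a3, but all four levels are retained
sub <- area[c(2, 3)]
print(sub)
print(table(sub))  # unused levels a1 and a4 show up with a count of 0

# The factor subsetting method accepts drop = TRUE to discard unused levels
sub2 <- area[c(2, 3), drop = TRUE]
print(levels(sub2))
```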

How do I get rid of the unused levels? For the subsequent analysis I want area to have only the two levels. Any ideas?

Answer

I found this solution elsewhere online (e.g. Problems with levels in a xtab in R):

df.new$area <- factor(df.new$area) 

It works!
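Base R also provides droplevels(), which drops unused levels from a single factor or, applied to a data frame, from every factor column at once. A sketch assuming the sample data rebuilt below (equivalent to the dput() output) and the two areas a2 and a3 selected in the question:

```r
# Sample data, rebuilt compactly (same values as the dput() output)
df <- data.frame(
  area = factor(rep(c("a1", "a2", "a3", "a4"), times = c(7, 9, 5, 5))),
  result = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1),
  weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 1, 0.1, 6, 2.3, 1.6,
             1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 1.6, 1.8)
)

df.new <- df[df$area %in% c("a2", "a3"), ]
print(levels(df.new$area))  # still all four levels

# Drop unused levels from every factor column of the data frame
df.new <- droplevels(df.new)
print(levels(df.new$area))  # only a2 and a3 remain
```

Unlike re-calling factor() on one column, droplevels() on the whole data frame also cleans any other factor columns that the subset left with unused levels.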

Hope it is useful to others.
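Once the unused levels are gone, the weighted cross-table the question is aiming for contains only the two areas; xtabs() defaults to drop.unused.levels = FALSE, so without that step it would keep zero rows for a1 and a4. One way to sketch it in base R is xtabs() with weight on the left-hand side of the formula, which sums the weights in each cell. Treating the risk difference as the difference between the two areas' weighted proportions of result == 1 is an assumption, since the question does not define it; tab, p and rd are illustrative names.

```r
# Sample data, rebuilt compactly (same values as the dput() output)
df <- data.frame(
  area = factor(rep(c("a1", "a2", "a3", "a4"), times = c(7, 9, 5, 5))),
  result = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1),
  weight = c(0.5, 0.8, 1, 3, 3.4, 1.6, 4, 1.6, 2.3, 2.1, 2, 1, 0.1, 6, 2.3, 1.6,
             1.4, 1.2, 1.5, 2, 0.6, 0.4, 0.3, 0.6, 1.6, 1.8)
)
df.new <- droplevels(df[df$area %in% c("a2", "a3"), ])

# Weighted cross-table: each cell is the sum of weight for that area/result pair
tab <- xtabs(weight ~ area + result, data = df.new)
print(tab)

# Row proportions give the weighted share of result == 1 within each area
p <- prop.table(tab, margin = 1)[, "1"]

# Risk difference (assumed definition): highest-risk area minus lowest-risk area
rd <- p[["a3"]] - p[["a2"]]
print(rd)
```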

But since it is a data.table, 'df.new[, area := factor(area)]' is more idiomatic and avoids repeating the variable name 'df.new'.