Trying to use dplyr to group_by and apply scale()

I am trying to use dplyr to group_by the stud_ID variable in the following data frame, as in this SO question:

> str(df) 
'data.frame': 4136 obs. of 4 variables: 
$ stud_ID   : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ... 
$ behavioral_scale: num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ... 
$ cognitive_scale : num 3.5 3 3 3 3.5 2 NA NA 1 1 ... 
$ affective_scale : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...

I tried the following to obtain scaled scores by student (rather than scores scaled across the observations of all students):

scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  mutate(behavioral_scale_ind = scale(behavioral_scale),
         cognitive_scale_ind = scale(cognitive_scale),
         affective_scale_ind = scale(affective_scale))

Here is the result:

> str(scaled_data) 
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 4136 obs. of 7 variables: 
$ stud_ID    : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ... 
$ behavioral_scale : num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ... 
$ cognitive_scale  : num 3.5 3 3 3 3.5 2 NA NA 1 1 ... 
$ affective_scale  : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ... 
$ behavioral_scale_ind: num [1:12, 1] 0.64 1.174 0.64 0.107 0.64 ... 
    ..- attr(*, "scaled:center")= num 2.9 
    ..- attr(*, "scaled:scale")= num 0.937 
$ cognitive_scale_ind : num [1:12, 1] 1.17 0.64 0.64 0.64 1.17 ... 
    ..- attr(*, "scaled:center")= num 2.4 
    ..- attr(*, "scaled:scale")= num 0.937 
$ affective_scale_ind : num [1:12, 1] 0 1.28 0.64 0.64 0 ... 
    ..- attr(*, "scaled:center")= num 2.5 
    ..- attr(*, "scaled:scale")= num 0.782

The three scaled variables (behavioral_scale_ind, cognitive_scale_ind, and affective_scale_ind) have only 12 observations, the same number of observations as the first student, ABB112292.

What is going on here? How can I obtain scaled scores by individual?

Source

2016-03-03 Joshua Rosenberg

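Have you looked at `summarise()` in `dplyr`? – count

I think you should ungroup before mutating; otherwise you will center each student's scores around his or her own mean – C8H10N4O2

@C8H10N4O2 That is intended, so that each student's observations have M = 0 and SD = 1 –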

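The problem seems to be in the base scale() function, which expects a matrix. Even when given a plain vector it returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes, which is what shows up as num [1:12, 1] in the str() output above. Try writing your own.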

scale_this <- function(x){ 
    (x - mean(x, na.rm=TRUE))/sd(x, na.rm=TRUE) 
}

Then this works:

library("dplyr")

# reproducible sample data
set.seed(123)
n <- 1000
df <- data.frame(stud_ID = sample(LETTERS, size = n, replace = TRUE),
                 behavioral_scale = runif(n, 0, 10),
                 cognitive_scale = runif(n, 1, 20),
                 affective_scale = runif(n, 0, 1))
scaled_data <-
  df %>%
  group_by(stud_ID) %>%
  mutate(behavioral_scale_ind = scale_this(behavioral_scale),
         cognitive_scale_ind = scale_this(cognitive_scale),
         affective_scale_ind = scale_this(affective_scale))
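
One way to confirm that the scaling really happened per student is to summarise the new column by group and look at its mean and standard deviation. The sketch below is only an illustration: the `toy` data frame and its values are made up for the example, and `scale_this()` is redefined so the block runs on its own.

```r
library(dplyr)

# Same helper as in the answer: standardise a numeric vector, ignoring NAs
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Small made-up data set (hypothetical values, not the asker's data)
set.seed(1)
toy <- data.frame(
  stud_ID = rep(c("A", "B", "C"), times = c(5, 8, 12)),
  behavioral_scale = runif(25, 1, 4)
)

# Scale within each student, then drop the grouping
scaled_toy <- toy %>%
  group_by(stud_ID) %>%
  mutate(behavioral_scale_ind = scale_this(behavioral_scale)) %>%
  ungroup()

# Per-student check: the mean should be close to 0 and the SD close to 1
check <- scaled_toy %>%
  group_by(stud_ID) %>%
  summarise(
    n = n(),
    m = mean(behavioral_scale_ind),
    s = sd(behavioral_scale_ind)
  )
print(check)
```

If the grouping had been ignored, only the overall mean would be near 0 and the per-student means would differ from it.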

Or, if you are open to a data.table solution:

library("data.table") 

setDT(df) 

cols_to_scale <- c("behavioral_scale","cognitive_scale","affective_scale") 

df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = factor(stud_ID)]
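
The call above returns a new table containing only the grouping column and the scaled columns. If the goal is to keep the original columns and add the scaled ones next to them, as the dplyr version does, a possible variant uses `:=` to add columns by reference. This is a sketch under assumptions: the `dt` table and the `_ind` suffix are chosen here for illustration.

```r
library(data.table)

# Same helper as in the answer: standardise a numeric vector, ignoring NAs
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Small made-up table (hypothetical values, not the asker's data)
set.seed(42)
dt <- data.table(
  stud_ID = rep(c("A", "B"), each = 6),
  behavioral_scale = runif(12, 1, 4),
  cognitive_scale = runif(12, 1, 4)
)

cols_to_scale <- c("behavioral_scale", "cognitive_scale")
new_cols <- paste0(cols_to_scale, "_ind")

# Add the scaled columns in place, computed separately for each student
dt[, (new_cols) := lapply(.SD, scale_this), .SDcols = cols_to_scale, by = stud_ID]
print(dt)
```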

Source

2016-03-03 15:30:19 C8H10N4O2

This is a known problem in dplyr. A fix has been merged into the development version, which you can install via

# install.packages("devtools") 
devtools::install_github("hadley/dplyr")

In the stable version, the following should work, too:

scale_this <- function(x) as.vector(scale(x))
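
To see why the `as.vector()` wrapper helps, it may be useful to look at what `scale()` returns for a plain vector. The sketch below uses base R only; the values in `x` are made up for illustration.

```r
# Made-up scores, including a missing value
x <- c(3.5, 4, 3.5, 3, NA, 2)

s <- scale(x)
is.matrix(s)   # TRUE: scale() returns a one-column matrix
dim(s)         # 6 1
attributes(s)  # includes "scaled:center" and "scaled:scale"

# as.vector() drops the dim and the scaling attributes
v <- as.vector(s)
is.matrix(v)   # FALSE

# The result agrees with the hand-written formula from the earlier answer
manual <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
all.equal(v, manual)
```

Both helpers therefore give the same numbers; the wrapper only removes the matrix structure.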

Source

2016-09-24 02:13:27 krlmlr
