2014-10-27 54 views

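Combining vectors of unequal length into a data frame

I am working on a project that simulates missing data and runs regressions on the sampled data. Here is what I have so far.

library(MASS)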

#specifying the covariance matrix 
sigma <- matrix(c(1,.7,.49,.343,.2401,.7,1,.7,.49,.343,.49,.7,1,.7,.49,.343,.49, 
       .7,1,.7,.2401,.343,.49,.7,1),5,5,byrow=TRUE) 

#generating the data 
data <- mvrnorm(n=1000, c(5,5.25,5.5,5.75,6), sigma) 

#specifying the missing data mechanism for MCAR 
LogoddsratioMCAR <- -.5 
OddsRatioMCAR <-exp(LogoddsratioMCAR) 
OddsMCAR <- OddsRatioMCAR/(1+OddsRatioMCAR) 
Probability2 <- 1-OddsMCAR 
Probability3 <- Probability2 - OddsMCAR*(Probability2) 
Probability4 <- Probability3 - OddsMCAR*(Probability3) 
Probability5 <- Probability4 - OddsMCAR*(Probability4) 

#sampling from each column 
dataframe <- as.data.frame(data) 
dataMCAR1 <- dataframe$V1 
dataMCAR2 <- dataframe$V2[sample(1:nrow(data),Probability2*nrow(data))] 
dataMCAR3 <- dataframe$V3[sample(1:nrow(data),Probability3*nrow(data))] 
dataMCAR4 <- dataframe$V4[sample(1:nrow(data),Probability4*nrow(data))] 
dataMCAR5 <- dataframe$V5[sample(1:nrow(data),Probability5*nrow(data))] 

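Now I need to add NAs to dataMCAR2 through dataMCAR5 so that all of the vectors have the same length. I want to combine them into a data frame and run a regression on them.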

How can I add these NAs to the vectors?
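
For reference, the probabilities in the code above follow a simple pattern: `OddsMCAR` is the logistic transform of -0.5 (so despite its name it acts as a per-step dropout probability), and each `ProbabilityK` is the previous one multiplied by `1 - OddsMCAR`. A minimal sketch that reproduces them in one step; `p_drop`, `p_keep` and `n_keep` are names introduced here for illustration only:

```r
# Per-step dropout probability: the same value as OddsMCAR in the question
p_drop <- plogis(-0.5)            # exp(-.5) / (1 + exp(-.5))

# Retention probabilities for columns 2..5 form a geometric sequence
p_keep <- (1 - p_drop)^(1:4)

# sample() truncates a non-integer size, so these are the counts actually kept
n_keep <- floor(p_keep * 1000)

print(round(p_keep, 4))
print(n_keep)
```

With 1000 rows this keeps 622, 387, 241 and 150 values, which is where the differing vector lengths come from.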


Does [this](https://github.com/raredd/rawr/blob/master/R/utils.R#L1600:L1606) work for you? – rawr 2014-10-27 00:52:35

Answer


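Here is one approach. dataMCAR1 has length 1000, so you want the other vectors (e.g., dataMCAR2) to have the same length. Here, I concatenated each vector with NAs inside lapply. Then I bound all five vectors together with cbind and created a data frame. Finally, I set the column names using the names from the list (i.e., ana).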

ana <- mget(ls(pattern = "^dataMCAR\\d+")) 

bob <- as.data.frame(
  Reduce(cbind,
         lapply(ana, function(x) c(x, rep(NA, times = 1000 - length(x)))))
)

colnames(bob) <- names(ana) 

#   dataMCAR1 dataMCAR2 dataMCAR3 dataMCAR4 dataMCAR5
# 1  3.492947  6.702115  4.743988  6.330211  6.257005
# 2  4.637356  5.322731  4.916232  6.209659  7.619699
# 3  2.967167  4.397137  5.445473  6.632309  6.844667
# 4  4.484144  4.814281  5.060921  5.357306  4.831958
# 5  6.245234  5.471267  4.959116  5.975332  6.243439
# 6  5.334700  4.122378  6.671627  6.529121  7.354149

# summary(bob)
#   dataMCAR1       dataMCAR2       dataMCAR3       dataMCAR4       dataMCAR5
# Min.   :1.465   Min.   :2.141   Min.   :2.223   Min.   :3.253   Min.   :3.249
# 1st Qu.:4.334   1st Qu.:4.606   1st Qu.:4.886   1st Qu.:5.106   1st Qu.:5.412
# Median :5.005   Median :5.336   Median :5.616   Median :5.795   Median :6.064
# Mean   :5.000   Mean   :5.305   Mean   :5.550   Mean   :5.783   Mean   :6.041
# 3rd Qu.:5.657   3rd Qu.:5.957   3rd Qu.:6.225   3rd Qu.:6.487   3rd Qu.:6.697
# Max.   :8.168   Max.   :8.955   Max.   :8.208   Max.   :8.043   Max.   :8.740
#                 NA's   :378     NA's   :613     NA's   :759     NA's   :850
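
One caveat about padding at the end: `sample()` returns the kept values in random order, so after appending NAs, row i of dataMCAR2 no longer belongs to the same simulated case as row i of dataMCAR1. If the regression is meant to relate the columns to each other, an alternative is to set the unselected rows to NA in place. The following is only a sketch under that assumption; `mask_mcar`, `keep_prob`, `dat` and the seed are hypothetical names and values introduced here, not part of the original post:

```r
library(MASS)

set.seed(1)  # arbitrary seed, only so this sketch is reproducible

# Same covariance matrix as in the question: entry (i, j) is 0.7^|i - j|
sigma <- 0.7^abs(outer(1:5, 1:5, "-"))
dat <- as.data.frame(mvrnorm(n = 1000, mu = c(5, 5.25, 5.5, 5.75, 6),
                             Sigma = sigma))

# Keep a random subset of positions and set all others to NA,
# so every value stays in its original row
mask_mcar <- function(x, keep_prob) {
  keep <- sample(seq_along(x), floor(keep_prob * length(x)))
  x[setdiff(seq_along(x), keep)] <- NA
  x
}

# Retention probabilities for V2..V5, as in the question
p_keep <- (1 - plogis(-0.5))^(1:4)
dat[2:5] <- Map(mask_mcar, dat[2:5], p_keep)

print(colSums(is.na(dat)))       # NA count per column

# lm() drops incomplete rows by default (na.action = na.omit)
fit <- lm(V2 ~ V1, data = dat)
print(coef(fit))
```

The NA counts per column are the same as in the summary above (0, 378, 613, 759, 850), but each remaining value still lines up with the V1 value from the same simulated row.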