2014-06-28

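Tapply only produces missing values

I am trying to estimate the percentage of Catholics within each municipality of a country, using multilevel regression and post-stratification (MRP) of survey data.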

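The method fits a multilevel logit model and generates predicted probabilities of the dependent variable. It then post-stratifies the sample to census data, weighting each predicted probability by its cell's share of the population.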

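I can generate the initial estimates (essentially the predicted probability of being Catholic for a given type of individual in the survey data). However, when I try to take the weighted average with the last line of code below, it returns NA for every municipality. The initial cell predictions do contain some missing values, but nowhere near a majority.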

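I don't understand why I can't generate the municipal weighted averages, since I am following a procedure that works with different data. Any help would be greatly appreciated.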

rm(list=ls(all=TRUE)) 

library("arm") 
library("foreign") 

#read in megapoll and attach 
ES.data <- read.dta("ES4.dta", convert.underscore = TRUE) 

#read in municipal-level dataset 

munilevel <- read.dta("election.dta",convert.underscore = TRUE) 
munilevel <- munilevel[order(munilevel$municode),] 

#read in Census data 
Census <- read.dta("poststratification4.dta",convert.underscore = TRUE) 
Census <- Census[order(Census$municode),] 
Census$municode <- match(Census$municode, munilevel$municode) 

#Create index variables 

#At level of megapoll 

ES.data$ur.female <- (ES.data$female *2) + ES.data$ur 
ES.data$age.edr <- 6 * (ES.data$age -1) + ES.data$edr 

#At census level (same coding as above for all variables) 
Census$cur.cfemale <- (Census$cfemale *2) + Census$cur 
Census$cage.cedr <- 6 * (Census$cage -1) + Census$cedr 

##Municipal level variables 
Census$c.arena<- munilevel$c.arena[Census$municode] 
Census$c.fmln <- munilevel$c.fmln[Census$municode] 



#run individual-level opinion model 

individual.model1 <- glmer(formula = catholic ~ (1|ur.female) + (1|age) 
+ (1|edr) + (1|age.edr) + (1|municode) + p.arena +p.fmln 
,data=ES.data, family=binomial(link="logit")) 
display(individual.model1) 



#examine random effects and standard errors for urban-female 
ranef(individual.model1)$ur.female 
se.ranef(individual.model1)$ur.female 

#create vector of state ranefs and then fill in missing ones 
muni.ranefs <- array(NA,c(66,1)) 
dimnames(muni.ranefs) <- list(c(munilevel$municode),"effect") 
for(i in munilevel$municode){ 
muni.ranefs[i,1] <- ranef(individual.model1)$municode[i,1] 
} 
muni.ranefs[,1][is.na(muni.ranefs[,1])] <- 0 #set states with missing REs (b/c not in  data) to zero 


#create a prediction for each cell in Census data 
cellpred1 <- invlogit(fixef(individual.model1)["(Intercept)"] 
    +ranef(individual.model1)$ur.female[Census$cur.cfemale,1] 
    +ranef(individual.model1)$age[Census$cage,1] 
    +ranef(individual.model1)$edr[Census$cedr,1] 
    +ranef(individual.model1)$age.edr[Census$cage.cedr,1] 
    +muni.ranefs[Census$municode,1] 
    +(fixef(individual.model1)["p.fmln"] *Census$c.fmln) # municipal level 
    +(fixef(individual.model1)["p.arena"] *Census$c.arena)) # municipal level 



#weights the prediction by the freq of cell          
cellpredweighted1 <- cellpred1 * Census$cpercent.muni 

#calculates the percent within each municipality (weighted average of responses) 
munipred <- 100* as.vector(tapply(cellpredweighted1, Census$municode, sum)) 
munipred 

Answer


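That's a lot of code, and without the data none of it can be run! My guess is that the object cellpredweighted1 contains NAs. By default sum() propagates NAs to its result: if one or more elements of a vector are NA, then by definition the sum of those elements is also NA.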

If that is the case here, then simply adding na.rm = TRUE to the tapply() call should solve the problem:

tapply(cellpredweighted1, Census$municode, sum, na.rm = TRUE) 
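
To see the behaviour in isolation, here is a minimal sketch on made-up numbers (the vectors cell_pred, cell_weight and muni are hypothetical stand-ins, not the poster's data). It also shows a side effect worth knowing about: with na.rm = TRUE the weight of a dropped cell disappears too, so the weighted sum for that municipality no longer covers its whole population. Dividing by the weight of the non-missing cells is one possible adjustment, not necessarily the right one for this analysis.

```r
# Toy data (hypothetical): two municipalities, three cells each.
cell_pred   <- c(0.50, 0.60, NA,   0.40, 0.70, 0.80)  # predicted probabilities
cell_weight <- c(0.20, 0.30, 0.50, 0.25, 0.25, 0.50)  # cell shares; sum to 1 per municipality
muni        <- c(1, 1, 1, 2, 2, 2)

weighted <- cell_pred * cell_weight

# Default: a single NA cell makes the whole group sum NA.
default_sum <- tapply(weighted, muni, sum)

# na.rm = TRUE drops the NA cell, but its weight (0.50) is lost as well,
# so municipality 1 is summed over only half of its population.
dropped_sum <- tapply(weighted, muni, sum, na.rm = TRUE)

# One possible adjustment (an assumption, not part of the original answer):
# rescale by the total weight of the cells that were actually observed.
ok <- !is.na(weighted)
renormalised <- tapply(weighted[ok], muni[ok], sum) /
  tapply(cell_weight[ok], muni[ok], sum)

print(default_sum)
print(dropped_sum)
print(renormalised)
```

Because a single NA cell is enough to turn a group's sum into NA, every municipality can come back NA even when only a minority of cells are missing.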

You should really ask yourself why there are NAs at this stage, and whether they result from errors earlier in the process.
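
One way to start that investigation is to count the NAs per group and trace them back to the index that produced them. The sketch below uses made-up vectors (age_effects, census_age and muni are hypothetical) and assumes the NAs come from a lookup of the same kind used to build cellpred1, where a census cell refers to a level that has no estimated effect. That is only one possible cause; the real one can only be confirmed on the actual data.

```r
# Toy stand-ins (hypothetical) for one random-effect vector and a census index.
age_effects <- c(-0.3, 0.1, 0.2)     # effects estimated for 3 age groups
census_age  <- c(1, 2, 3, 4, 2, NA)  # level 4 has no effect; one index is missing
muni        <- c(1, 1, 1, 2, 2, 2)

# Indexing a vector past its end, or with NA, yields NA rather than an error.
age_term <- age_effects[census_age]

# Number of NA cells per municipality; any count above zero makes that
# municipality's sum() NA unless na.rm = TRUE is used.
na_per_muni <- tapply(is.na(age_term), muni, sum)

# The index values responsible for the NAs.
bad_index <- unique(census_age[is.na(age_term)])

print(na_per_muni)
print(bad_index)
```

Repeating this check for each term added inside invlogit() would show which component of the prediction introduces the missing values.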