2015-06-14 84 views
1

我希望有人能幫助我。我正在嘗試進行一項分析,以檢查在高度梯度上捕獲的膜翅目昆蟲的樣本數量。我想研究與高程有關的單峯分佈的可能性以及線性分佈。因此,我在分析中包含I(Altitude^2)作爲解釋變量。用poisson錯誤結構擬合glmer時出錯

我想運行下面的模型,其中包括一個泊松錯誤結構(因爲我們正在處理計數數據)和日期和陷阱類型(Trap)作爲隨機效應。

model7 <- glmer(No.Specimens~Altitude+I(Altitude^2)+(1|Date)+(1|Trap), 
     family="poisson",data=Santa.Lucia,na.action=na.omit) 

不過,我不斷收到以下錯誤信息:

Error: (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate 
In addition: Warning messages: 
1: Some predictor variables are on very different scales: consider rescaling 
2: In pwrssUpdate(pp, resp, tolPwrss, GQmat, compDev, fac, verbose) : 
    Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, line 431 
3: In pwrssUpdate(pp, resp, tolPwrss, GQmat, compDev, fac, verbose) : 
    Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, line 431 

顯然我在做一些大的失誤。任何人都可以幫我弄清楚我哪裏出錯了嗎?

下面是數據幀的結構:

str(Santa.Lucia) 
'data.frame': 97 obs. of 6 variables: 
$ Date  : Factor w/ 8 levels "01-Sep-2014",..: 6 6 6 6 6 6 6 6 6 6 ... 
$ Trap.No  : Factor w/ 85 levels "N1","N10","N11",..: 23 48 51 14 17 20 24 27 30 33 ... 
$ Altitude : int 1558 1635 1703 1771 1840 1929 1990 2047 2112 2193 ... 
$ Trail  : Factor w/ 3 levels "Cascadas","Limones",..: 1 1 1 1 1 3 3 3 3 3 ... 
$ No.Specimens: int 1 0 2 2 3 4 5 0 1 1 ... 
$ Trap  : Factor w/ 2 levels "Net","Pan": 2 2 2 2 2 2 2 2 2 2 ... 

這裏是完全在data.set(這些只是我的初步分析)

  Date Trap.No Altitude Trail No.Specimens Trap 
1 28-Aug-2014  W2  1558 Cascadas   1 Pan 
2 28-Aug-2014  W5  1635 Cascadas   0 Pan 
3 28-Aug-2014  W8  1703 Cascadas   2 Pan 
4 28-Aug-2014  W11  1771 Cascadas   2 Pan 
5 28-Aug-2014  W14  1840 Cascadas   3 Pan 
6 28-Aug-2014  W17  1929 Tower   4 Pan 
7 28-Aug-2014  W20  1990 Tower   5 Pan 
8 28-Aug-2014  W23  2047 Tower   0 Pan 
9 28-Aug-2014  W26  2112 Tower   1 Pan 
10 28-Aug-2014  W29  2193 Tower   1 Pan 
11 28-Aug-2014  W32  2255 Tower   0 Pan 
12 30-Aug-2014  N1  1562 Cascadas   5 Net 
13 30-Aug-2014  N2  1635 Cascadas   0 Net 
14 30-Aug-2014  N3  1723 Cascadas   2 Net 
15 30-Aug-2014  N4  1779 Cascadas   0 Net 
16 30-Aug-2014  N5  1842 Cascadas   3 Net 
17 30-Aug-2014  N6  1924 Tower   2 Net 
18 30-Aug-2014  N7  1979 Tower   2 Net 
19 30-Aug-2014  N8  2046 Tower   0 Net 
20 30-Aug-2014  N9  2110 Tower   0 Net 
21 30-Aug-2014  N10  2185 Tower   0 Net 
22 30-Aug-2014  N11  2241 Tower   0 Net 
23 31-Aug-2014  N1  1562 Cascadas   1 Net 
24 31-Aug-2014  N2  1635 Cascadas   1 Net 
25 31-Aug-2014  N3  1723 Cascadas   0 Net 
26 31-Aug-2014  N4  1779 Cascadas   0 Net 
27 31-Aug-2014  N5  1842 Cascadas   0 Net 
28 31-Aug-2014  N6  1924 Tower   0 Net 
29 31-Aug-2014  N7  1979 Tower   7 Net 
30 31-Aug-2014  N8  2046 Tower   4 Net 
31 31-Aug-2014  N9  2110 Tower   6 Net 
32 31-Aug-2014  N10  2185 Tower   1 Net 
33 31-Aug-2014  N11  2241 Tower   1 Net 
34 01-Sep-2014  W1  1539 Cascadas   0 Pan 
35 01-Sep-2014  W2  1558 Cascadas   0 Pan 
36 01-Sep-2014  W3  1585 Cascadas   2 Pan 
37 01-Sep-2014  W4  1604 Cascadas   0 Pan 
38 01-Sep-2014  W5  1623 Cascadas   1 Pan 
39 01-Sep-2014  W6  1666 Cascadas   4 Pan 
40 01-Sep-2014  W7  1699 Cascadas   0 Pan 
41 01-Sep-2014  W8  1703 Cascadas   0 Pan 
42 01-Sep-2014  W9  1746 Cascadas   1 Pan 
43 01-Sep-2014  W10  1762 Cascadas   0 Pan 
44 01-Sep-2014  W11  1771 Cascadas   0 Pan 
45 01-Sep-2014  W12  1796 Cascadas   1 Pan 
46 01-Sep-2014  W13  1825 Cascadas   0 Pan 
47 01-Sep-2014  W14  1840 Tower   4 Pan 
48 01-Sep-2014  W15  1859 Tower   2 Pan 
49 01-Sep-2014  W16  1889 Tower   2 Pan 
50 01-Sep-2014  W17  1929 Tower   0 Pan 
51 01-Sep-2014  W18  1956 Tower   0 Pan 
52 01-Sep-2014  W19  1990 Tower   1 Pan 
53 01-Sep-2014  W20  2002 Tower   3 Pan 
54 01-Sep-2014  W21  2023 Tower   2 Pan 
55 01-Sep-2014  W22  2047 Tower   0 Pan 
56 01-Sep-2014  W23  2068 Tower   1 Pan 
57 01-Sep-2014  W24  2084 Tower   0 Pan 
58 01-Sep-2014  W25  2112 Tower   1 Pan 
59 01-Sep-2014  W26  2136 Tower   0 Pan 
60 01-Sep-2014  W27  2150 Tower   1 Pan 
61 01-Sep-2014  W28  2193 Tower   1 Pan 
62 01-Sep-2014  W29  2219 Tower   0 Pan 
63 01-Sep-2014  W30  2227 Tower   1 Pan 
64 01-Sep-2014  W31  2255 Tower   0 Pan 
85 03/06/2015 WT47  1901 Tower   2 Pan 
86 03/06/2015 WT48  1938 Tower   2 Pan 
87 03/06/2015 WT49  1963 Tower   2 Pan 
88 03/06/2015 WT50  1986 Tower   0 Pan 
89 03/06/2015 WT51  2012 Tower   9 Pan 
90 03/06/2015 WT52  2033 Tower   0 Pan 
91 03/06/2015 WT53  2050 Tower   4 Pan 
92 03/06/2015 WT54  2081 Tower   2 Pan 
93 03/06/2015 WT55  2107 Tower   1 Pan 
94 03/06/2015 WT56  2128 Tower   4 Pan 
95 03/06/2015 WT57  2155 Tower   0 Pan 
96 03/06/2015 WT58  2179 Tower   2 Pan 
97 03/06/2015 WT59  2214 Tower   0 Pan 
98 03/06/2015 WT60  2233 Tower   0 Pan 
99 03/06/2015 WT61  2261 Tower   0 Pan 
100 03/06/2015 WT62  2278 Tower   0 Pan 
101 03/06/2015 WT63  2300 Tower   0 Pan 
102 04/06/2015 WT31  1497 Cascadas   0 Pan 
103 04/06/2015 WT32  1544 Cascadas   1 Pan 
104 04/06/2015 WT33  1568 Cascadas   1 Pan 
105 04/06/2015 WT34  1574 Cascadas   0 Pan 
106 04/06/2015 WT35  1608 Cascadas   5 Pan 
107 04/06/2015 WT36  1630 Cascadas   3 Pan 
108 04/06/2015 WT37  1642 Cascadas   0 Pan 
109 04/06/2015 WT38  1672 Cascadas   5 Pan 
110 04/06/2015 WT39  1685 Cascadas   6 Pan 
111 04/06/2015 WT40  1723 Cascadas   3 Pan 
112 04/06/2015 WT41  1744 Cascadas   2 Pan 
113 04/06/2015 WT42  1781 Cascadas   1 Pan 
114 04/06/2015 WT43  1794 Cascadas   2 Pan 
115 04/06/2015 WT44  1833 Cascadas   0 Pan 
116 04/06/2015 WT45  1855 Cascadas   4 Pan 
117 04/06/2015 WT46  1876 Cascadas   2 Pan   
+0

您可以發佈您的數據的'str()'還是理想地模擬一個可重現的例子? –

+0

您應該提供[可重現的示例](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example),以便我們可以看到您實際傳遞給函數的內容。通常情況下,您不會將'data ='作爲字符值傳遞,它可能應該是'data = Santa.Lucia'而不是'data =「Santa.Lucia」'。 – MrFlick

+0

@BonddedDust。太棒了!這工作,我的模型似乎現在運行。你能解釋正交多項式poly(Altitude,2),所以我可以理解模型結果嗎?另外,如何將此模型與沒有使用此函數的二次項的模型進行比較。非常感謝所有的幫助! – RosaRosa

回答

1

你幾乎沒有。正如@BondedDust所暗示的,使用兩級因子(Trap)作爲隨機效應並不現實 ;實際上, 原則上看起來不正確(Trap的等級是 不是任意的/隨機選擇的/可交換的)。當我試圖模型 與二次海拔,陷阱的固定效果和隨機效果的Date ,有人警告我說,我可能要重新調整參數:

Some predictor variables are on very different scales: consider rescaling 

(您看到這個警告夾雜在錯誤消息)。唯一連續(因此值得重新縮放)的預測因子是Altitude,所以我用scale()(唯一的缺點是這改變了係數的定量解釋,但模型本身實際上是相同的)進行集中和縮放。我還添加了觀察級別的隨機效應以允許過度分散。

結果看起來不錯,並與圖片一致。

library(lme4) 
Santa.Lucia <- transform(Santa.Lucia, 
         scAlt=scale(Altitude), 
         obs=factor(seq(nrow(Santa.Lucia)))) 
model7 <- glmer(No.Specimens~scAlt+I(scAlt^2)+Trap+(1|Date)+(1|obs), 
       family="poisson",data=Santa.Lucia,na.action=na.omit) 

summary(model7) 

## Random effects: 
## Groups Name  Variance Std.Dev. 
## obs (Intercept) 0.64712 0.8044 
## Date (Intercept) 0.02029 0.1425 
## Number of obs: 97, groups: obs, 97; Date, 6 
## 
## Fixed effects: 
##    Estimate Std. Error z value Pr(>|z|) 
## (Intercept) 0.53166 0.31556 1.685 0.09202 . 
## scAlt  -0.22867 0.14898 -1.535 0.12480 
## I(scAlt^2) -0.52840 0.16355 -3.231 0.0** 
## TrapPan  -0.01853 0.32487 -0.057 0.95451 

測試通過與缺乏它的模型比較二次項...

model7R <- update(model7, . ~ . - I(scAlt^2)) 
## convergence warning, but probably OK ... 
anova(model7,model7R) 

原則上它可能是值得考慮的二次海拔模式和陷阱之間的相互作用(允許不同由陷阱型高空趨勢),但圖片表明,它不會做太大......

library(ggplot2); theme_set(theme_bw()) 
ggplot(Santa.Lucia,aes(Altitude,No.Specimens,colour=Trap))+ 
    stat_sum(aes(size=factor(..n..)))+ 
     scale_size_discrete(range=c(2,4))+ 
      geom_line(aes(group=Date),colour="gray",alpha=0.3)+ 
       geom_smooth(method="gam",family="quasipoisson", 
          formula=y~poly(x,2))+ 
        geom_smooth(method="gam",family="quasipoisson", 
           formula=y~poly(x,2),se=FALSE, 
           aes(group=1),colour="black") 

enter image description here

2

的問題是幾乎可以肯定是由於你將字符向量傳遞給參數:

..., data="Santa.Lucia, ..." 

?glmerdata說法應該是:

data: an optional data frame containing the variables named in 
     ‘formula’. By default the variables are taken from the 
     environment from which ‘lmer’ is called. While ‘data’ is 
     optional, the package authors _strongly_ recommend its use, 
     especially when later applying methods such as ‘update’ and 
     ‘drop1’ to the fitted model (_such methods are not guaranteed 
     to work properly if ‘data’ is omitted_). If ‘data’ is 
     omitted, variables will be taken from the environment of 
     ‘formula’ (if specified as a formula) or from the parent 
     frame (if specified as a character vector). 

括號中的最後一部分,「如果指定的字符向量」涉及到發生了什麼。如果formula規範是作爲一個字符向量,而不是指定data作爲一個角色。

更正您的來電,包括data = Santa.Lucia,你應該很好去。

+0

非常感謝。我從來沒有意識到我犯的這個愚蠢的錯誤。不幸的是,我現在有了新的錯誤信息,這些信息更加難以理解。我編輯了我的原始問題以包含它們。再次感謝您的幫助! – RosaRosa

1

您已設法使用兩種不同的日期格式。這裏有一個修復:

Santa.Lucia$Date2 <- ifelse(nchar(as.character(Santa.Lucia$Date)) > 10, 
          as.Date(Santa.Lucia$Date, format="%d-%b-%Y"), 
          as.Date(Santa.Lucia$Date, format="%d/%m/%Y")) 

我試過一個簡單的模型:

(model6 <-glmer(No.Specimens~Altitude+(1|Date2)+(1|Trap),family="poisson",data=Santa.Lucia,na.action=na.omit)) 
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [ 
glmerMod] 
Family: poisson (log) 
Formula: No.Specimens ~ Altitude + (1 | Date2) + (1 | Trap) 
    Data: Santa.Lucia 
     AIC  BIC logLik deviance df.resid 
368.6522 378.9510 -180.3261 360.6522  93 
Random effects: 
Groups Name  Std.Dev. 
Date2 (Intercept) 0.2248 
Trap (Intercept) 0.0000 
Number of obs: 97, groups: Date2, 6; Trap, 2 
Fixed Effects: 
(Intercept)  Altitude 
    1.3696125 -0.0004992 
Warning messages: 
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : 
    Model failed to converge with max|grad| = 0.0516296 (tol = 0.001, component 3) 
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : 
    Model is nearly unidentifiable: very large eigenvalue 
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio 
- Rescale variables? 

實際上,我能得到我的建議的修改沒有錯誤或警告運行,但我覺得用這兩個分組不正確的,因爲一個預測另一個:

> table(Santa.Lucia$Date2, Santa.Lucia$Trap) 

     Net Pan 
    16310 0 11 
    16312 11 0 
    16313 11 0 
    16314 0 31 
    16589 0 17 
    16590 0 16 

這就是爲什麼你得到不收斂。錯誤模型不是錯誤的,而是設計和數據收集中的病態。我懷疑你是否真的有足夠的數據來支持混合模式:

(model5 <-glm(No.Specimens~Altitude,family="poisson",data=Santa.Lucia,na.action=na.omit)) 

Call: glm(formula = No.Specimens ~ Altitude, family = "poisson", data = Santa.Lucia, 
    na.action = na.omit) 

Coefficients: 
(Intercept)  Altitude 
    1.4218234 -0.0005391 

Degrees of Freedom: 96 Total (i.e. Null); 95 Residual 
Null Deviance:  215.3 
Residual Deviance: 213.2 AIC: 368.6 

要與二次海拔模式比較:

(model5.2 <-glm(No.Specimens~poly(Altitude,2),family="poisson",data=Santa.Lucia,na.action=na.omit)) 

Call: glm(formula = No.Specimens ~ poly(Altitude, 2), family = "poisson", 
    data = Santa.Lucia, na.action = na.omit) 

Coefficients: 
     (Intercept) poly(Altitude, 2)1 poly(Altitude, 2)2 
      0.3188    -1.7116    -3.9539 

Degrees of Freedom: 96 Total (i.e. Null); 94 Residual 
Null Deviance:  215.3 
Residual Deviance: 194.6 AIC: 352 
> anova(model5.2) 
Analysis of Deviance Table 

Model: poisson, link: log 

Response: No.Specimens 

Terms added sequentially (first to last) 


        Df Deviance Resid. Df Resid. Dev 
NULL         96  215.31 
poly(Altitude, 2) 2 20.698  94  194.61 
> anova(model5.2, model5) 
Analysis of Deviance Table 

Model 1: No.Specimens ~ poly(Altitude, 2) 
Model 2: No.Specimens ~ Altitude 
    Resid. Df Resid. Dev Df Deviance 
1  94  194.61    
2  95  213.20 -1 -18.59 
+0

我明白你在說什麼了。如果在這個模型中只包含陷阱類型作爲一個隨機效應,那麼是否可以?再次感謝 – RosaRosa

+1

我曾被這方面的專家告知,當你只有兩組時,使用混合模型確實沒有用。您沒有得到穩定的分組變量的方差估計。這與在完整的固定效應模型中簡單添加協變量沒有什麼不同。所以「是」檢查陷阱類型的可能影響,但對混合模型的「否」。 anova「重要性」基本相同。 –