2016-05-16

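Error after applying lm in a loop

I am trying to apply the code below, and it works fine on any data without NA values. However, when I include data with NA values, I get the following message: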

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases

The code I am using is:

m <- data.frame(matrix(ncol = 5, nrow = length(unique(df$Year)) * length(unique(df$Firm))))
l <- 0
for (i in unique(df$Year)) {
    for (j in unique(df$Firm)) {
        l <- l + 1
        # Fit one regression per Year/Firm combination
        mod <- lm(Ri ~ Rm + Rz, data = df, subset = df$Year == i & df$Firm == j)
        # Store the two slopes and the residual standard error
        m[l, ] <- c(i,
                    as.character(j),
                    mod$coefficients[2],
                    mod$coefficients[3],
                    summary(mod)$sigma)
    }
}
names(m) <- c("Year", "Firm", "B1", "B2", "e")

Here is a sample of the data I am using:

Year Firm Ri Rm Rz 
2009 A  30 55 NA 
2009 A  0  55 NA 
2009 A  1  55 NA 
2010 A  7  55 85 
2010 A  15 NA 85 
2011 A  0  55 85 
2011 A  3.5 55 85 
2011 A  8  NA 85 
2009 B  24 55 85 
2009 B  30 55 85 
2009 B  25 55 85 
2010 B  5.2 NA 85 
2010 B  11.8 55 85 
2011 B  0  55 NA 
2011 B  90 55 NA 
2011 B  57 55 NA 

Any suggestions?

The problem arises with the data above.

How about trying 'data = subset(df,.....)'? – Gopala

If you post a sample of the data frame 'df', there may be a better piece of code that can help you. – Gopala

Thanks for the suggestion, @Gopala.

Answer

In addition, you can rewrite the code using a combination of the dplyr and broom packages, as follows:

library(dplyr)
library(broom)  # tidy() comes from broom, not tidyr
df$Rz <- 85  # Impute values of Rz so that the code runs
df %>% group_by(Year, Firm) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .)))

Source: local data frame [6 x 7] 
Groups: Year, Firm [6] 

    Year Firm  term estimate std.error statistic  p.value 
    <int> <fctr>  <chr> <dbl>  <dbl>  <dbl>  <dbl> 
1 2009  A (Intercept) 10.33333 9.837570 1.050395 0.403735888 
2 2009  B (Intercept) 26.33333 1.855921 14.188819 0.004930448 
3 2010  A (Intercept) 7.00000  NaN  NaN   NaN 
4 2010  B (Intercept) 11.80000  NaN  NaN   NaN 
5 2011  A (Intercept) 1.75000 1.750000 1.000000 0.500000000 
6 2011  B (Intercept) 49.00000 26.286879 1.864048 0.203331016 

UPDATE: Added a filter so that lm is fitted only to Year/Firm groups in which neither independent variable is entirely NA:

df %>% group_by(Year, Firm) %>% filter(!all(is.na(Rm)) & !all(is.na(Rz))) %>% do(tidy(lm(Ri ~ Rm + Rz, data = .))) 
Source: local data frame [4 x 7] 
Groups: Year, Firm [4] 

    Year Firm  term estimate std.error statistic  p.value 
    <int> <fctr>  <chr> <dbl>  <dbl>  <dbl>  <dbl> 
1 2009  B (Intercept) 26.33333 1.855921 14.18882 0.004930448 
2 2010  A (Intercept) 7.00000  NaN  NaN   NaN 
3 2010  B (Intercept) 11.80000  NaN  NaN   NaN 
4 2011  A (Intercept) 1.75000 1.750000 1.00000 0.500000000 
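If you would rather keep the original base-R loop, the same idea can be sketched without dplyr: skip any Year/Firm group that has no complete case before calling lm. This is only a rough sketch. The sample data is copied from the question, while the use of complete.cases() and the choice to leave NA in the skipped rows are my assumptions about what you want.

```r
# Sample data copied from the question.
df <- data.frame(
  Year = c(2009, 2009, 2009, 2010, 2010, 2011, 2011, 2011,
           2009, 2009, 2009, 2010, 2010, 2011, 2011, 2011),
  Firm = rep(c("A", "B"), each = 8),
  Ri   = c(30, 0, 1, 7, 15, 0, 3.5, 8, 24, 30, 25, 5.2, 11.8, 0, 90, 57),
  Rm   = c(55, 55, 55, 55, NA, 55, 55, NA, 55, 55, 55, NA, 55, 55, 55, 55),
  Rz   = c(NA, NA, NA, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, NA, NA, NA)
)

# One result row per Year/Firm combination that occurs in the data.
groups <- unique(df[, c("Year", "Firm")])
m <- data.frame(groups, B1 = NA_real_, B2 = NA_real_, e = NA_real_)

for (k in seq_len(nrow(groups))) {
  d <- df[df$Year == groups$Year[k] & df$Firm == groups$Firm[k], ]
  # Assumption: a group with no complete case is skipped and keeps NA.
  if (!any(complete.cases(d[, c("Ri", "Rm", "Rz")]))) next
  mod <- lm(Ri ~ Rm + Rz, data = d)
  m$B1[k] <- coef(mod)[["Rm"]]    # NA when Rm does not vary in the group
  m$B2[k] <- coef(mod)[["Rz"]]    # NA when Rz does not vary in the group
  m$e[k]  <- summary(mod)$sigma   # residual standard error
}
m
```

Unlike the c(...) assignment in the question, this keeps Year and the estimates numeric instead of coercing the whole row to character.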

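This output shows only intercept-only fits, because the predictors in the sample data provided do not vary. If your data does have such variation (as the mtcars dataset does), you get output like the following. Note that the call below passes data = mtcars rather than data = ., so the full dataset is refit for every cyl group and the three blocks of coefficients are identical; use data = . to fit each group separately: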

mtcars %>% group_by(cyl) %>% do(tidy(lm(mpg ~ wt + am, data = mtcars))) 
Source: local data frame [9 x 6] 
Groups: cyl [3] 

    cyl  term estimate std.error statistic  p.value 
    <dbl>  <chr>  <dbl>  <dbl>  <dbl>  <dbl> 
1  4 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13 
2  4   wt -5.35281145 0.7882438 -6.79080719 1.867415e-07 
3  4   am -0.02361522 1.5456453 -0.01527855 9.879146e-01 
4  6 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13 
5  6   wt -5.35281145 0.7882438 -6.79080719 1.867415e-07 
6  6   am -0.02361522 1.5456453 -0.01527855 9.879146e-01 
7  8 (Intercept) 37.32155131 3.0546385 12.21799285 5.843477e-13 
8  8   wt -5.35281145 0.7882438 -6.79080719 1.867415e-07 
9  8   am -0.02361522 1.5456453 -0.01527855 9.879146e-01 
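Because the call above passes data = mtcars, each cyl group reports the pooled fit. As a rough base-R sketch of what a per-group fit looks like (split() plus lapply() is my substitute for group_by() and do() here, not what the answer used):

```r
# Fit mpg ~ wt + am separately within each cyl group.
fits <- lapply(split(mtcars, mtcars$cyl),
               function(d) coef(lm(mpg ~ wt + am, data = d)))

# One row of coefficients per cyl value (rows named "4", "6", "8").
coefs <- do.call(rbind, fits)
coefs

# The pooled fit, for comparison with the repeated rows shown above.
pooled <- coef(lm(mpg ~ wt + am, data = mtcars))
pooled
```

The rows of coefs differ from one another, whereas the pooled coefficients match the values repeated for every group in the output above.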

EDIT: Adding a simple example that reproduces the problem from the original post:

x <- 1:10 
y <- 1:10 
z <- NA 
df <- data.frame(x = x, y = y, z = z) 
lm(x ~ y + z, data = df) 
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
    0 (non-NA) cases
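The reason is that lm, with the default na.action of na.omit, drops every row that has an NA in any model variable, and here that leaves no rows at all. A small sketch to check this on the toy data above (wrapping the call in tryCatch() is just my way of capturing the message, not something you need in practice):

```r
x <- 1:10
y <- 1:10
z <- NA
df <- data.frame(x = x, y = y, z = z)

# Rows with no NA in any column: these are the rows lm() could use.
n_complete <- sum(complete.cases(df))
n_complete

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    lm(x ~ y + z, data = df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Leaving the all-NA predictor out of the formula lets the model fit.
fit <- lm(x ~ y, data = df)
coef(fit)
```

Counting complete cases per group before fitting is therefore a cheap way to predict which groups will trigger the error.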