2017-06-01
1

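Creating a normal distribution in R after fitting a cumulative distribution

After successfully fitting my cumulative data with a Gompertz function, I need to create a normal distribution from the fitted function.

This is the code so far: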

 df <- data.frame(x = c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196), 
       y = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999)) 

library(drc) 
fm <- drm(y ~ x, data = df, fct = G.3()) 

options(scipen = 10) #to avoid scientific notation in x axis 

plot(df$x, predict(fm),type = "l", log = "x",col = "blue", main = "Cumulative function distribution",xlab = "x", ylab = "y") 

points(df,col = "red") 

legend("topleft", inset = .05,legend = c("exp","fit") 
     ,lty = c(NA,1), col = c("red", "blue"), pch = c(1,NA), lwd=1, bty = "n") 


summary(fm) 

This is the resulting plot: [image]

My idea is to somehow convert this cumulative fit into a normal distribution. Any ideas how I could do this?

Answers

1

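I am thinking of a "cumdiff" (for lack of a better term), that is, the differences between consecutive cumulative values. The link helped a lot.

Edit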

plot(df$x[-1], Mod(df$y[-length(df$y)]-df$y[-1]), log = "x", type = "b", 
     main = "Normal distribution for original data", 
     xlab = "x", ylab = "y") 

which produces:

For original data set

Addition

To get the Gaussian from the fitted function:

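df$y_pred <- predict(fm)   # fitted cumulative values at the observed x
# differences of consecutive fitted values, plotted against df$x[-1]
plot(df$x[-1], Mod(df$y_pred[-length(df$y_pred)] - df$y_pred[-1]), log = "x", 
    type = "b", main = "Normal distribution for fitted function", 
    xlab = "x", ylab = "y") 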

which produces:

Fitted data
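
The differences above have one element fewer than `df$x`, which is why they are paired with `df$x[-1]`. On an unevenly spaced grid it may also help to divide each difference by the bin width, so that the curve is a density per unit of x rather than a raw increment. A minimal sketch of that idea follows, assuming a known normal CDF as a stand-in for the fit; the names `cdf`, `x_mid` and `density` and the grid values are made up for illustration. With the fitted model, `cdf` would presumably be replaced by something like `function(x) predict(fm, data.frame(x = x))`.

```r
# Stand-in CDF for illustration only (assumption, not the Gompertz fit):
# a normal distribution with mean 50 and standard deviation 10.
cdf <- function(x) pnorm(x, mean = 50, sd = 10)

# Log-spaced grid, similar in spirit to df$x (hypothetical values).
x <- 10^seq(0, 2.2, length.out = 400)

# Finite-difference density: increment of the CDF divided by the bin width,
# placed at the bin midpoints so that both vectors have the same length.
x_mid   <- (x[-1] + x[-length(x)]) / 2
density <- diff(cdf(x)) / diff(x)

plot(x_mid, density, type = "l", log = "x", xlab = "x", ylab = "density")
```

Using the midpoints instead of `df$x[-1]` is a design choice: it centres each difference on the interval it was computed from.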

+1

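OK, so this is how you create the normal plot from my initial data set rather than from the fit. Now I will dig into it and see how to do the same with the fitted equation. – numb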

+0

Thanks a lot! But is there a way to plot the x values on the x axis instead of the index? I tried, but I ended up with a problem between `Mod(df$y[-length(df$y)] - df$y[-1])` and `df$x` ... – numb

+1

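Right... I was thinking about this too. Try `str(fm)` and see whether you can get some information out of it. After all, I am not familiar with the `drc` package. I can't dive into it right now, but I promise I will get back to you as soon as possible. – amonk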

1

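Although your original intent may have been nonparametric, I suggest a parametric estimation method: the method of moments, which is widely used for problems like this one, because you have a specific parametric distribution (the normal distribution) to fit. The idea is simple: from the fitted cumulative distribution function you can compute the mean (E1 in my code) and the variance (the square of SD in my code), and then the problem is solved, because a normal distribution is completely determined by its mean and variance.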

df <- data.frame(x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196), 
       y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999)) 

library(drc) 
fm <- drm(y ~ x, data = df, fct = G.3()) 

options(scipen = 10) #to avoid scientific notation in x axis 

plot(df$x, predict(fm),type="l", log = "x",col="blue", main="Cumulative distribution function",xlab="x", ylab="y") 

points(df,col="red") 

E1 <- sum((df$x[-1] + df$x[-length(df$x)])/2 * diff(predict(fm))) 
E2 <- sum((df$x[-1] + df$x[-length(df$x)])^2/4 * diff(predict(fm))) 
SD <- sqrt(E2 - E1^2) 
points(df$x, pnorm((df$x - E1)/SD), col = "green") 

legend("topleft", inset = .05, legend = c("exp", "fit", "method of moment"), 
     lty = c(NA, 1, NA), col = c("red", "blue", "green"), pch = c(1, NA, 1), lwd = 1, bty = "n") 


summary(fm) 

CDF

And the estimation results:

## > E1 (mean of fitted normal distribution) 
## [1] 65.78474 
## > E2 (second moment of fitted normal distribution) 
## [1] 5792.767 
## > SD (standard deviation of fitted normal distribution) 
## [1] 38.27707 
## > SD^2 (variance of fitted normal distribution) 
## [1] 1465.134 
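
To see how the midpoint sums for E1 and E2 behave, they can be applied to a CDF whose moments are already known. The sketch below is only a sanity check under assumed values: a normal CDF with mean 50 and standard deviation 10 on a fine, evenly spaced grid, none of which comes from the data in the question. The names `mid` and `p` are made up for illustration.

```r
# Known CDF used as a stand-in so the result can be checked (assumption):
# a normal distribution with mean 50 and standard deviation 10.
x   <- seq(0, 100, by = 0.1)
cdf <- pnorm(x, mean = 50, sd = 10)

mid <- (x[-1] + x[-length(x)]) / 2   # bin midpoints
p   <- diff(cdf)                     # probability mass in each bin

E1 <- sum(mid * p)      # first moment (mean)
E2 <- sum(mid^2 * p)    # second moment
SD <- sqrt(E2 - E1^2)   # standard deviation

c(mean = E1, sd = SD)   # should land close to 50 and 10
```

On a coarse or unevenly spaced grid the same sums can drift away from the true moments, so it is worth checking how sensitive the result is to the grid.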

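Edit: an updated way to compute the moments from the CDF fitted with drc. The function `moment` defined below computes the moment estimates using the moment formula for a non-negative continuous r.v.: E(X^k) = k * \int_0^\infty x^{k - 1} (1 - cdf(x)) dx. These are the best estimates I can get from the fitted cdf. As I discussed in the comments, the fit is not very good when x is close to zero, because of the original data set.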

df <- data.frame(x=c(0.01,0.011482,0.013183,0.015136,0.017378,0.019953,0.022909,0.026303,0.0302,0.034674,0.039811,0.045709,0.052481,0.060256,0.069183,0.079433,0.091201,0.104713,0.120226,0.138038,0.158489,0.18197,0.20893,0.239883,0.275423,0.316228,0.363078,0.416869,0.47863,0.549541,0.630957,0.724436,0.831764,0.954993,1.096478,1.258925,1.44544,1.659587,1.905461,2.187762,2.511886,2.884031,3.311311,3.801894,4.365158,5.011872,5.754399,6.606934,7.585776,8.709636,10,11.481536,13.182567,15.135612,17.378008,19.952623,22.908677,26.30268,30.199517,34.673685,39.810717,45.708819,52.480746,60.255959,69.183097,79.432823,91.201084,104.712855,120.226443,138.038426,158.489319,181.970086,208.929613,239.883292,275.42287,316.227766,363.078055,416.869383,478.630092,549.540874,630.957344,724.43596,831.763771,954.992586,1096.478196), 
       y=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.00044816,0.00127554,0.00221488,0.00324858,0.00438312,0.00559138,0.00686054,0.00817179,0.00950625,0.01085188,0.0122145,0.01362578,0.01514366,0.01684314,0.01880564,0.02109756,0.0237676,0.02683182,0.03030649,0.0342276,0.03874555,0.04418374,0.05119304,0.06076553,0.07437854,0.09380666,0.12115065,0.15836926,0.20712933,0.26822017,0.34131335,0.42465413,0.51503564,0.60810697,0.69886817,0.78237651,0.85461023,0.91287236,0.95616228,0.98569093,0.99869001,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999,0.99999999)) 

library(drc) 
fm <- drm(y ~ x, data = df, fct = G.3()) 

moment <- function(k){ 
    f <- function(x){ 
     x^(k - 1) * pmax(0, 1 - predict(fm, data.frame(x = x))) 
    } 
    k * integrate(f, lower = min(df$x), upper = max(df$x))$value 
} 

E1 <- moment(1) 
E2 <- moment(2) 
SD <- sqrt(E2 - E1^2) 
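
The tail formula itself can be checked on a distribution with known moments. The following sketch assumes an exponential distribution with rate 0.5, chosen only because its moments are known in closed form; `rate`, `surv` and `moment_tail` are hypothetical names. Unlike the code above, it integrates from 0 to infinity rather than over the range of the data.

```r
rate <- 0.5   # assumed rate of the stand-in exponential distribution

# Survival function 1 - cdf(x) of the exponential distribution.
surv <- function(x) pexp(x, rate = rate, lower.tail = FALSE)

# k-th moment via E(X^k) = k * integral of x^(k - 1) * (1 - cdf(x)) dx.
moment_tail <- function(k) {
  f <- function(x) x^(k - 1) * surv(x)
  k * integrate(f, lower = 0, upper = Inf)$value
}

E1 <- moment_tail(1)    # theory: 1 / rate   = 2
E2 <- moment_tail(2)    # theory: 2 / rate^2 = 8
SD <- sqrt(E2 - E1^2)   # theory: 1 / rate   = 2
```

Restricting the integral to the observed range of x, as in the code above, leaves out whatever part of the tail lies outside the data, which may matter if the fitted cdf has not reached 1 at `max(df$x)`.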
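
+1

Thanks for the idea. I will dig into this and see how I can work with the method of moments. I am a bit worried that it does not fit in the first part of the plot (somewhere before x = 80). Do you have any idea why? – numb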

+0

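So, I tried to plot the normal distribution from your moments, and the poor fit at the start leads to a weird normal distribution (because it does not reach 0 in the first part). This is the code I used: `y2 <- dnorm(df$x, mean = E1, sd = SD); plot(df$x, y2, type = "b")`. This is the [plot](http://imgur.com/a/0ZCWE) – numb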

+1

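@numb I found that the problem is that the SD estimate is too large. This happens because the original `df$x` is dense where x is small but sparse where x is large. I am looking for ways to get a better estimate. – Consistency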