2016-10-04

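`nls` fails to estimate the parameters of my model

I'm trying to estimate the constants of Heaps' law. I have the following dataset, `novels_collection`: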

  Number of novels DistinctWords WordOccurrences
1                1         13575          117795
2                1         34224          947652
3                1         40353         1146953
4                1         55392         1661664
5                1         60656         1968274
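To make the examples reproducible, the table above can be recreated as a data frame. This is a sketch: the column name `Number.of.novels` is an assumption (it is what `read.table()` or `data.frame()` would make of "Number of novels"); only `DistinctWords` and `WordOccurrences` are used later.

```r
# Recreate the dataset shown above.
# Number.of.novels is an assumed name for the "Number of novels" column.
novels_collection <- data.frame(
  Number.of.novels = c(1, 1, 1, 1, 1),
  DistinctWords    = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences  = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Columns 2:3 are the ones passed to nls() below.
str(novels_collection)
```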

Then I created a function:

# Function for Heaps law 
heaps <- function(K, n, B){ 
    K*n^B 
} 
heaps(2,117795,.7) #Just to test it works 

So n = WordOccurrences, and K and B are the values that should be constant, so that I can predict the number of distinct words.
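Written out, the model being fitted is the following, with D standing for the number of distinct words (the symbol D is introduced here only for readability):

```latex
D(n) = K \, n^{B}, \qquad n = \text{WordOccurrences}, \quad D = \text{DistinctWords}
```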

I tried this, but it gives me an error:

fitHeaps <- nls(DistinctWords ~ heaps(K,WordOccurrences,B), 
    data = novels_collection[,2:3], 
    start = list(K = .1, B = .1), trace = T) 

The error:

Error in numericDeriv(form[[3L]], names(ind), env) : 
  Missing value or an infinity produced when evaluating the model

Any ideas how I can fix this, or another way to fit the function and get the values of K and B?
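One quick diagnostic (a sketch, not a fix) is to evaluate the model at the starting values given to `nls`. At K = 0.1 and B = 0.1 every prediction is below 1, while the observed counts are in the tens of thousands. Starting values this far from the data are a common reason for the Gauss-Newton iterations in `nls` to wander into regions where `n^B` overflows, although that is an assumption about this particular failure rather than something the error message confirms.

```r
# Heaps' law model, as defined in the question.
heaps <- function(K, n, B) {
  K * n^B
}

novels_collection <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Model predictions at the starting values that were passed to nls().
start_pred <- heaps(0.1, novels_collection$WordOccurrences, 0.1)

# Observed counts next to the predictions at the starting values.
print(cbind(observed  = novels_collection$DistinctWords,
            predicted = round(start_pred, 3)))
```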


What do you mean? Which variables should I log-transform? For K, I think it has to be positive, but I'm not sure.


... ergo, a nonlinear model isn't needed, because it can be solved with a linear model. :)

Answers

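If you take the log of both sides of y = K * n^B, you get log(y) = log(K) + B * log(n). This is a linear relationship between log(y) and log(n), so you can fit a linear regression model to find log(K) and B: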
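As a sanity check of this transformation, here is a minimal sketch on noise-free synthetic data. The values K = 5, B = 0.65 and the grid of n are arbitrary, chosen only for illustration: because y follows K * n^B exactly, `lm` on the logs recovers both constants up to floating-point error.

```r
# Noise-free synthetic data that follow y = K * n^B exactly.
# K_true, B_true and n are arbitrary illustrative values.
K_true <- 5
B_true <- 0.65
n <- c(1e5, 5e5, 1e6, 2e6)
y <- K_true * n^B_true

# On the log scale the model is a straight line:
# log(y) = log(K) + B * log(n)
fit <- lm(log(y) ~ log(n))

K_hat <- unname(exp(coef(fit)[1]))  # back-transform the intercept to get K
B_hat <- unname(coef(fit)[2])       # the slope is B directly
print(c(K = K_hat, B = B_hat))
```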

logy <- log(novels_collection$DistinctWords) 
logn <- log(novels_collection$WordOccurrences) 

fit <- lm(logy ~ logn) 

para <- coef(fit)          ## log(K) and B 
para[1] <- exp(para[1])    ## K and B 
names(para) <- c("K", "B") 
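The log-scale estimates are also useful as starting values if you still want the least-squares fit on the original scale with base `nls`, since they are usually much closer to the optimum than K = .1, B = .1. This is a sketch of that combination (a common trick, not something either answer states), using the question's data and `heaps()` function; note the two fits minimise different criteria, so their estimates will generally differ.

```r
heaps <- function(K, n, B) {
  K * n^B
}

novels_collection <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Step 1: linear fit on the log scale gives rough estimates of log(K) and B.
lin <- lm(log(DistinctWords) ~ log(WordOccurrences), data = novels_collection)
start_vals <- list(K = unname(exp(coef(lin)[1])),
                   B = unname(coef(lin)[2]))

# Step 2: use them as starting values for nls on the original scale.
fitHeaps <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
                data = novels_collection,
                start = start_vals)
print(coef(fitHeaps))
```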

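With minpack.lm we can fit the nonlinear model directly, but I suspect it will overfit more easily than the linear model on the log-transformed variables (as Zheyuan did). We could, however, compare the residuals of the linear and nonlinear models on some extrapolated dataset to get an empirical answer, which would be interesting.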

library(minpack.lm) 
fitHeaps = nlsLM(DistinctWords ~ heaps(K, WordOccurrences, B), 
        data = novels_collection[,2:3], 
        start = list(K = .01, B = .01)) 
coef(fitHeaps) 
#  K   B 
# 5.0452566 0.6472176 

plot(novels_collection$WordOccurrences, novels_collection$DistinctWords, pch=19) 
lines(novels_collection$WordOccurrences, predict(fitHeaps, newdata = novels_collection[,2:3]), col='red') 
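The comparison suggested in this answer (residuals of the linear and nonlinear models on extrapolated data) could be sketched by holding out the row with the most word occurrences, fitting both models on the remaining rows, and predicting the held-out row. Assumptions: base `nls`, started from the log-scale fit, stands in for `nlsLM` so the sketch needs no extra package, and with only five rows this is an illustration of the procedure, not evidence for either model.

```r
heaps <- function(K, n, B) {
  K * n^B
}

novels_collection <- data.frame(
  DistinctWords   = c(13575, 34224, 40353, 55392, 60656),
  WordOccurrences = c(117795, 947652, 1146953, 1661664, 1968274)
)

# Hold out the row with the most word occurrences, so predicting it
# is an extrapolation beyond the training range.
ord     <- order(novels_collection$WordOccurrences)
train   <- novels_collection[ord[-length(ord)], ]
holdout <- novels_collection[ord[length(ord)], , drop = FALSE]

# Linear model on the log scale, back-transformed to counts.
lin <- lm(log(DistinctWords) ~ log(WordOccurrences), data = train)
pred_lin <- as.numeric(exp(predict(lin, newdata = holdout)))

# Nonlinear model on the original scale, started from the log-scale fit.
nonlin <- nls(DistinctWords ~ heaps(K, WordOccurrences, B),
              data = train,
              start = list(K = unname(exp(coef(lin)[1])),
                           B = unname(coef(lin)[2])))
pred_nonlin <- as.numeric(predict(nonlin, newdata = holdout))

# Absolute extrapolation error of each model on the held-out row.
errors <- c(linear_log = abs(holdout$DistinctWords - pred_lin),
            nonlinear  = abs(holdout$DistinctWords - pred_nonlin))
print(errors)
```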
