2011-11-29 92 views
6

我正在執行概率計算。我有很多非常小的數字,我想從1中減去所有這些數字,並且準確地做到這一點。我可以精確計算這些小數的對數。我的策略至今一直像這樣(使用numpy的):如何計算1的對數減去python中給定小數的指數

由於日誌數量較少,x的數組,計算:

y = numpy.logaddexp.reduce(x) 

現在我想計算像1-exp(y)甚至更​​好log(1-exp(y)) ,但我不知道如何做,而不會失去我的所有精度。

實際上,即使是logaddexp函數也會遇到精度問題。矢量x中的值範圍可以從-2到-800,或者甚至更負。來自上面的向量y基本上會有一個數字類型的eps大約1e-16的整段數字。因此,舉例來說,在精確計算的數據看起來是這樣的:

In [358]: x 
Out[358]: 
[-5.2194676211172837, 
-3.9050377656308362, 
-3.1619783292449615, 
-2.71289594096134, 
-2.4488395891021639, 
-2.3129210706827568, 
-2.2709987626652346, 
-2.3007776073511259, 
-2.3868404149802434, 
-2.5180718876609163, 
-2.68619816583087, 
-2.8849022632856958, 
-3.1092603032627686, 
-3.3553673369747834, 
-3.6200806272462351, 
-3.9008385919463073, 
-4.1955300857178379, 
-4.5023981074719899, 
-4.8199676154248081, 
-5.1469905756384904, 
-5.4824035553480428, 
-5.8252945959126876, 
-6.174877049340779, 
-6.5304687083067563, 
-6.8914750074202473, 
-7.25737538919104, 
-7.6277121540338797, 
-8.0020812775389558, 
-8.3801247986220773, 
-8.7615244716292437, 
-9.1459964426584435, 
-9.5332867613176404, 
-9.9231675781398394, 
-10.315433907978701, 
-10.709900863130784, 
-11.106401278287066, 
-11.50478366390567, 
-11.904910436107656, 
-12.30665638039909, 
-12.709907313918777, 
-13.114558916892051, 
-13.52051570882999, 
-13.927690148982549, 
-14.336001843810081, 
-14.745376846921289, 
-15.155747039147968, 
-15.567049578271309, 
-15.979226409456359, 
-16.39222382873956, 
-16.805992092998878, 
-17.22048507074976, 
-17.63565992888303, 
-18.051476851117201, 
-18.467898784496384, 
-18.884891210740903, 
-19.302421939667397, 
-19.720460922243518, 
-20.138980081145718, 
-20.557953156947775, 
-20.977355568292495, 
-21.397164284594595, 
-21.817357709992422, 
-22.237915577412224, 
-22.658818851739369, 
-23.080049641202237, 
-23.501591116172762, 
-23.923427434676114, 
-24.345543673975158, 
-24.767925767665417, 
-25.190560447772668, 
-25.61343519140047, 
-26.036538171518259, 
-26.459858211524278, 
-26.883384743252066, 
-27.307107768123842, 
-27.731017821180984, 
-28.155105937748402, 
-28.579363622513654, 
-29.003782820820732, 
-29.428355891997484, 
-29.853075584553352, 
-30.27793501309668, 
-30.702927636836705, 
-31.128047239545907, 
-31.553287910869187, 
-31.978644028878307, 
-32.404110243774596, 
-32.82968146265631, 
-33.255352835270173, 
-33.681119740674262, 
-34.106977774747804, 
-34.532922738484046, 
-34.958950627012712, 
-35.385057619298891, 
-35.811240068471022, 
-36.237494492735493, 
-36.663817566835519, 
-37.090206114019054, 
-37.516657098479527, 
-37.943167618239784, 
-38.369734898447348, 
-38.796356285056333, 
-39.223029238868548, 
-39.64975132991276, 
-40.076520232137909, 
-40.5033337184027, 
-40.930189655741344, 
-41.357086000888444, 
-41.784020796047173, 
-42.210992164885965, 
-42.637998308748706, 
-43.065037503066776, 
-43.492108093959985, 
-43.919208495015312, 
-44.346337184233221, 
-44.773492701130749, 
-45.200673643993753, 
-45.627878667267964, 
-46.055106479082156, 
-46.482355838895614, 
-46.909625555262096, 
-47.336914483704675, 
-47.764221524695017, 
-48.191545621730768, 
-48.618885759506213, 
-49.04624096217151, 
-49.473610291673936, 
-49.900992846179292, 
-50.328387758566748, 
-50.755794194994508, 
-51.183211353532613, 
-51.610638462858901, 
-52.0380747810147, 
-52.46551959421754, 
-52.892972215728378, 
-53.320431984769073, 
-53.747898265489198, 
-54.175370445978274, 
-54.602847937323247, 
-55.030330172705362, 
-55.457816606538813, 
-55.885306713645889, 
-56.312799988467418, 
-56.740295944308855, 
-57.167794112617116, 
-57.59529404228897, 
-58.02279529900909, 
-58.450297464615232, 
-58.877800136490578, 
-59.305302926981085, 
-59.732805462838542, 
-60.160307384683506, 
-60.587808346493375, 
-61.015308015110463, 
-61.442806069768608, 
-61.87030220164138, 
-62.297796113406662, 
-62.725287518829532, 
-63.15277614236129, 
-63.580261718755196, 
-64.007743992695964, 
-64.435222718445743, 
-64.862697659501919, 
-65.290168588270035, 
-65.717635285748088, 
-66.14509754122389, 
-66.572555151982783, 
-67.000007923029216, 
-67.427455666815376, 
-67.854898202982099, 
-68.282335358110231, 
-68.709766965479957, 
-69.137192864839108, 
-69.564612902180784, 
-69.992026929530198, 
-70.419434804735829, 
-70.8468363912732, 
-71.274231558051156, 
-71.701620179229167, 
-72.129002134037705, 
-72.556377306608397, 
-72.983745585807242, 
-73.411106865077045, 
-73.838461042282461, 
-74.265808019561746, 
-74.693147703185559, 
-75.120480003416901, 
-75.547804834380145, 
-75.97512211393132, 
-76.402431763534764, 
-76.829733708143749, 
-77.257027876085431, 
-77.684314198948414, 
-78.111592611476681, 
-78.538863051464546, 
-78.966125459656723, 
-79.393379779652037, 
-79.820625957809625, 
-80.24786394315754, 
-80.675093687306912, 
-81.102315144366912] 

然後我試着計算指數的對數之和:

In [359]: np.logaddexp.accumulate(x) 
Out[359]: 
array([ -5.21946762e+00, -3.66710221e+00, -2.68983273e+00, 
     -2.00815067e+00, -1.51126604e+00, -1.14067818e+00, 
     -8.60829425e-01, -6.48188808e-01, -4.86276416e-01, 
     -3.63085873e-01, -2.69624488e-01, -1.99028599e-01, 
     -1.45996863e-01, -1.06408884e-01, -7.70565672e-02, 
     -5.54467248e-02, -3.96506186e-02, -2.81859503e-02, 
     -1.99225261e-02, -1.40061296e-02, -9.79701394e-03, 
     -6.82045164e-03, -4.72733966e-03, -3.26317960e-03, 
     -2.24396350e-03, -1.53767347e-03, -1.05026994e-03, 
     -7.15209142e-04, -4.85690052e-04, -3.28980607e-04, 
     -2.22305294e-04, -1.49890553e-04, -1.00858788e-04, 
     -6.77380054e-05, -4.54139175e-05, -3.03974537e-05, 
     -2.03154477e-05, -1.35581905e-05, -9.03659252e-06, 
     -6.01552344e-06, -3.99984336e-06, -2.65671945e-06, 
     -1.76283376e-06, -1.16860435e-06, -7.73997496e-07, 
     -5.12213574e-07, -3.38706792e-07, -2.23809375e-07, 
     -1.47785898e-07, -9.75226648e-08, -6.43149957e-08, 
     -4.23904687e-08, -2.79246430e-08, -1.83858489e-08, 
     -1.20995365e-08, -7.95892319e-09, -5.23300609e-09, 
     -3.43929670e-09, -2.25953475e-09, -1.48391255e-09, 
     -9.74194956e-10, -6.39351406e-10, -4.19466218e-10, 
     -2.75121795e-10, -1.80397409e-10, -1.18254918e-10, 
     -7.74993004e-11, -5.07775611e-11, -3.32619009e-11, 
     -2.17835737e-11, -1.42634249e-11, -9.33764336e-12, 
     -6.11190167e-12, -3.99989955e-12, -2.61737204e-12, 
     -1.71253165e-12, -1.12043465e-12, -7.33052079e-13, 
     -4.79645919e-13, -3.13905885e-13, -2.05519681e-13, 
     -1.34650094e-13, -8.83173582e-14, -5.80300378e-14, 
     -3.82338678e-14, -2.52963381e-14, -1.68421145e-14, 
     -1.13181549e-14, -7.70918073e-15, -5.35155125e-15, 
     -3.81152630e-15, -2.80565548e-15, -2.14872312e-15, 
     -1.71971577e-15, -1.43957518e-15, -1.25665732e-15, 
     -1.13722927e-15, -1.05925916e-15, -1.00835857e-15, 
     -9.75131524e-16, -9.53442707e-16, -9.39286186e-16, 
     -9.30046550e-16, -9.24016349e-16, -9.20080954e-16, 
     -9.17512772e-16, -9.15836886e-16, -9.14743318e-16, 
     -9.14029759e-16, -9.13564174e-16, -9.13260398e-16, 
     -9.13062204e-16, -9.12932898e-16, -9.12848539e-16, 
     -9.12793505e-16, -9.12757603e-16, -9.12734183e-16, 
     -9.12718905e-16, -9.12708939e-16, -9.12702438e-16, 
     -9.12698198e-16, -9.12695432e-16, -9.12693627e-16, 
     -9.12692451e-16, -9.12691683e-16, -9.12691183e-16, 
     -9.12690856e-16, -9.12690643e-16, -9.12690504e-16, 
     -9.12690414e-16, -9.12690355e-16, -9.12690316e-16, 
     -9.12690291e-16, -9.12690275e-16, -9.12690264e-16, 
     -9.12690257e-16, -9.12690252e-16, -9.12690249e-16, 
     -9.12690248e-16, -9.12690246e-16, -9.12690245e-16, 
     -9.12690245e-16, -9.12690245e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16, 
     -9.12690244e-16, -9.12690244e-16, -9.12690244e-16]) 

最終導致:

In [360]: np.logaddexp.reduce(x) 
Out[360]: -9.1269024387687033e-16 

所以我的精確度已經消失了。任何想法如何解決這個問題?

+0

我只是好奇,你的號碼是什麼(如果它不是祕密)? – Binus

+0

@UriLaserson如果其中一個答案有幫助,請將其標記爲已接受。 –

回答

7

在Python 2.7,我們增加了math.expm1()這個用例:

>>> from math import exp, expm1 
>>> exp(1e-5) - 1 # gives result accurate to 11 places 
1.0000050000069649e-05 
>>> expm1(1e-5) # result accurate to full precision 
1.0000050000166668e-05 

此外,還有math.fsum()對於沒有精度損失的累計工序:

>>> sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]) 
0.9999999999999999 
>>> fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]) 
1.0 

最後,如果沒有別的幫助,您可以使用支持超高精度算術的decimal module

>>> from decimal import * 
>>> getcontext().prec = 200 
>>> (1 - 1/Decimal(7000000)).ln() 
Decimal('-1.4285715306122546161332083855139723669559469615692284955124609122046580004888309867906750714454869716398919778588515625689415322789136206397998627088895481989036005482451668027002380442299229191323673E-7') 
+0

對不起,請檢查編輯,一個問題是,我甚至到達指數時精度已經超過了。 –

0

完全像Raymond Hettinger說的,但是你當然需要乘以-1,因爲你想要1 - exp而不是exp - 1

+0

對不起,請檢查編輯,一個問題是,我甚至達到指數的時候精度已經超過了。 –

3

我建議用它們的exp()和log()替換它們Taylor series相應地在0和1附近。這樣,你就不會因爲使用大數字而失去精確性(我的名字叫1)。使用Lagrange remainder公式或只是成員的表達與一些預留來確定,因爲當差異將超出您的精度。

更新:

的Python 2.7的math.expm1exp(x)-1)和math.log1plog(1+x))爲你做這個,如果平臺的C庫的精度(通常爲雙)就足夠了。 (如果不是,你將不得不求助於特殊的數學軟件(x86的FPU可以以更高的精度計算))。

2

我不知道這是否是你想要

numpy.expm1(x[, out]) 
Calculate exp(x) - 1 for all elements in the array. 



>>> import numpy as np 
>>> np.expm1(x).sum() 
-200.0 
>>> (-np.expm1(x)).sum() 
200.0 
>>> from scipy import special 
>>> (-special.expm1(x)).sum() 
200.0 
>>> np.log((-special.expm1(x)).sum()) 
5.2983173665480363 

編輯什麼:

對不起,我不知道,這只是雷蒙德的Hettinger的回答的numpy的版本。

(不是答案的數值問題)

我仍然不知道確切的問題是什麼,但是,而不是在它扔十進制或mpmath,也許問題進行改寫會有所幫助。例如,如果將泊松中的概率相加,最終總是會「接近」到1.但是對於一些問題,我們可以避免使用生存函數而不是cdf處理一些問題。

+0

不要忘記numpy.log1p – matt

+0

我玩了log1p這個問題一段時間。但是因爲我不確定這個問題實際上要求什麼,所以我放棄了。 – user333700

1

我用mpmath來解決類似的問題。這是一個很好的和有據可查的100%python庫。如果速度對你來說是一個問題;考慮與gmpy一起使用mpmath。看到這個link這樣做。

3

不知道很多關於蟒蛇,我做了我的大部分在Java中的工作,但在我看來,你會過得更好執行上的所有值對數和-EXP招同時第一寧可做兩份兩個使用numpy.logaddexp.accumulate

在google中快速搜索會顯示python庫中的候選人:scipy.misc.logsumexp

在那會不會是很難給自己設定任何情況下:

logsumexp(probs) := max(probs) + log(sum[i](exp(probes[i]-max(probs)))) 

所以像:

maxValue = -Inf; 
    for x in probs 
    if x > maxValue then maxValue = x 
    expSum = 0 
    for x in probs 
    expSum += exp(x - maxValue) 
    return log(expSum) 

返回一個值,說p,爲的只是日誌所有概率的總和在probs。請注意,如果有大規模的輸入概率最大,較小的值之間的不同,小的會被忽略,但如果只是他們的貢獻是與大量在大多數應用中應該是精細比較非常小。

您可以使用更復雜的策略來使這些小數值可以在有大量小數字的情況下進行計數,比如probe = 0.5 + 1E-7 + 1E-7 ...百萬次,所以加起來爲0.1 。你可以做的是把幾個單獨的總和分在一起,在這些總和中大致相同的比例值先被加在一起,然後再合併。或者你可以使用一些中間樞紐值,而不是最大的,但你必須確保最大的價值不是太大,因爲那時EXP(probs [I] - 透視)會導致在這種情況下溢出。

一旦做到這一點,你仍然需要做計算日誌(1-EXP(P))

對於我發現這個文檔描述的方式,以避免儘可能多的精度損失儘可能使用邏輯函數你可以在大多數語言數學公共庫中找到。

Maechler M, Accurately Computing log(1 − exp(− |a|)) Assessed by the Rmpfr package, 2012

的關鍵是使用的兩種可能的方法中的一種根據輸入值一個

定義:

log1mexp(a) := log(1-exp(a)) ### function that we seek to implement. 
log1p(a) := log(1+a) # function provided by common math libraries. 
exp1m(a) := exp(a) - 1 # function provided by common math libraries. 

還有就是要實現log1mexp使用log1p一個明顯的方式:

log1mexp(a) := log1p(-exp(a)) 

隨着exp1m你可以這樣做:

log1mexp(a) := log(-expm1(a)) 

你應該再使用log1p a < log(.5)和expm1時a> = log(.5)

log1mexp(a) := (a < log(.5)) ? log1p(-exp(a)) : log(-expm1(a)). 

請參考外部鏈接瞭解更多信息。