分析一個Haskell程序

我有一段代碼，使用sequence從一個概率分佈重複抽樣。道德上，它是這樣的：分析一個Haskell程序

sampleMean :: MonadRandom m => Int -> m Float -> m Float 
sampleMean n dist = do 
    xs <- sequence (replicate n dist) 
    return (sum xs)

除了它有點複雜。我感興趣的實際代碼是功能likelihoodWeighting在this Github repo。

我注意到運行時間與n呈非線性關係。特別是，一旦n超過了某個值，它就達到了內存限制，並且運行時間爆炸了。我不確定，但我認爲這是因爲sequence正在建立一個長長的thunk列表，直到撥打sum纔得到評估。

一旦我獲得了大約100,000個樣本，程序就會慢慢爬行。我想優化它（我的感覺是1000萬個樣本不應該成爲問題），所以我決定對它進行描述 - 但是我對理解profiler的輸出有點麻煩。

剖析

我在與10萬個樣本運行我的功能的文件main.hs創建的短期可執行文件。這裏是做的輸出

$ ghc -O2 -rtsopts main.hs 
$ ./main +RTS -s

我注意到的第一件事 - 它分配近1.5 GB的堆，並花60％的時間在垃圾回收上。這通常表明懶惰太多了嗎？

1,377,538,232 bytes allocated in the heap 
1,195,050,032 bytes copied during GC 
    169,411,368 bytes maximum residency (12 sample(s)) 
    7,360,232 bytes maximum slop 
      423 MB total memory in use (0 MB lost due to fragmentation) 

Generation 0: 2574 collections,  0 parallel, 2.40s, 2.43s elapsed 
Generation 1: 12 collections,  0 parallel, 1.07s, 1.28s elapsed 

INIT time 0.00s ( 0.00s elapsed) 
MUT time 1.92s ( 1.94s elapsed) 
GC time 3.47s ( 3.70s elapsed) 
RP time 0.00s ( 0.00s elapsed) 
PROF time 0.23s ( 0.23s elapsed) 
EXIT time 0.00s ( 0.00s elapsed) 
Total time 5.63s ( 5.87s elapsed) 

%GC time  61.8% (63.1% elapsed) 

Alloc rate 716,368,278 bytes per MUT second 

Productivity 34.2% of total user, 32.7% of total elapsed

下面是

$ ./main +RTS -p

結果我第一次跑這一點，事實證明，有被重複調用一個函數，它原來我可以memoize的它，這加快東西增加了2倍。然而，它沒有解決空間泄漏問題。

COST CENTRE   MODULE    no. entries %time %alloc %time %alloc 

MAIN     MAIN     1  0 0.0 0.0 100.0 100.0 
main     Main     434  4 0.0 0.0 100.0 100.0 
    likelihoodWeighting AI.Probability.Bayes 445  1 0.0 0.3 100.0 100.0 
    distributionLW  AI.Probability.Bayes 448  1 0.0 2.6  0.0 2.6 
    getSampleLW  AI.Probability.Bayes 446 100000 20.0 50.4 100.0 97.1 
    bnProb   AI.Probability.Bayes 458 400000 0.0 0.0  0.0 0.0 
    bnCond   AI.Probability.Bayes 457 400000 6.7 0.8  6.7 0.8 
    bnVals   AI.Probability.Bayes 455 400000 20.0 6.3 26.7 7.1 
    bnParents  AI.Probability.Bayes 456 400000 6.7 0.8  6.7 0.8 
    bnSubRef   AI.Probability.Bayes 454 800000 13.3 13.5 13.3 13.5 
    weightedSample AI.Probability.Bayes 447 100000 26.7 23.9 33.3 25.3 
    bnProb   AI.Probability.Bayes 453 100000 0.0 0.0  0.0 0.0 
    bnCond   AI.Probability.Bayes 452 100000 0.0 0.2  0.0 0.2 
    bnVals   AI.Probability.Bayes 450 100000 0.0 0.3  6.7 0.5 
     bnParents  AI.Probability.Bayes 451 100000 6.7 0.2  6.7 0.2 
    bnSubRef   AI.Probability.Bayes 449 200000 0.0 0.7  0.0 0.7

這是一個堆配置文件。我不知道爲什麼它聲稱運行時間是1.8秒 - 這個運行大約需要6秒。

enter image description here

誰能幫我解釋Profiler的輸出 - 即以標識瓶頸，並提供如何加快速度的建議？

來源

2012-07-21 Chris Taylor

嘗試使用'replicateM n'而不是'序列。replicate n' – dflemstr 2012-07-21 18:10:37

運行時間沒有變化 - 可能並不奇怪，因爲'replicateM n' [定義爲]（http://www.haskell.org/ghc/docs/latest/html/libraries/base/src /Control-Monad.html#replicateM）'序列。複製n'。 – 2012-07-21 18:15:31

'length xs'總是'n'，不是嗎？所以用'n'替換'（length xs）'，然後'xs'不必在任何時候完全駐留在內存中。 – dave4420 2012-07-21 18:29:28

通過在likelihoodWeighting中合併使用foldM的JohnL's suggestion已經實現了巨大的改進。這使內存使用量減少了大約10倍，並將GC時間大幅減少到幾乎或幾乎可以忽略不計。

與電流源收率

probabilityIO    AI.Util.Util   26.1 42.4 413 290400000 
weightedSample.go   AI.Probability.Bayes 16.1 19.1 255 131200080 
bnParents     AI.Probability.Bayes 10.8 1.2 171 8000384 
bnVals      AI.Probability.Bayes 10.4 7.8 164 53603072 
bnCond      AI.Probability.Bayes 7.9 1.2 125 8000384 
ndSubRef     AI.Util.Array   4.8 9.2  76 63204112 
bnSubRef     AI.Probability.Bayes 4.7 8.1  75 55203072 
likelihoodWeighting.func AI.Probability.Bayes 3.3 2.8  53 19195128 
%!       AI.Util.Util   3.3 0.5  53 3200000 
bnProb      AI.Probability.Bayes 2.5 0.0  40  16 
bnProb.p     AI.Probability.Bayes 2.5 3.5  40 24001152 
likelihoodWeighting  AI.Probability.Bayes 2.5 2.9  39 20000264 
likelihoodWeighting.func.x AI.Probability.Bayes 2.3 0.2  37 1600000

和13MB內存使用情況的報告-s運行甲紋，〜5MB最大駐留。這已經不算太糟糕。

儘管如此，仍然有一些我們可以改進的地方。首先，一個相對次要的事情，在宏偉計劃，AI.UTIl.Array.ndSubRef：

ndSubRef :: [Int] -> Int 
ndSubRef ns = sum $ zipWith (*) (reverse ns) (map (2^) [0..])

倒車列表和映射(2^)在另一個列表是低效的，更好的是

ndSubRef = L.foldl' (\a d -> 2*a + d) 0

它不需要保留整個列表在內存中（可能不是什麼大事，因爲列表會很短），因爲它反轉了它，並且不需要分配第二個列表。在分配的減少是明顯的，約爲10％，而該部分可測量的運行速度會快，

ndSubRef     AI.Util.Array   1.7 1.3  24 8000384

在修改運行的輪廓

，但由於它只需要的總時間的一小部分，總體影響小。在weightedSample和likelihoodWeighting中有可能會炸的更大的魚。

讓我們增加一點嚴格的weightedSample，看看如何改變的事情：

weightedSample :: Ord e => BayesNet e -> [(e,Bool)] -> IO (Map e Bool, Prob) 
weightedSample bn fixed = 
    go 1.0 (M.fromList fixed) (bnVars bn) 
    where 
     go w assignment []  = return (assignment, w) 
     go w assignment (v:vs) = if v `elem` vars 
      then 
       let w' = w * bnProb bn assignment (v, fixed %! v) 
       in go w' assignment vs 
      else do 
       let p = bnProb bn assignment (v,True) 
       x <- probabilityIO p 
       go w (M.insert v x assignment) vs 

     vars = map fst fixed

的go重量參數從來不強迫，也不是賦值參數，因此，他們可以建立的thunk。讓我們將它傳遞給probabilityIO之前啓用{-# LANGUAGE BangPatterns #-}和力更新立即生效，還評價p：

go w assignment (v:vs) = if v `elem` vars 
    then 
     let !w' = w * bnProb bn assignment (v, fixed %! v) 
     in go w' assignment vs 
    else do 
     let !p = bnProb bn assignment (v,True) 
     x <- probabilityIO p 
     let !assignment' = M.insert v x assignment 
     go w assignment' vs

這使在分配的進一步降低（〜9％）和小加速（〜％13％），但總內存使用量和最大居住量並沒有太大變化。

我看到沒有別的明顯的變化出現，所以讓我們來看看likelihoodWeighting：

func m _ = do 
    (a, w) <- weightedSample bn fixed 
    let x = a ! e 
    return $! x `seq` w `seq` M.adjust (+w) x m

在最後一行，首先，w已經在weightedSample評估了，所以我們並不需要seq它在這裏，需要使用密鑰x來評估更新後的地圖，因此這也不是必需的。該行的壞消息是M.adjust。 adjust沒有辦法強制更新函數的結果，以便在地圖的值中生成thunk。您可以通過查找修改值和強制，強制的thunk的評價，但Data.Map這裏提供了一個更方便的方法，因爲在該地圖更新是保證關鍵是存在，insertWith'：

func !m _ = do 
    (a, w) <- weightedSample bn fixed 
    let x = a ! e 
    return (M.insertWith' (+) x w m)

（注意：在這裏，GHC使用m的爆炸模式比使用return $! ...更好地優化）。稍微減少了總的分配和無法顯改變運行時間，但對使用的總內存和最大居住有很大的影響：

934,566,488 bytes allocated in the heap 
    1,441,744 bytes copied during GC 
     68,112 bytes maximum residency (1 sample(s)) 
     23,272 bytes maximum slop 
      1 MB total memory in use (0 MB lost due to fragmentation)

在運行時間最大的改進將不得不將避免randomIO ，使用的StdGen非常慢。

我很驚訝bn*函數花了多少時間，但沒有看到任何明顯的低效率。

來源

2012-07-22 15:08:27

謝謝丹尼爾，這真的很有用。我已經提出了您的建議更改，並且會考慮您擺脫'randomIO'的想法。我覺得我現在更接近於寫一個'真正的'Haskell程序了...... – 2012-07-23 07:10:33

你讓我的積極投入比我更多的時間。 – 2012-07-25 09:08:22

我把一個非常基本的例子放在這裏：http://hpaste.org/71919。我不確定它是否和你的例子一樣，只是一個似乎很有用的非常小的東西。

與-prof和-fprof-auto編譯和100000次迭代運行產生的分析輸出以下頭（原諒我的行號）：

8 COST CENTRE     MODULE    %time %alloc 
    9 
10 sample      AI.Util.ProbDist  31.5 36.6 
11 bnParents      AI.Probability.Bayes 23.2 0.0 
12 bnRank      AI.Probability.Bayes 10.7 23.7 
13 weightedSample.go    AI.Probability.Bayes 9.6 13.4 
14 bnVars      AI.Probability.Bayes 8.6 16.2 
15 likelihoodWeighting   AI.Probability.Bayes 3.8 4.2 
16 likelihoodWeighting.getSample AI.Probability.Bayes 2.1 0.7 
17 sample.cumulative    AI.Util.ProbDist  1.7 2.1 
18 bnCond      AI.Probability.Bayes 1.6 0.0 
19 bnRank.ps      AI.Probability.Bayes 1.1 0.0

而且這裏的彙總統計：

1,433,944,752 bytes allocated in the heap 
1,016,435,800 bytes copied during GC 
    176,719,648 bytes maximum residency (11 sample(s)) 
    1,900,232 bytes maximum slop 
      400 MB total memory in use (0 MB lost due to fragmentation) 

INIT time 0.00s ( 0.00s elapsed) 
MUT  time 1.40s ( 1.41s elapsed) 
GC  time 1.08s ( 1.24s elapsed) 
Total time 2.47s ( 2.65s elapsed) 

%GC  time  43.6% (46.8% elapsed) 

Alloc rate 1,026,674,336 bytes per MUT second 

Productivity 56.4% of total user, 52.6% of total elapsed

請注意，剖析器將其手指指向sample。我用$!被迫在該函數的return，這裏有一些彙總統計算賬：

1,776,908,816 bytes allocated in the heap 
    165,232,656 bytes copied during GC 
    34,963,136 bytes maximum residency (7 sample(s)) 
     483,192 bytes maximum slop 
      68 MB total memory in use (0 MB lost due to fragmentation) 

INIT time 0.00s ( 0.00s elapsed) 
MUT  time 2.42s ( 2.44s elapsed) 
GC  time 0.21s ( 0.23s elapsed) 
Total time 2.63s ( 2.68s elapsed) 

%GC  time  7.9% (8.8% elapsed) 

Alloc rate 733,248,745 bytes per MUT second 

Productivity 92.1% of total user, 90.4% of total elapsed

在大部分GC方面更有效率，但不是在時間太大的改變。您可能會繼續以這種配置文件/調整方式進行迭代，以針對瓶頸並提供更好的性能。

來源

2012-07-22 00:12:19 jtobin

非常感謝！坦率地說，我印象深刻的是，你讀了足夠的可怕代碼來研究如何組合一個例子... – 2012-07-22 00:21:10

快速問題 - 我正在用'-prof -auto-all'編譯我的測試腳本，但我只能得到如果我添加明確的'{ - ＃SCC＃ - }'註釋，按成本中心細分。你是如何自動獲得它們的？我嘗試用'-prof -fprof-auto'編譯，但GCH不能識別第二個標誌。 – 2012-07-22 01:07:51

您必須使用舊的GHC - 我正在使用7.4.1，它支持[這些分析選項]（http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/prof-compiler -options.html）。 7.0.2支持[這些]（http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/prof-compiler-options.html）;它看起來像'-auto-all'忽略嵌套函數。 – jtobin 2012-07-22 02:03:32

我覺得你的初步診斷是正確的，我從來沒有見過一個分析報告一次記憶效應在踢的是非常有用的。

的問題是，你的sequence並再次遍歷列表兩次，一次爲sum。在Haskell中，大列表的多列表遍歷對性能來說真的很糟糕。解決方案通常使用某種類型的摺疊，例如foldM。您sampleMean功能可以寫成

{-# LANGUAGE BangPatterns #-} 

sampleMean2 :: MonadRandom m => Int -> m Float -> m Float 
sampleMean2 n dist = foldM (\(!a) mb -> liftM (+a) mb) 0 $ replicate n dist

例如，遍歷列表一次。

你也可以用likelihoodWeighting做同樣的事情。爲了防止thunk，重要的是要確保摺疊函數中的累加器具有適當的嚴格性。

來源

2012-07-22 02:23:40

您的示例不會檢查。 step函數應該看起來更像'liftM。（+）'。它也可能更嚴格一些;使用'randomRIO'進行測試往往會導致堆棧溢出，數量可能更大，可能類似於'\ a mb - > mb >> = \ b - > return $！ a + b'？ – Vitus 2012-07-22 02:40:16

謝謝，我會重新編寫'likelihoodWeighting'。 – 2012-07-22 07:41:56

@Vitus - 謝謝，修正。我通常更喜歡使用爆炸模式來增加嚴格性，但是您的建議可能完全相同。 – 2012-07-22 11:08:06

我無法消化這些配置文件，但我已經得到了我的屁股踢之前，因爲在Hackage上MonadRandom是嚴格。創建MonadRandom的懶惰版本使我的內存問題消失。

我的同事尚未獲得發佈代碼的權限，但我已將在線放在pastebin。或者，如果你想看到一些解釋完全懶惰隨機搜索的摘錄，包括隨機計算的無限列表，請查看Experience Report: Haskell in Computational Biology。

來源

2012-07-22 18:42:11

+1用於提出一個不太嚴格的實現，而不是將其更嚴格的常規建議 – 2012-07-22 19:05:20

謝謝，這是一個有趣的想法。 Daniel Fischer建議我擺脫我對'randomIO'的調用，所以我認爲讓我的採樣函數返回一個'Rand a'，然後用'evalRandIO'調用它們是一條路。 – 2012-07-23 07:14:10

@Chris：我把我們的論文放在網上;它有一些使用Rand a計算的例子。我們只在頂層調用'evalRand'，或者在分支並行計算列表時調用。 – 2012-07-23 16:34:09

分析一個Haskell程序

回答

相關問題