2013-02-17 57 views
4

如何生成由N = 1002維高斯分佈畫出2維樣本x = (x1,x2)T ∈ R2的數據組,平均生成由N = 100 2維樣本

µ = (1,1)T 
的數據集

和協方差矩陣

Σ = (0.3 0.2 
    0.2 0.2) 

有人告訴我,你可以使用Matlab的功能randn,但不知道如何在Python中實現它?

+0

你嘗試過'numpy'嗎? – ATOzTOA 2013-02-17 10:58:17

+0

Python模塊'numpy','scipy'和'MDP'實現了大量用Matlab可以完成的工作。 – Will 2013-02-17 11:02:03

回答

6

只是要詳細說明@ EamonNerbonne的答案:下面使用協方差矩陣的Cholesky decomposition從不相關的正態分佈隨機變量生成相關變量。

import numpy as np 
import matplotlib.pyplot as plt 
linalg = np.linalg 

N = 1000 
mean = [1,1] 
cov = [[0.3, 0.2],[0.2, 0.2]] 
data = np.random.multivariate_normal(mean, cov, N) 
L = linalg.cholesky(cov) 
# print(L.shape) 
# (2, 2) 
uncorrelated = np.random.standard_normal((2,N)) 
data2 = np.dot(L,uncorrelated) + np.array(mean).reshape(2,1) 
# print(data2.shape) 
# (2, 1000) 
plt.scatter(data2[0,:], data2[1,:], c='green')  
plt.scatter(data[:,0], data[:,1], c='yellow') 
plt.show() 

enter image description here

通過np.random.multivariate_normal產生黃色點。綠點是通過將正態分佈點乘以喬列斯基分解矩陣L而生成的。

+0

是的,這正是我想到的:-) - 很好的例子! – 2013-02-17 14:10:17

+0

當我嘗試運行它時,出現此錯誤:L = linalg.cholesky(cov) NameError:名稱'linalg'未定義 – pythonnewbie 2013-02-17 20:32:35

+0

哎呀。我忘了加入'linalg = np.linalg'。 (帖子已被更正。) – unutbu 2013-02-17 20:49:01

5

您正在尋找numpy.random.multivariate_normal

代碼

>>> import numpy 
>>> print numpy.random.multivariate_normal([1,1], [[0.3, 0.2],[0.2, 0.2]], 100) 
[[ 0.02999043 0.09590078] 
[ 1.35743021 1.08199363] 
[ 1.15721179 0.87750625] 
[ 0.96879114 0.94503228] 
[ 1.23989167 1.13473083] 
[ 1.55917608 0.81530847] 
[ 0.89985651 0.7071519 ] 
[ 0.37494324 0.739433 ] 
[ 1.45121732 1.17168444] 
[ 0.69680785 1.2727178 ] 
[ 0.35600769 0.46569276] 
[ 2.14187488 1.8758589 ] 
[ 1.59276393 1.54971412] 
[ 1.71227009 1.63429704] 
[ 1.05013136 1.1669758 ] 
[ 1.34344004 1.37369725] 
[ 1.82975724 1.49866636] 
[ 0.80553877 1.26753018] 
[ 1.74331784 1.27211784] 
[ 1.23044292 1.18110192] 
[ 1.07675493 1.05940509] 
[ 0.15495771 0.64536509] 
[ 0.77409745 1.0174171 ] 
[ 1.20062726 1.3870498 ] 
[ 0.39619719 0.77919884] 
[ 0.87209168 1.00248145] 
[ 1.32273339 1.54428262] 
[ 2.11848535 1.44338789] 
[ 1.45226461 1.42061198] 
[ 0.33775737 0.24968543] 
[ 1.06982557 0.64674411] 
[ 0.92113229 1.0583153 ] 
[ 0.54987592 0.73198037] 
[ 1.06559727 0.77891362] 
[ 0.84371805 0.72957046] 
[ 1.83614557 1.40582746] 
[ 0.53146009 0.72294094] 
[ 0.98927818 0.73732053] 
[ 1.03984002 0.89426628] 
[ 0.38142362 0.32471126] 
[ 1.44464929 1.15407227] 
[-0.22601279 0.21045592] 
[-0.01995875 0.45051782] 
[ 0.58779449 0.44486237] 
[ 1.31335981 0.92875936] 
[ 0.42200098 0.6942829 ] 
[ 0.10714426 0.11083002] 
[ 1.44997839 1.19052704] 
[ 0.78630506 0.45877582] 
[ 1.63432202 1.95066539] 
[ 0.56680926 0.92203111] 
[ 0.08841491 0.62890576] 
[ 1.4703602 1.4924649 ] 
[ 1.01118864 1.44749407] 
[ 1.19936276 1.02534702] 
[ 0.67893239 0.8482461 ] 
[ 0.71537211 0.53279103] 
[ 1.08031573 1.00779064] 
[ 0.66412568 0.57121041] 
[ 0.96098528 0.72318386] 
[ 0.7690299 0.76058713] 
[ 0.77466896 0.77559282] 
[ 0.47906664 0.58602633] 
[ 0.52481326 0.78486453] 
[-0.40240438 0.17374116] 
[ 0.75730444 0.22365892] 
[ 0.67811008 1.17730408] 
[ 1.62245699 1.71775386] 
[ 1.12317847 1.04252136] 
[-0.06461117 0.23557416] 
[ 0.46299482 0.51585414] 
[ 0.88125676 1.23284201] 
[ 0.57920534 0.63765861] 
[ 0.88239858 1.32092112] 
[ 0.63500551 0.94788141] 
[ 1.76588148 1.63856465] 
[ 0.65026599 0.6899672 ] 
[ 0.06854287 0.29712499] 
[ 0.61575737 0.87526625] 
[ 0.30057552 0.54475194] 
[ 0.66578769 0.21034844] 
[ 0.94670438 0.7699764 ] 
[ 0.39870371 0.91681577] 
[ 1.37531351 1.62337899] 
[ 1.92350877 1.34382017] 
[ 0.56631877 0.77456137] 
[ 1.18702642 0.63700271] 
[ 0.74002244 1.04535471] 
[ 0.3272063 0.75097037] 
[ 1.57583435 1.55809705] 
[ 0.44325124 0.39620769] 
[ 0.59762516 0.58304621] 
[ 0.72253698 0.68302097] 
[ 0.93459597 1.01101948] 
[ 0.50139577 0.52500942] 
[ 0.84696441 0.68679341] 
[ 0.63483432 0.22205385] 
[ 1.43642478 1.34724612] 
[ 1.58663111 1.49941374] 
[ 0.73832806 0.95690866]] 
>>> 
2

雖然numpy的具有方便實用的功能,可以隨時「重新調整」多個獨立的正態分佈變量來匹配您給出的協方差矩陣。因此,如果您可以生成每個元素正態分佈的列向量x(或按矩陣分組的多個向量),並且您按矩陣M進行縮放,則結果將具有協變性M M^T。相反,如果你將協方差C分解成M M^T的形式,那麼即使沒有numpy提供的效用函數(只是將你正常分佈的一組矢量乘以M),生成這樣的分佈也非常簡單。

這也許不是你直接尋找答案,但要記住例如,它是非常有用的:

  • 如果你發現自己的縮放隨機生成的結果,你可以代替如果您需要將代碼移植到不直接支持這種實用程序方法的庫,那麼可以將縮放與您的初始協方差結合起來,這很容易實現。