在Python中生成相關數據（3.3）

在R中有一個函數（cm.rnorm.cor，來自程序包CreditMetrics），該函數用於創建相關數據的樣本數量，變量數量和相關矩陣。在Python中生成相關數據（3.3）

Python中是否有等價物？

2013-04-15 PascalVKooten

對不起，我的壞，在Python 3.3。 – PascalVKooten

Whaaat ...支持最近添加了！謝謝你提醒我。 – PascalVKooten

@Dualinity，在一個不太幽默的基調上，除了來自Blender的很多軟件包之外，我真的建議你嘗試一下Python（X，Y）。它是用於科學開發的Python包集合+ IPython + Great IDE，稱爲spyder。 http://code.google.com/p/pythonxy/ – Oz123

numpy.random.multivariate_normal是你想要的功能。

實施例：

import numpy as np 
import matplotlib.pyplot as plt 


num_samples = 400 

# The desired mean values of the sample. 
mu = np.array([5.0, 0.0, 10.0]) 

# The desired covariance matrix. 
r = np.array([ 
     [ 3.40, -2.75, -2.00], 
     [ -2.75, 5.50, 1.50], 
     [ -2.00, 1.50, 1.25] 
    ]) 

# Generate the random samples. 
y = np.random.multivariate_normal(mu, r, size=num_samples) 


# Plot various projections of the samples. 
plt.subplot(2,2,1) 
plt.plot(y[:,0], y[:,1], 'b.') 
plt.plot(mu[0], mu[1], 'ro') 
plt.ylabel('y[1]') 
plt.axis('equal') 
plt.grid(True) 

plt.subplot(2,2,3) 
plt.plot(y[:,0], y[:,2], 'b.') 
plt.plot(mu[0], mu[2], 'ro') 
plt.xlabel('y[0]') 
plt.ylabel('y[2]') 
plt.axis('equal') 
plt.grid(True) 

plt.subplot(2,2,4) 
plt.plot(y[:,1], y[:,2], 'b.') 
plt.plot(mu[1], mu[2], 'ro') 
plt.xlabel('y[1]') 
plt.axis('equal') 
plt.grid(True) 

plt.show()

結果：

enter image description here

參見CorrelatedRandomSamples在SciPy的食譜。

來源

2013-04-15 23:20:07

如果喬萊斯基-分解協方差矩陣C爲L L^T，併產生獨立隨機向量x，然後Lx將與協方差 C隨機向量。

import numpy as np 
import matplotlib.pyplot as plt 
linalg = np.linalg 
np.random.seed(1) 

num_samples = 1000 
num_variables = 2 
cov = [[0.3, 0.2], [0.2, 0.2]] 

L = linalg.cholesky(cov) 
# print(L.shape) 
# (2, 2) 
uncorrelated = np.random.standard_normal((num_variables, num_samples)) 
mean = [1, 1] 
correlated = np.dot(L, uncorrelated) + np.array(mean).reshape(2, 1) 
# print(correlated.shape) 
# (2, 1000) 
plt.scatter(correlated[0, :], correlated[1, :], c='green') 
plt.show()

enter image description here

參考：見Cholesky decomposition

如果你想生成兩個系列，X和Y，特別(Pearson) correlation coefficient（如0.2）：

rho = cov(X,Y)/sqrt(var(X)*var(Y))

你可以選擇的協方差矩陣是

cov = [[1, 0.2], 
     [0.2, 1]]

這使得cov(X,Y) = 0.2，和方差，var(X)和var(Y)都等於1，所以rho將等於0.2。

例如，下面我們生成1000對相關係列，X和Y,1000次。然後我們繪製的相關係數的柱狀圖：

import numpy as np 
import matplotlib.pyplot as plt 
import scipy.stats as stats 
linalg = np.linalg 
np.random.seed(1) 

num_samples = 1000 
num_variables = 2 
cov = [[1.0, 0.2], [0.2, 1.0]] 

L = linalg.cholesky(cov) 

rhos = [] 
for i in range(1000): 
    uncorrelated = np.random.standard_normal((num_variables, num_samples)) 
    correlated = np.dot(L, uncorrelated) 
    X, Y = correlated 
    rho, pval = stats.pearsonr(X, Y) 
    rhos.append(rho) 

plt.hist(rhos) 
plt.show()

enter image description here

正如你所看到的，相關係數一般都接近0.2，但對於任何給定的樣本，該相關性將最有可能不是0.2完全相同。

來源

2013-04-15 22:14:16 unutbu

您是否知道如何獲取數據的準確相關性，比如0.2（與小容差一樣）？ – PascalVKooten

或者這確切已經？ – PascalVKooten

什麼'numpy.random.multivariate_normal'在引擎蓋下做什麼？因爲我將前者與cholesky方法進行了比較，發現後者顯着更快，特別是對於較大的尺寸數據（如幾千）。 cholesky方法僅適用於某些特定類型的協方差矩陣嗎？我的cov矩陣只是對角線，或者非常稀疏。 – Jason

在Python中生成相關數據（3.3）

回答

相關問題