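I'm having trouble with sklearn.mixture.dpgmm. The main problem is that it does not return correct covariances for synthetic data (two well-separated 2D Gaussians), a case it really should handle without difficulty. In particular, when I call dpgmm._get_covars(), the diagonal elements of the covariance matrices are always exactly 1.0 too large, regardless of the distribution of the input data. This looks like a bug, because gmm works perfectly (when restricted to the known, exact number of groups).
Another problem is that dpgmm.weights_ makes no sense: the values sum to one, but individually they look meaningless.
Has anyone found a solution to this, or can anyone see something clearly wrong with my example?
Here is the exact script I am running: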
import itertools
import numpy as np
from scipy import linalg
import matplotlib.pyplot as plt
import matplotlib as mpl
import pdb
from sklearn import mixture
# Generate 2D random sample, two gaussians each with 10000 points
rsamp1 = np.random.multivariate_normal(np.array([5.0,5.0]),np.array([[1.0,-0.2],[-0.2,1.0]]),10000)
rsamp2 = np.random.multivariate_normal(np.array([0.0,0.0]),np.array([[0.2,-0.0],[-0.0,3.0]]),10000)
X = np.concatenate((rsamp1,rsamp2),axis=0)
# Fit a mixture of Gaussians with EM using 2 components
gmm = mixture.GMM(n_components=2, covariance_type='full',n_iter=10000)
gmm.fit(X)
# Fit a Dirichlet process mixture of Gaussians using 10 components
dpgmm = mixture.DPGMM(n_components=10, covariance_type='full',min_covar=0.5,tol=0.00001,n_iter = 1000000)
dpgmm.fit(X)
print("Groups With data in them")
print(np.unique(dpgmm.predict(X)))
# Print the input and output covariances as an example; they should be very similar
correct_c0 = np.array([[1.0,-0.2],[-0.2,1.0]])
print("Input covar")
print(correct_c0)
covars = dpgmm._get_covars()
c0 = np.round(covars[0],decimals=1)
print("Output Covar")
print(c0)
print("Output Variances Too Big by 1.0")
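For reference, here is a minimal sketch of the baseline I would expect the fitted values to be close to. It uses only NumPy and computes each group's weight and covariance directly from the data. The names `empirical_stats`, `X_demo` and `labels_demo` are made up for this illustration, and the known labels stand in for what `dpgmm.predict(X)` would return, so this is a sanity check rather than a fix.

```python
import numpy as np

def empirical_stats(X, labels):
    """Return {label: (weight, covariance)} computed directly from the data.

    Hypothetical helper for illustration; it is not part of sklearn.
    """
    stats = {}
    for k in np.unique(labels):
        members = X[labels == k]
        # Fraction of all points assigned to this group
        weight = members.shape[0] / float(X.shape[0])
        # rowvar=False: each row is an observation, each column a variable
        cov = np.cov(members, rowvar=False)
        stats[int(k)] = (weight, cov)
    return stats

# Same two Gaussians as in the script above, with a fixed seed
rng = np.random.RandomState(0)
true_c0 = np.array([[1.0, -0.2], [-0.2, 1.0]])
true_c1 = np.array([[0.2, 0.0], [0.0, 3.0]])
X_demo = np.concatenate((
    rng.multivariate_normal([5.0, 5.0], true_c0, 10000),
    rng.multivariate_normal([0.0, 0.0], true_c1, 10000),
), axis=0)
# Assumption: known labels are used here in place of dpgmm.predict(X)
labels_demo = np.repeat([0, 1], 10000)

stats = empirical_stats(X_demo, labels_demo)
for k, (weight, cov) in sorted(stats.items()):
    print("group", k, "weight", weight)
    print(np.round(cov, 1))
```

With the real model, passing `dpgmm.predict(X)` as `labels` and subtracting these covariances from the matching entries of `dpgmm._get_covars()` should show whether the excess really is 1.0 on the diagonal only. The label-based weights can likewise be compared against `dpgmm.weights_`. Note that a group with a single point would give a NaN covariance here.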