2010-07-28 123 views
11

對於1,000,000個觀測值,我觀察到一個離散事件X,對照組爲3次,測試組爲10次。獨立性的Matlab測試

我需要在Matlab中執行卡方檢驗的獨立性測試。這是你會怎麼做,在R:

m <- rbind(c(3, 1000000-3), c(10, 1000000-10)) 
#  [,1] [,2] 
# [1,] 3 999997 
# [2,] 10 999990 
chisq.test(m) 

R函數返回卡方= 2.7692,DF = 1,p值= 0.0961。

我應該使用或創建什麼Matlab函數來做到這一點?

回答

14

這是我自己的實現,我使用:

function [hNull pValue X2] = ChiSquareTest(o, alpha) 
    %# CHISQUARETEST Pearson's Chi-Square test of independence 
    %# 
    %# @param o   The Contignecy Table of the joint frequencies 
    %#      of the two events (attributes) 
    %# @param alpha  Significance level for the test 
    %# @return hNull  hNull = 1: null hypothesis accepted (independent) 
    %#      hNull = 0: null hypothesis rejected (dependent) 
    %# @return pValue The p-value of the test (the prob of obtaining 
    %#      the observed frequencies under hNull) 
    %# @return X2  The value for the chi square statistic 
    %# 

    %# o:  observed frequency 
    %# e:  expected frequency 
    %# dof: degree of freedom 

    [r c] = size(o); 
    dof = (r-1)*(c-1); 

    %# e = (count(A=ai)*count(B=bi))/N 
    e = sum(o,2)*sum(o,1)/sum(o(:)); 

    %# [ sum_r [ sum_c ((o_ij-e_ij)^2/e_ij) ] ] 
    X2 = sum(sum((o-e).^2 ./ e)); 

    %# p-value needed to reject hNull at the significance level with dof 
    pValue = 1 - chi2cdf(X2, dof); 
    hNull = (pValue > alpha); 

    %# X2 value needed to reject hNull at the significance level with dof 
    %#X2table = chi2inv(1-alpha, dof); 
    %#hNull = (X2table > X2); 

end 

並舉例說明:

t = [3 999997 ; 10 999990] 
[hNull pVal X2] = ChiSquareTest(t, 0.05) 

hNull = 
    1 
pVal = 
    0.052203 
X2 = 
     3.7693 

注意,結果是從你的不同,因爲chisq.test執行默認的校正,根據到?chisq.test

正確:邏輯指示是否 在計算2x2表的測試統計量時應用連續性校正 :一半是 從| O - E |差異。


另外,如果你有問題的兩個事件的實際觀察,你可以使用CROSSTAB函數計算列聯表並返回χ2和p值的措施:

X = randi([1 2],[1000 1]); 
Y = randi([1 2],[1000 1]); 
[t X2 pVal] = crosstab(X,Y) 

t = 
    229 247 
    257 267 
X2 = 
    0.087581 
pVal = 
     0.76728 

在R中的等價物將是:

chisq.test(X, Y, correct = FALSE) 

注意:上述兩種(MATLAB)方法都需要統計工具箱

+1

啊,ninja'd。代碼+1! – Jonas 2010-07-28 20:07:38

+0

@Amro,你會如何爲matlab實現'correct = true'? – Elpezmuerto 2010-07-28 20:29:12

+0

以及根據R文檔只需從| OE |中減去一半,所以用下面的代替:'X2 = sum(sum((abs(oe)-0.5)。^ 2 ./ e));'但是你會有要手動檢查此更正僅適用於2x2表格:http://en.wikipedia.org/wiki/Yates%27_correction_for_continuity – Amro 2010-07-28 20:37:33

0

此函數將使用Pearson卡方統計量和似然比統計量以及計算殘差來檢驗獨立性。我知道這可以進一步向量化,但我試圖展示每一步的數學。

function independenceTest(data) 
df = (size(data,1)-1)*(size(data,2)-1); % Mean Degrees of Freedom 
sd = sqrt(2*df);      % Standard Deviation 

u   = nan(size(data)); % Estimated expected frequencies 
p   = nan(size(data)); % Values used to calculate chi-square 
lr  = nan(size(data)); % Values used to calculate likelihood-ratio 
residuals = nan(size(data)); % Residuals 

rowTotals = sum(data,1); 
colTotals = sum(data,2); 
overallTotal = sum(rowTotals); 

%% Calculate estimated expected frequencies 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     u(r,c) = (rowTotals(c) * colTotals(r))/overallTotal; 
    end 
end 

%% Calculate chi-squared statistic 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     p(r,c) = (data(r,c) - u(r,c))^2/u(r,c); 
    end 
end 
chi = sum(sum(p)); % Chi-square statistic 

%% Calculate likelihood-ratio statistic 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     lr(r,c) = data(r,c) * log(data(r,c)/u(r,c)); 
    end 
end 
G = 2 * sum(sum(lr)); % Likelihood-Ratio statisitc 

%% Calculate residuals 
for r=1:1:size(data,1) 
    for c=1:1:size(data,2) 
     numerator = data(r,c) - u(r,c); 
     denominator = sqrt(u(r,c) * (1 - colTotals(r)/overallTotal) * (1 - rowTotals(c)/overallTotal)); 
     residuals(r,c) = numerator/denominator; 
    end 
end 
+0

查看@ Amro的代碼。他沒有循環進行相同的計算,因此更簡潔。 – Jonas 2010-07-28 20:19:57