In the chi-square test, how to calculate (the correct number of parameters and consequently) the correct number of degrees of freedom, without using the chi2gof function?

10 次查看(过去 30 天)
Question
In the chi-square test, how to calculate (the correct number of parameters and consequently) the correct number of degrees of freedom, without using the chi2gof function?
I have indeed noticed that the number of degrees of freedom was slightly different in one matlab answer and in the chi2gof function....
Option 1: "df = nbins - 1"
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
Population2 = [996, 749, 370, sum(Population(4:8))];
Sample2 = [647, 486, 100, sum(Sample(4:8))];
chi2stat = sum((Sample2-Population2).^2./Population2);
df = length(Population2)-1;
pcrit = .05;
chi2crit = chi2inv(pcrit,df);
h2 = chi2stat > chi2crit;
p2 = 1 - chi2cdf(chi2stat,df);
fprintf('h=%d, p=%.3f df=%d\n',h2,p2,df);
h=1, p=0.000 df=3
Option 2: "df = nbins - 1 - nparams"
"chi2gof compares the value of the test statistic to a chi-square distribution with degrees of freedom equal to nbins - 1 - nparams, where nbins is the number of bins used for the data pooling and nparams is the number of estimated parameters used to determine the expected counts."
bins = 0:5;
obsCounts = [6 16 10 12 4 2];
n = sum(obsCounts);
pd = fitdist(bins','Poisson','Frequency',obsCounts');
expCounts = n * pdf(pd,bins);
[h,p,st] = chi2gof(bins,'Ctrs',bins,...
'Frequency',obsCounts, ...
'Expected',expCounts,...
'NParams',1)
h = 0
p = 0.4654
st = struct with fields:
chi2stat: 2.5550 df: 3 edges: [-0.5000 0.5000 1.5000 2.5000 3.5000 5.5000] O: [6 16 10 12 6] E: [7.0429 13.8041 13.5280 8.8383 6.0284]

回答(1 个)

dpb
dpb 2023-6-21
Although you specified 'Ctrs', bins, chi2gof created only 5 bins because the obsCounts values for the last two bins in the 'Frequency' vector were too small individually. Hence the DOF for the chi-square test statistic turns out to be based on 5-1-1 --> 3 instead of 6-1-1 --> 4 that may have been what you were expecting?
  5 个评论
dpb
dpb 2023-6-22
编辑:dpb 2023-6-23
Not sure what the remaing puzzle is so don't know how to try to add anything that haven't already said.
The correction to the number of DOF based solely on number of (collapsed) bins is simply how many parameters of the distribution used to calculate the expected counts per bin were estimated from the data itself -- IF ("the big if") the theoretical distribution parameter values are based on the input data itself.
If you test against counts from a theoretical distribution that is obtained from other considerations, then you've not estimated any further parameters from the count data itself and nParams=0.

请先登录,再进行评论。

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by