How to check if data is normally distributed

Question

Nancy 2012-8-7

https://ww2.mathworks.cn/matlabcentral/answers/45477-how-to-check-if-data-is-normally-distributed

Answered: Sarutahiko 2013-12-11

Hi all,

I want to run an F-test on two samples to see if their variances are independent. Wikipedia says that the F-test is sensitive to non-normality of the samples (http://en.wikipedia.org/wiki/F-test). How can I check whether my samples are normally distributed?

I read on some forums that I can use kstest and lillietest. When should I use each one? I get h=0. Does that mean my data is normally distributed?

Thanks. Nancy

Answer 1

Tom Lane 2012-8-7

https://ww2.mathworks.cn/matlabcentral/answers/45477-how-to-check-if-data-is-normally-distributed#answer_55625

The functions you mention return H=0 when the test cannot reject the null hypothesis of a normal distribution. That does not prove the distribution is normal; it only means the test did not find much evidence against that hypothesis.
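A minimal sketch of the difference between the two tests mentioned in the question, assuming the Statistics and Machine Learning Toolbox is available. By default kstest compares the data against the standard normal distribution (mean 0, standard deviation 1), whereas lillietest estimates the mean and variance from the sample, so lillietest is usually the better fit when those parameters are unknown. The data, seed, and variable names are made up for illustration.

```matlab
rng(0);                      % fixed seed so the example is repeatable
x = 5 + 2*randn(200,1);      % normal data, but not standard normal

% kstest tests against N(0,1) by default, so it rejects this sample
% even though the sample is normally distributed.
hKsRaw = kstest(x);

% lillietest tests for normality with unspecified mean and variance.
[hLillie, pLillie] = lillietest(x);

fprintf('kstest on raw data: h = %d\n', hKsRaw);
fprintf('lillietest:         h = %d, p = %.3f\n', hLillie, pLillie);
```

In both cases h = 0 means only that the test did not reject normality at the default 5% significance level.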

The VARTESTN function has an option that is robust to non-normal distributions.

Nancy 2012-8-7

Thanks Sean. How can I use vartestn for a two-sample variance test? The input is just X. My samples are of unequal sizes.

Tom Lane 2012-8-9

Suppose you would normally do

x1 = randn(20,1); x2 = 1.5*randn(25,1);   % two samples of unequal size
[h,p] = vartest2(x1,x2)                    % two-sample F-test for equal variances

Then you can do something like this instead:

grp = [ones(size(x1)); 2*ones(size(x2))];  % group labels: 1 for x1, 2 for x2
vartestn([x1;x2], grp)                     % stacked data plus grouping vector

I believe the two-sample vartestn test is not identical to the vartest2 test, but the p-values are likely to be similar. Then you can add options to do a robust test using vartestn.
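One way the robust option might be used is sketched below. It assumes a MATLAB release in which vartestn accepts the 'TestType' and 'Display' name-value arguments; older releases took these as positional arguments instead. Levene's test is chosen here only as one example of a test that is less sensitive to non-normality, and the seed and variable names are illustrative.

```matlab
rng(2);                                   % repeatable example data
x1 = randn(20,1);                         % sample 1, 20 points
x2 = 1.5*randn(25,1);                     % sample 2, 25 points, larger spread
grp = [ones(size(x1)); 2*ones(size(x2))]; % group label for each observation

% Classical two-sample F-test (assumes normality).
[~, pF] = vartest2(x1, x2);

% vartestn default (Bartlett's test), also sensitive to non-normality.
pBartlett = vartestn([x1; x2], grp, 'Display', 'off');

% Levene's test on absolute deviations, more robust to non-normal data.
pLevene = vartestn([x1; x2], grp, ...
    'TestType', 'LeveneAbsolute', 'Display', 'off');

fprintf('F-test p = %.3f, Bartlett p = %.3f, Levene p = %.3f\n', ...
    pF, pBartlett, pLevene);
```

Because the samples are stacked into one vector with a grouping variable, unequal sample sizes need no special handling.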

Answer 2

Sean 2012-8-7

https://ww2.mathworks.cn/matlabcentral/answers/45477-how-to-check-if-data-is-normally-distributed#answer_55629

Hello Nancy,

You cannot tell from only 2 samples whether they are normally distributed or not. If you have a larger sample set and you are only testing them in pairs, then you could use the larger sample set to test for a particular distribution.

For example (a simple Q-Q plot):

data = randn(100);   % 100x100 matrix of normally distributed values
ref1 = randn(100);   % normally distributed reference, 100x100
ref2 = rand(100);    % uniformly distributed reference, 100x100
x  = sort(data(:));  % sorted values act as empirical quantiles
y1 = sort(ref1(:));
y2 = sort(ref2(:));
subplot(1,2,1); plot(x,y1);  % data vs. normal reference
subplot(1,2,2); plot(x,y2);  % data vs. uniform reference

The first plot should be close to a straight line, indicating that the data distribution matches the reference distribution. The second plot isn't a straight line, indicating that the distributions do not match.
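The same idea can be sketched with the built-in qqplot function, assuming the Statistics and Machine Learning Toolbox is installed. With one input, qqplot compares the sample quantiles against standard normal quantiles; with two inputs it compares two samples, which do not need to be the same size. The data and variable names are made up for illustration.

```matlab
rng(1);                      % repeatable example data
x = 3 + 2*randn(40,1);       % sample of 40 points
y = randn(25,1);             % second sample of a different size

figure;
subplot(1,2,1);
h1 = qqplot(x);              % x against theoretical normal quantiles
title('x vs. normal');

subplot(1,2,2);
h2 = qqplot(x, y);           % two-sample Q-Q plot, unequal sizes allowed
title('x vs. y');
```

Points lying near the reference line suggest the distributions agree in shape; systematic curvature suggests they do not.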

Sean 2012-8-7

Edited: Sean 2012-8-7

The fewer points you have available, the less definitive the test is. If you run the previous sample code with a smaller set of data and reference points, you should see what I mean: the shape of the lines is less well defined and more affected by random noise with a smaller sample set.

Regarding a test for independence... you might try scatter plotting them with respect to each other.

For example:

data1 = randn([100,1]);
data2 = (data1.^2 - 3*data1 + 5) + 0.01*randn([100,1]);  % data2 is a function of data1 + noise
ref = randn([100,1]);                                     % independent reference
subplot(1,2,1); scatter(data1(:),ref(:));
subplot(1,2,2); scatter(data1(:),data2(:));

As you can see, the independent reference variable is scattered all across the plot, but the relationship between the two data samples is clearly evident.

Another way to look at this would be:

subplot(1,2,1);plot(conv(data1,data2))
subplot(1,2,2);plot(conv(data1,ref))

Note: I have not vetted or proved these methods in a rigorous way, so I would use them with the understanding that they MAY reveal some dependencies but are not guaranteed to, especially if there is a real but weak relationship or a time-delayed relationship.
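As a rough numeric complement to the scatter plots, the sample correlation coefficient can be computed with corrcoef from base MATLAB. This is only a sketch: correlation measures linear association, so a value near zero does not establish independence. The seed and variable names are illustrative.

```matlab
rng(4);                                                % repeatable example data
data1 = randn(100,1);
data2 = (data1.^2 - 3*data1 + 5) + 0.01*randn(100,1);  % depends on data1
ref   = randn(100,1);                                  % generated independently

% corrcoef returns 2x2 matrices; the off-diagonal entry is the pair value.
[rDep, pDep] = corrcoef(data1, data2);
[rRef, pRef] = corrcoef(data1, ref);

fprintf('data1 vs data2: r = %.2f, p = %.3g\n', rDep(1,2), pDep(1,2));
fprintf('data1 vs ref:   r = %.2f, p = %.3g\n', rRef(1,2), pRef(1,2));
```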

Nancy 2012-8-7

The data samples you have given have equal sizes. What would I do if the sizes are unequal? I need to compare the variances across a lot of samples. I am wondering if there is a test like the t-test for doing so. If I submit a report, I would just like to write in the p-values.

Thanks for your help Sean.
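For many samples of unequal size, one possible approach is to stack them into a single vector with a grouping variable and pass both to vartestn, which returns a single p-value for the hypothesis that all groups have equal variance. The sketch below assumes the Statistics and Machine Learning Toolbox, a release that supports the 'TestType' and 'Display' name-value arguments, and repelem (R2015a or later). The cell array samples and the choice of the Brown-Forsythe test are illustrative assumptions.

```matlab
rng(3);                                       % repeatable example data
% Hypothetical samples with different sizes and spreads.
samples = {randn(20,1), 1.5*randn(25,1), 0.8*randn(32,1), randn(18,1)};

data  = vertcat(samples{:});                  % stack all observations
sizes = cellfun(@numel, samples);             % number of points per sample
grp   = repelem((1:numel(samples))', sizes(:)); % group index per observation

% Brown-Forsythe test: a variance test that is robust to non-normality.
p = vartestn(data, grp, 'TestType', 'BrownForsythe', 'Display', 'off');

fprintf('Equal-variance test across %d groups: p = %.3f\n', ...
    numel(samples), p);
```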

Answer 3

Sarutahiko 2013-12-11

https://ww2.mathworks.cn/matlabcentral/answers/45477-how-to-check-if-data-is-normally-distributed#answer_118040

Assuming you are happy with the Anderson-Darling test for normality, I'd just use MATLAB's built-in function for it, adtest: http://www.mathworks.com/help/stats/adtest.html
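A short sketch of how adtest might be called, assuming the Statistics and Machine Learning Toolbox (adtest is available from R2013a). By default it tests the null hypothesis that the data come from a normal distribution with mean and variance estimated from the sample. The example data are made up.

```matlab
rng(5);                           % repeatable example data
xNormal = 10 + 3*randn(200,1);    % normal, mean and variance not 0 and 1
xSkewed = exprnd(1, 200, 1);      % exponential, clearly non-normal

[hN, pN] = adtest(xNormal);       % h = 0 means normality was not rejected
[hS, pS] = adtest(xSkewed);       % h = 1 means normality was rejected

fprintf('normal sample: h = %d, p = %.3f\n', hN, pN);
fprintf('skewed sample: h = %d, p = %.4f\n', hS, pS);
```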
