Why does pca on my matrix give a first value in latent greater than one?

Question

Penny13 2019-3-5

Commented: Penny13 2019-3-11

I have a 626284 by 26 matrix which is all zeros and ones. I did [coeff,score,latent] = pca(X) on my matrix but latent gave me the following numbers:

1.47069819212040

0.338544895320084

0.225716863688052

0.188056189419163

0.157949433440297

0.126385063251976

0.0906964951134501

0.0773105845697984

0.0738595589018172

0.0659590250255644

0.0616215954476751

0.0537688669401442

0.0262686347674844

0.0160550157883815

0.0112744279903577

0.0105353514551859

6.11095771880279e-33

6.03879225801973e-33

5.96730010116445e-33

So what could be the reason?

Thank you for your guidance.

Answer 1

David Goodmanson 2019-3-6

Edited: David Goodmanson 2019-3-6

Hi Penny,

Is there a reason that you think that a matrix of all ones and zeros can't have a latent value greater than 1? Here is a counterexample:

n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];
[coeff,score,latent] = pca(A);
rA = rank(A)
% results
latent =
    4.4864
    0.2955
    0.0826
    0.0379
    0.0218
    0.0143
    0.0102
    0.0077
    0.0061
    0.0050
    0.0042
    0.0036
    0.0032
    0.0029
    0.0026
    0.0025
    0.0023
    0.0022
    0.0022
    0.0021
rA = 20

The triu matrix was inserted so that every column is linearly independent, which sidesteps a potentially artificial trick situation where a lot of columns are identical. Matrix A has full rank of 20.

pca starts out by subtracting the mean of each column, so the idea here was to make the excursions from the mean as large as possible. With only 1 and 0 available, this means creating columns that are half ones and half zeros (or close to it). After that, constructing a bunch of columns that are nearly parallel puts most of the deviation along a single axis.
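One way to see why the first value can exceed 1: the latent values always add up to the sum of the column variances, and a 0/1 column has a sample variance of at most about 0.25. The sketch below checks this for the matrix A from the example. It assumes that pca with default options returns the eigenvalues of cov(A), so it uses base MATLAB cov and eig instead of pca; the names colVar and totalVar are illustrative.

```matlab
% Rebuild the counterexample matrix from the answer.
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];

% Eigenvalues of the covariance matrix, largest first.
% Assumption: this matches the latent output of pca(A) with default options.
C = cov(A);
C = (C + C')/2;                       % guard against round-off asymmetry
latent = sort(eig(C), 'descend');

colVar   = var(A);                    % sample variance of each 0/1 column
totalVar = sum(colVar);               % trace of cov(A), so it equals sum(latent)

fprintf('largest column variance : %.4f\n', max(colVar));
fprintf('sum of column variances : %.4f\n', totalVar);
fprintf('sum of latent values    : %.4f\n', sum(latent));
fprintf('share in first component: %.2f\n', latent(1)/totalVar);
```

With 20 columns of variance near 0.25 each, the total is near 5, and because the columns are nearly parallel most of that total lands in the first latent value.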

3 Comments
David Goodmanson 2019-3-8

Hi Penny,

There is plenty of information out there, starting with 'help pca' and then Wikipedia, but in brief: yes, the latent matrix is as you say, but there is no reason the variances need to be small. A variance is just the average of the squared deviations from the mean, and it can be large. If you take a set of data and multiply all the values by 10, the variance goes up by a factor of 100. It's not like the correlation coefficient, which is normalized and comes out between -1 and +1.
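The scaling behaviour is easy to check numerically. The sketch below uses two short made-up 0/1 vectors (not Penny's data) and only base MATLAB functions; the names x, y, ratio, r1 and r2 are illustrative.

```matlab
% Illustrative 0/1 data (made up for this sketch).
x = [0 1 1 0 1 0 0 1]';
y = [0 1 0 0 1 1 0 1]';

% Multiplying the data by 10 multiplies the variance by 10^2.
ratio = var(10*x) / var(x);

% The correlation coefficient is normalized, so rescaling leaves it unchanged.
r1 = corrcoef(x, y);
r2 = corrcoef(10*x, 10*y);

fprintf('variance ratio        : %g\n', ratio);
fprintf('correlation (original): %.4f\n', r1(1,2));
fprintf('correlation (scaled)  : %.4f\n', r2(1,2));
```

The same reasoning applies to latent: it is measured in squared data units, so nothing pins it below 1.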

The latent variable is as you say. The coefficients are the components of the principal axes, which are unit vectors, so the sum of squares of each column of coeff is 1. The scores are the coordinates of each measurement (row) along the principal axes; the variance of each column of score is the corresponding latent value.
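The relationship between the three outputs can be sketched without pca itself. The example below uses a small made-up 0/1 matrix and base MATLAB only. It assumes that pca with default options centers the data and uses the eigenvectors of cov(X), and that implicit expansion (R2016b or later) is available for X - mean(X). The columns of coeff computed this way may differ in sign from what pca returns.

```matlab
% Small illustrative 0/1 matrix (made up): 6 observations, 3 variables.
X = [1 0 0; 1 1 0; 0 1 1; 1 1 1; 0 0 1; 0 0 0];

C = cov(X);
C = (C + C')/2;                          % guard against round-off asymmetry
[V, D] = eig(C);
[latent, order] = sort(diag(D), 'descend');
coeff = V(:, order);                     % principal axes as columns

% Scores: coordinates of each centered row along the principal axes.
score = (X - mean(X)) * coeff;

disp(sum(coeff.^2));                     % sum of squares of each coeff column
disp([var(score)' latent]);              % score variances next to latent
```

The first displayed row should be all ones up to round-off, and the two displayed columns should agree, which is the sense in which latent holds the variances along the principal axes.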

Penny13 2019-3-11