Why does pca on my matrix give a first value in latent that is greater than one?

I have a 626284 by 26 matrix which is all zeros and ones. I did [coeff,score,latent] = pca(X) on my matrix but latent gave me the following numbers:
1.47069819212040
0.338544895320084
0.225716863688052
0.188056189419163
0.157949433440297
0.126385063251976
0.0906964951134501
0.0773105845697984
0.0738595589018172
0.0659590250255644
0.0616215954476751
0.0537688669401442
0.0262686347674844
0.0160550157883815
0.0112744279903577
0.0105353514551859
6.11095771880279e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
5.96730010116445e-33
So what could be the reason?
Thank you for your guidance.

Accepted Answer

David Goodmanson 2019-3-6
Edited: David Goodmanson 2019-3-6
Hi Penny,
Is there a reason that you think that a matrix of all ones and zeros can't have a latent value greater than 1? Here is a counterexample:
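n = 50;
m = 20;
% Stack all-ones rows, an upper-triangular block, and all-zero rows
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];
[coeff,score,latent] = pca(A);
rA = rank(A)   % no semicolon, so the rank is displayed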
% results
latent =
4.4864
0.2955
0.0826
0.0379
0.0218
0.0143
0.0102
0.0077
0.0061
0.0050
0.0042
0.0036
0.0032
0.0029
0.0026
0.0025
0.0023
0.0022
0.0022
0.0021
rA = 20
The triu matrix was inserted so that every column is linearly independent, which sidesteps a potentially artificial trick situation where a lot of columns are identical. Matrix A has full rank of 20.
pca starts out by subtracting the mean of each column, so the idea here was to make the excursions from the mean as large as possible. With only 1 and 0 available, this means creating columns that are half ones and half zeros (or close to it). After that, constructing a bunch of columns that are nearly parallel puts most of the deviation along a single axis.
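As a rough check of that reasoning (a sketch added for illustration, using the same A as above; the variable names are made up here): the sample variance of a 0/1 column with a fraction p of ones is p(1-p)N/(N-1), which peaks near 0.25 at p = 0.5, and the latent values add up to the sum of the column variances. So with m nearly parallel columns, the first latent value can approach roughly m/4, which is well above 1.

```matlab
% Sketch: how large can latent(1) get for 0/1 data?
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];   % same matrix as in the answer

colVar = var(A);                    % sample variance of each column (N-1 normalization)
[~,~,latent] = pca(A);              % pca centers the columns by default

maxColVar  = max(colVar);           % close to 0.25 for a half-ones column
totalVar   = sum(colVar);           % total variance over all columns
latentSum  = sum(latent);           % matches totalVar up to round-off
firstShare = latent(1)/latentSum;   % fraction of the variance on the first axis
```

Here firstShare comes out close to 1 because the columns are nearly parallel; with unrelated columns the variance would be spread over many components instead.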
3 Comments
David Goodmanson 2019-3-8
Hi Penny,
There is plenty of information out there, starting with 'help pca' and then Wikipedia, but in brief: yes, the latent matrix is as you say, but there is no reason the variances need to be small. A variance is just the average of the squared deviations from the mean, and it can be large. If you take a set of data and multiply all the values by 10, the variance goes up by a factor of 100. It's not like the correlation coefficient, which is normalized and lies between -1 and +1.
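The scaling remark is easy to try out. This is a small sketch, reusing the example matrix from the answer (the names latentA, latent10 and ratio are just for illustration):

```matlab
% Sketch: multiplying the data by 10 multiplies every latent value by 100
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];

[~,~,latentA]  = pca(A);        % variances along the principal axes of A
[~,~,latent10] = pca(10*A);     % same data, scaled by 10

ratio = latent10 ./ latentA;    % every entry should be about 100
```

So the size of the latent values reflects the units and spread of the data, not any fixed bound.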
The latent variable is as you say. Coefficients are the components of the principal axes, which are unit vectors, so the sum of squares of each column of the coefficient matrix is 1. Scores are the coordinates of each mean-centered measurement (row) along the principal axes; the variance of each column of scores is the corresponding latent value.
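If it helps, the relationships between the three outputs can be checked numerically. A minimal sketch, assuming the default pca options (centered data) and the example matrix from the answer; the helper variable names are illustrative only:

```matlab
% Sketch: how coeff, score and latent relate to each other
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];
[coeff,score,latent] = pca(A);

colNorms = sum(coeff.^2);                 % each column of coeff has unit length
scoreVar = var(score);                    % variance of each column of score
centered = A - mean(A);                   % column-centered data (implicit expansion, R2016b+)
projErr  = max(max(abs(centered*coeff - score)));   % score is the projection onto coeff
latErr   = max(abs(scoreVar(:) - latent(:)));       % var(score) agrees with latent
```

Both projErr and latErr should be at round-off level.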
