Should I use PCA to order the data points in order to find the mode of the data points?

Question

Salad Box 2019-2-26

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/447100-should-i-use-pca-to-order-the-data-points-in-order-to-find-the-mode-of-the-data-points

Edited: John D'Errico 2019-2-26

Question 1: I roughly know that PCA is used for dimension reduction. It reduces the number of features, not the number of data points (observations), right?

Question 2: For example, I have 30 randomly generated data points, plotted below. At the moment, each data point is represented with 2 dimensions: its x value and its y value.

x=100*rand(1,30);
y=100*rand(1,30);
scatter(x,y,'ro','filled')

Now I would like to find a line (shown in black in the picture below) onto which all the data points can be projected. After projection, each data point can be represented with a 1-dimensional value (let's call this value z) instead of the 2 dimensions (x,y) of the original axis plane. I am guessing this is a simple example of using PCA for dimension reduction. However, I don't know how to do this in MATLAB.

My requirements:

1) I would like to obtain all the 1-dimensional z values, if possible, each one representing one original data point;

2) Once I project all the data points onto this line, how do I find out which point on the line corresponds to which original data point?

3) To find the mode of the points on the projection line, do I just sort the z values and then find their mode? Once I have the mode of the z values, how do I relate that z value to the corresponding parent data point?

At the moment these are all a bit unclear to me, and I would like some help to fully understand the basics.

1 Comment

John D'Errico 2019-2-26

I'm not sure you really understand PCA.

Yes, it is true that PCA is not used to reduce the number of data points. That would serve no purpose.

The data you show does not seem to have one dimension you can intelligently reduce to, i.e., one direction that encodes most of the information in this data. Yes, you can arbitrarily use PCA to reduce it to one variable, but you will still have n data points.
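One way to check whether a single direction carries most of the information is to compare the squared singular values of the centered data. The sketch below is only an illustration under assumptions: the correlated data set `xyC`, the seed, and the variable names are made up for the comparison and are not from the question.

```matlab
rng(0);                               % fixed seed, only so the sketch is repeatable
n = 30;

% Uncorrelated data, like the question's: x and y drawn independently
xyU = 100*rand(n,2);
% Correlated data (made-up example): y is roughly a linear function of x
t   = 100*rand(n,1);
xyC = [t, 0.5*t + 2*randn(n,1)];

% Squared singular values of the centered data are proportional to the
% variance along each principal axis
sU = svd(xyU - mean(xyU,1));
sC = svd(xyC - mean(xyC,1));

fracU = sU(1)^2 / sum(sU.^2);         % share of variance on the first axis
fracC = sC(1)^2 / sum(sC.^2);
fprintf('uncorrelated: %.2f, correlated: %.2f\n', fracU, fracC);
```

For the correlated set nearly all of the variance falls on the first axis, so one variable is a reasonable summary. For independent x and y the share is much lower, and a projection to one dimension throws away a large part of the data.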

You cannot recover the original data point merely from its projection onto the line, though.

I'm also not sure why you want to find the mode of the z values, i.e., of the projected points along the line. Odds are, since those points are projected, they will be just a bunch of distinct real numbers. No value will repeat, so there will be no most frequent value.
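To make the point about the mode concrete, here is a small sketch. The vector `z` is a hypothetical stand-in for 30 projected values, and the histogram-based peak is just one possible substitute for a mode (not something proposed in this thread); the bin count is an arbitrary assumption.

```matlab
rng(1);                        % fixed seed, only for repeatability
z = 100*rand(30,1);            % stand-in for 30 projected values (hypothetical)

% Every value occurs exactly once, so mode() has no real winner;
% on ties it returns the smallest value, i.e. min(z).
isAllUnique = numel(unique(z)) == numel(z);
zMode = mode(z);

% One possible substitute: bin the values and take the fullest bin's center.
nBins = 6;                                   % arbitrary choice
[counts, edges] = histcounts(z, nBins);
[~, k] = max(counts);
zPeak = (edges(k) + edges(k+1)) / 2;         % center of the most populated bin

% Index of the original point whose z value is closest to that peak
[~, iNearest] = min(abs(z - zPeak));
```

The result of the binned estimate depends on `nBins`, so it should be read as a rough location of the densest region rather than as an exact value.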


Answer 1

John D'Errico 2019-2-26

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/447100-should-i-use-pca-to-order-the-data-points-in-order-to-find-the-mode-of-the-data-points#answer_362852

Edited: John D'Errico 2019-2-26


Let me try to explain a bit, although I think you would be best served by doing some reading online, or in a textbook about PCA. I think you are trying to teach yourself about PCA by playing around with some made-up data in MATLAB. At the same time, by playing around with only 2 variables, you are not really understanding what you see.

% 100 correlated points: mix two standard-normal columns, then shift the mean to (1,2)
xy = [1 2] + randn(100,2)*rand(2,2);
plot(xy(:,1),xy(:,2),'o')

So there is clearly a relationship between these two variables. In fact, you might decide to capture that relationship in a single variable using PCA. You can do the PCA using the function pca, or you can use svd.

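xybar = mean(xy,1);                  % column means, i.e. the centroid of the data
[U,S,V] = svd(xy - xybar);           % columns of V are the principal directions
% Points along the first principal direction, through the centroid
Projline = xybar + linspace(-3,3,100)'*V(:,1)';
hold on
plot(Projline(:,1),Projline(:,2),'g-')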

So we have the one-dimensional subspace that the data is projected onto, shown as the green line.
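The 1-dimensional z values asked about in the question can be read off the same decomposition. The following is a minimal sketch under the same setup; the names `v1`, `z`, `idx`, `xyProj` and `resid` are introduced here for illustration only, and the sign of `v1` (and therefore of `z`) is arbitrary.

```matlab
rng(2);                                % fixed seed, only for repeatability
xy = [1 2] + randn(100,2)*rand(2,2);   % same kind of data as above
xybar = mean(xy,1);
[~,~,V] = svd(xy - xybar, 'econ');

v1 = V(:,1);                    % unit vector along the first principal axis
z  = (xy - xybar) * v1;         % 100x1: z(i) is the coordinate of point i on the line

% Row i of z belongs to row i of xy, so sorting with an index keeps the link
[zSorted, idx] = sort(z);
xySorted = xy(idx,:);           % original points in order along the line

% Location of each projection in the (x,y) plane; this is NOT the original
% point unless that point already lay on the line
xyProj = xybar + z * v1';
resid  = xy - xyProj;           % the part of each point that the projection discards
```

The correspondence between a projected value and its parent point is carried by the row index, not by the value of z itself. That is why the original point cannot be rebuilt from z alone: `resid` is exactly the information the projection drops.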

Better still would be to do some reading about PCA. I would suggest the book by Ted Jackson (actually J. E. Jackson) as a good read.
