Hello everyone.
I have a matrix that has 300 rows(samples) and 5000 columns(features).
I need to reduce the number of columns for classification.
As far as I know for using pca() function the number of samples should be greater than the number of features.
So I try to use Singular Value Decomposition function with below codes.
[U, Sig, V]=svd(X);
sv=diag(Sig)
figure;
sv=sv/sum(sv);
stairs(cumsum(sv));
xlabel('singular values');
ylabel('cumulative sum');
I have two questions.
1) As i understand from the above figure i have to take approximately 250 singular values that it counts for 95% of my data.
So should I take first 250 singular values for creating a new data for classification?
How can i see the variance of each principal components like in pca() functions explained matrix to decide how many of them should i use?
2) After defining the number of principal components, I need to create a new matrix for classification.
Can I do this with below code? (for example with first two principal components)
new_matrix_for_classification = X*(V:,1:2);
Thanks in advance.