PCA operation and its inverse operation on a dataset

I was trying the pca function based on the example in the MATLAB help:
load hald % The ingredients data has 13 observations for 4 variables.
coeff = pca(ingredients)
coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
I have a few questions:
1. Do we need to pre-process the raw observations, or can we use them as they are?
2. Based on the code, we are doing dimensionality reduction, so how can we get the data back in its original structure (with some error introduced)? The original data is 13x4 and coeff is 4x4. What else does the decoder need?
[coeff,score,latent,tsquared,explained,mu] = pca(ingredients)

Answers (1)

Paras Gupta, 2024-7-19, 6:32
Hello,
It is generally a good practice to pre-process the raw data. Common pre-processing steps include:
If you do not want to remove missing entries from your data, you can use the alternating least squares (ALS) algorithm for PCA in MATLAB, which handles missing values better. See the following link on selecting the PCA algorithm: https://www.mathworks.com/help/stats/pca.html#bth9ibe-Algorithm
[coeff,score,latent,tsquared,explained] = pca(ingredients,'algorithm','als');
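As a rough illustration of pre-processing, here is a minimal sketch assuming the observations are the rows of a numeric matrix and that dropping incomplete rows is acceptable. The matrix Xraw and its values are made up for illustration. Standardizing each column matters mainly when the variables are on very different scales; pca's own 'VariableWeights','variance' option is an alternative to doing it by hand.

```matlab
% Hypothetical raw data: 6 observations x 3 variables, one missing entry
Xraw = [ 2.0  10  300
         1.5  12  280
         NaN  11  310
         3.0  15  350
         2.5  13  330
         1.0   9  260 ];

% 1. Drop observations (rows) that contain any missing value
keep = ~any(isnan(Xraw), 2);
X = Xraw(keep, :);

% 2. Standardize each variable: subtract its mean, divide by its standard deviation
mu = mean(X, 1);
sigma = std(X, 0, 1);
Xstd = (X - mu) ./ sigma;    % implicit expansion (R2016b or later)

% Keep mu and sigma: they are needed to undo the scaling after a PCA reconstruction
Xback = Xstd .* sigma + mu;
```

If you standardize before calling pca, a reconstruction from the scores is in standardized units, so apply the last line to map it back to the original units.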
When you perform PCA, you are transforming your data into a new coordinate system where the axes (principal components) are ordered by the amount of variance they explain in the data. This allows you to reduce the dimensionality by keeping only the first few principal components.
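To make this change of coordinates concrete, the sketch below computes the quantities that pca returns with its default settings (centered data, SVD algorithm) directly from svd. It is an illustration, not MathWorks' implementation: pca additionally fixes the sign of each column of coeff, so individual columns may differ in sign. The toy matrix X is arbitrary; replace it with ingredients to compare against the coeff printed in the question.

```matlab
% Arbitrary toy data: 6 observations x 4 variables
M = magic(6);
X = M(:, 1:4);
[n, p] = size(X);

% Center each variable (pca does this by default and returns the means as mu)
mu = mean(X, 1);
Xc = X - mu;

% Economy-size SVD of the centered data: Xc = U*S*V'
[U, S, V] = svd(Xc, 'econ');

coeff = V;                              % columns = principal directions (orthonormal)
score = Xc * coeff;                     % coordinates of each observation on the new axes (= U*S)
latent = diag(S).^2 / (n - 1);          % variance along each component, in descending order
explained = 100 * latent / sum(latent); % percentage of total variance per component
```

Because the singular values come out in descending order, the components are automatically sorted by the variance they explain, which is what makes keeping "the first few" meaningful.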
To reconstruct the data in its original structure, you need the principal component scores 'score', the principal component coefficients 'coeff', and the estimated variable means 'mu' (pca centers the data by default, so the means must be added back). If you keep only some of the components, information is lost and the reconstruction has some error.
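On the question of what a decoder needs: with the default centering, encoding is (X - mu)*coeff and decoding is score*coeff' + mu, because the columns of coeff are orthonormal. So besides the k retained scores per observation, the decoder only needs the first k columns of coeff and the 1-by-p vector mu. The minimal sketch below gets these from svd so it runs without the toolbox; with the toolbox you would take them from [coeff,~,~,~,~,mu] = pca(X). The names encode, decode and Wk are hypothetical.

```matlab
% Arbitrary toy training data: 6 observations x 4 variables
M = magic(6);
X = M(:, 1:4);
k = 2;                                  % number of components to keep

% "Fit": the means and the orthonormal principal directions (pca returns these as mu and coeff)
mu = mean(X, 1);
[~, ~, V] = svd(X - mu, 'econ');
Wk = V(:, 1:k);                         % p-by-k: all the decoder needs besides mu

% Hypothetical encoder/decoder built only from mu and Wk
encode = @(Xnew) (Xnew - mu) * Wk;      % n-by-p -> n-by-k scores
decode = @(Z) Z * Wk' + mu;             % n-by-k -> n-by-p approximation

Z = encode(X);                          % compressed representation (6-by-2)
Xhat = decode(Z);                       % rank-k reconstruction, with some error
err = norm(X - Xhat, 'fro');            % size of the reconstruction error
```

The same encode handle works for new observations (rows) measured on the same variables, since it only uses the stored mu and Wk.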
The following code shows how the original data can be reconstructed:
load hald % provides the 13-by-4 matrix ingredients
[coeff, score, latent, tsquared, explained, mu] = pca(ingredients);
% Select the number of principal components to keep
numComponentsToKeep = 2;
% Reduce dimensionality
reducedScore = score(:, 1:numComponentsToKeep);
% Reconstruct the data (rank-k approximation, where k is numComponentsToKeep);
% mu (the column means removed by pca) must be added back
reconstructedData = reducedScore * coeff(:, 1:numComponentsToKeep)' + repmat(mu, size(ingredients,1), 1);
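To see how the reconstruction error depends on numComponentsToKeep, the sketch below loops over k. With all p components kept, the data are recovered exactly (up to rounding), and the squared Frobenius error of the rank-k version equals (n-1) times the sum of the discarded latent values. It uses svd on arbitrary toy data so it runs without the Statistics and Machine Learning Toolbox; with pca you would use its coeff, score, latent and mu instead.

```matlab
% Arbitrary toy data: 6 observations x 4 variables
M = magic(6);
X = M(:, 1:4);
[n, p] = size(X);

% Same quantities pca returns by default (coeff may differ in column signs)
mu = mean(X, 1);
[U, S, coeff] = svd(X - mu, 'econ');
score = U * S;
latent = diag(S).^2 / (n - 1);

sqErr = zeros(p, 1);
for k = 1:p
    % Rank-k reconstruction, same formula as the code above
    Xhat = score(:, 1:k) * coeff(:, 1:k)' + mu;
    sqErr(k) = norm(X - Xhat, 'fro')^2;
    % Squared error should equal (n-1) times the variance of the dropped components
    predicted = (n - 1) * sum(latent(k+1:end));
    fprintf('k = %d: squared error %.4g (predicted %.4g)\n', k, sqErr(k), predicted);
end
```

The percentages in explained give the same information in relative terms: the error left after keeping k components corresponds to 100 - sum(explained(1:k)) percent of the total variance.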
For more information on the code above, see the documentation for the pca function: https://www.mathworks.com/help/stats/pca.html
Hope this helps.
