How many dimensions do I need?

Create a script to compute the number of feature dimensions N needed to represent at least 99.9% of the variance in the feature set of the humanactivity dataset using the 'pca' function.
The steps are:
  • Compute eigvals using the 'pca' function
  • Define the vector cumulative_percent_variance_permode, the same size as eigvals, containing 100 times (to convert fractions to percentages) the cumulative sum of the normalized eigenvalues
  • Define N as the number of eigenvectors needed to capture at least 99.9% of the variation in our dataset D
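
Written out as formulas, the steps above amount to the following. This is only a restatement of the task, assuming the eigenvalues are sorted in descending order (as the latent output of pca is); the symbols \lambda_i, c_k and p are notation introduced here, with p = 60 features for D.

$$
c_k = 100 \cdot \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j},
\qquad
N = \min \{\, k : c_k \ge 99.9 \,\}
$$

Here c_k is the k-th entry of cumulative_percent_variance_permode, so N is the index of the first entry that reaches 99.9.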
Script
load humanactivity.mat
D = feat; % [24075 x 60] matrix containing 60 feature measurements from 24075 samples
% compute eigvals
% compute the cumulative_percent_variance_permode vector.
% Define N as the number of eigenvectors needed to capture at least 99.9% of the variation in D.

Answers (2)

load humanactivity.mat
D = feat; % [24075 x 60] matrix containing 60 feature measurements from 24075 samples
% compute eigvals
[eigvects,~,eigvals] = pca(D);
% compute the cumulative_percent_variance_permode vector.
percvar = 100*eigvals/sum(eigvals);
cumulative_percent_variance_permode = cumsum(percvar);
% Define N as the number of eigenvectors needed to capture at least 99.9% of the variation in D.
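% Note: length(cumulative_percent_variance_permode(cumulative_percent_variance_permode >= 99.9))
% counts the modes at or above 99.9%; it does not give the index of the first such mode.
N = find(cumulative_percent_variance_permode >= 99.9, 1, 'first');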

1 Comment

How did you get N = 5?
The output gives N = 56.
Can you please explain?
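
A possible explanation, sketched with made-up eigenvalues rather than the humanactivity data: the expression length(x(x >= 99.9)) counts how many entries of the cumulative vector are at or above 99.9, whereas N should be the index of the first such entry. Because the cumulative vector never decreases, the two are related by N = numel(x) - count + 1. The variable count_at_or_above and the eigenvalues below are hypothetical, chosen only for illustration.

```matlab
% Hypothetical eigenvalues in descending order (NOT from humanactivity)
eigvals = [50; 30; 15; 4.95; 0.04; 0.01];

% Cumulative percentage of variance captured by the first k modes
cumulative_percent_variance_permode = 100*cumsum(eigvals)/sum(eigvals);

% Number of entries at or above 99.9 (what the length(...) expression returns)
count_at_or_above = nnz(cumulative_percent_variance_permode >= 99.9);

% Index of the first entry at or above 99.9 (the N the question asks for)
N = find(cumulative_percent_variance_permode >= 99.9, 1, 'first');

fprintf('count = %d, N = %d\n', count_at_or_above, N);  % count = 3, N = 4
```

By the same relation, if 56 of the 60 entries for the real data are at or above 99.9, the first of them sits at index 60 - 56 + 1 = 5, which would be consistent with N = 5.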


Sam Chak 2022-6-5
Edited: Sam Chak 2022-6-5

0 votes

Look up the formula for calculating the sample size N and post it here.
Then we may be able to show how to compute it in MATLAB.
Also consider using the sampsizepwr() function. For more information, read the following:
