Hi Zubair,
Your approach to using PCA to compare two datasets is on the right track. However, there are a few things to consider and some additional steps to visualize and quantify the explained variance by each principal component.
Here’s how you can implement these steps in MATLAB:
% Load and preprocess the nondisease data
nondisease = nondisease';
nondisease(isnan(nondisease)) = 0;
% Perform PCA on the nondisease dataset
[coeff1, score1, latent1, ~, explained1] = pca(nondisease);
% Load and preprocess the disease data
disease1 = baseline';
disease1(isnan(disease1)) = 0;
% Perform PCA on the disease dataset
[coeff2, score2, latent2, ~, explained2] = pca(disease1);
% Visualize explained variance for nondisease dataset
figure;
subplot(1, 2, 1);
bar(explained1);
title('Explained Variance for Nondisease Dataset');
xlabel('Principal Component');
ylabel('Variance Explained (%)');
% Visualize explained variance for disease dataset
subplot(1, 2, 2);
bar(explained2);
title('Explained Variance for Disease Dataset');
xlabel('Principal Component');
ylabel('Variance Explained (%)');
% Optionally, plot PC1 vs PC2 for both datasets
figure;
scatter(score1(:, 1), score1(:, 2), 'b', 'DisplayName', 'Nondisease');
hold on;
scatter(score2(:, 1), score2(:, 2), 'r', 'DisplayName', 'Disease');
xlabel('PC1');
ylabel('PC2');
title('PC1 vs PC2');
legend('show');
