Applying Classical Multidimensional Scaling and Normalization on new data

Dear Matlab-Community,
I would be grateful for a hint on the following two questions about a machine-learning application of classical multidimensional scaling in MATLAB:
1.) It is recommended (but in my experience rarely done) to apply feature extraction (and hence feature projection methods such as classical multidimensional scaling or PCA) separately on the training, validation and test sets in order to get a more realistic estimate of model performance during model validation. The same question arises when I apply my models to new data in a real environment after deployment. How can I apply classical multidimensional scaling separately so that the model developed with the training data can be used on the validation set, etc.? Do I first project my training data onto the lower dimension and later pass the determined dimensionality to the classical multidimensional scaling of the validation/test/new data?
2.) Is this workflow right: data normalization --> classical multidimensional scaling --> data normalization again on the reduced data?
I have provided example code.
I appreciate every hint! Thank you very much!
Regards,
Denys
%% Hold-out validation with training, validation and test sets
%% Classical Multidimensional Scaling Training data
DataTrain = normalize(DataTrain); % normalization
D = pdist(DataTrain,'euclidean'); % pair-wise distances
[Y,e] = cmdscale(D); % carry out classical multidimensional scaling
CumSumEig = cumsum(e./sum(e)); % cumulative sum of the normalized eigenvalues (cumulative relevance)
LowDim = find(CumSumEig > 0.95, 1, "first"); % first dimension at which the cumulative relevance exceeds 95 %
DataTrain = Y(:,1:LowDim); % reduced data
%% Validation
% How to reduce the dimensionality of the validation data, because it is
% recommended to apply feature extraction (and hence the feature projection
% methods) separately on the training, validation and test set?
DataValidation = normalize(DataValidation); % normalization
D = pdist(DataValidation,'euclidean'); % pair-wise distances
[Y,e] = cmdscale(D); % carry out classical multidimensional scaling
DataValidation = Y(:,1:LowDim); % reduced data
% or application to new data
DataNew = normalize(DataNew); % normalization
D = pdist(DataNew,'euclidean'); % pair-wise distances
[Y,e] = cmdscale(D); % carry out classical multidimensional scaling
DataNew = Y(:,1:LowDim); % reduced data
% Normalize the data again before making predictions?
DataTrain = normalize(DataTrain);
DataNew = normalize(DataNew);

Answers (1)

Aditya, 2023-9-13
Edited: Aditya, 2023-9-14
Hey Denys,
I understand that you need hints on applying classical multidimensional scaling in MATLAB. Here are a few pointers that might help you out.
Yes, it is generally recommended to keep the training, validation, and test sets separate when applying feature extraction methods such as classical multidimensional scaling (MDS) or principal component analysis (PCA): the transformation is derived from the training set only and then applied to the other sets.
The workflow for using classical MDS with separate training, validation, and test sets is:
1. Perform feature extraction on the training set: Apply Classical MDS (or any other feature extraction method) on the training data to project it onto a lower-dimensional space. This will reduce the dimensionality of the training data while preserving its essential structure.
2. Determine the dimensionality: After applying Classical MDS on the training set, you will obtain a lower-dimensional representation of the training data. Determine the desired dimensionality based on the amount of variance you want to retain or any other criteria.
3. Project the validation and test sets: Once you have determined the desired dimensionality from the training set, apply the same feature projection to the validation and test sets. This ensures that the same transformation is applied consistently across all datasets.
By following these steps, you will extract features from the training set using Classical MDS, determine the desired dimensionality, and then project the validation and test sets onto the same lower-dimensional space.
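Note that cmdscale returns only coordinates and eigenvalues, not a reusable transformation, so step 3 needs an explicit out-of-sample formula. The following is a minimal sketch, assuming Euclidean distances and using Gower's interpolation formula, which is not named in the original post. The data (XTrain, XNew) and all variable names are hypothetical placeholders.

```matlab
% Sketch only: out-of-sample projection for classical MDS (Gower's formula).
% XTrain and XNew are hypothetical, already normalized data matrices.
rng(0);
A = randn(6, 6);
XTrain = randn(50, 6) * A;   % n-by-d training data
XNew   = randn(10, 6) * A;   % m-by-d new data from the same source

% Classical MDS on the training set only
DTrain = squareform(pdist(XTrain, 'euclidean'));   % n-by-n distance matrix
[Y, e] = cmdscale(DTrain);
p = size(Y, 2);                                    % number of usable dimensions
cumRel = cumsum(e(1:p)) / sum(e(1:p));
k = find(cumRel > 0.95, 1, 'first');               % dimensions to keep

% Squared distances from each new point to every training point
D2Train = DTrain.^2;
D2New   = pdist2(XNew, XTrain, 'euclidean').^2;    % m-by-n

% Gower's formula: yNew = 1/2 * inv(Lambda) * Y' * (rowMean(D2Train) - d2New)
rowMean = mean(D2Train, 2)';                       % 1-by-n
YNew = 0.5 * (rowMean - D2New) * Y(:, 1:k) ./ e(1:k)';   % m-by-k

% Sanity check: the same formula applied to the training points
% should reproduce their MDS coordinates.
YCheck = 0.5 * (rowMean - D2Train) * Y(:, 1:k) ./ e(1:k)';
```

Here YNew lives in the same coordinate system as Y(:,1:k), so a model trained on the reduced training data can be applied to it directly. Running cmdscale independently on the new data would instead produce coordinates in an unrelated, arbitrarily rotated space.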
It's important to note that when applying feature extraction methods separately on different datasets, you should use the parameters obtained from the training set consistently on the validation, test, or new data. This ensures that the same transformation is applied consistently across all datasets and allows for fair comparison and evaluation of the model's performance.
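The same principle applies to the normalization step. As a minimal sketch with hypothetical data and variable names, the z-score parameters can be computed from the training set and reused for any later data, instead of calling normalize on each set independently (which would use each set's own mean and standard deviation):

```matlab
% Sketch only: z-score normalization with parameters taken from the training set.
% XTrain and XNew are hypothetical placeholder data.
rng(1);
XTrain = 5 + 2 * randn(40, 3);
XNew   = 5 + 2 * randn(8, 3);

mu    = mean(XTrain, 1);       % column means of the training data
sigma = std(XTrain, 0, 1);     % column standard deviations of the training data

ZTrain = (XTrain - mu) ./ sigma;   % training data: zero mean, unit variance per column
ZNew   = (XNew   - mu) ./ sigma;   % new data: scaled with the training parameters
```

With this choice ZNew will generally not have exactly zero mean and unit variance, which is expected: it is expressed on the scale learned from the training data.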
By applying feature extraction separately on different datasets and using the same projection parameters, you can obtain a more realistic estimation of model performance during validation and ensure consistent feature projection on new data in a real environment.
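If the distances are Euclidean, as in the question, PCA may be the simpler route: its scores match the classical MDS coordinates up to the sign of each column, and pca returns explicit projection parameters (the coefficient matrix and the training mean). A minimal sketch with hypothetical data and variable names, assuming the Statistics and Machine Learning Toolbox:

```matlab
% Sketch only: PCA fitted on the training set, then applied to new data.
% XTrain and XNew are hypothetical placeholder data.
rng(2);
A = randn(5, 5);
XTrain = randn(60, 5) * A;
XNew   = randn(12, 5) * A;

% Fit PCA on the training data only
[coeff, scoreTrain, ~, ~, explained, mu] = pca(XTrain);
k = find(cumsum(explained) > 95, 1, 'first');   % explained is given in percent

TrainLow = scoreTrain(:, 1:k);                  % reduced training data
NewLow   = (XNew - mu) * coeff(:, 1:k);         % new data projected with training parameters
```

The pair (mu, coeff(:,1:k)) is what has to be stored alongside the trained model so that validation, test, and deployment data all pass through the identical projection.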
You may refer to the MATLAB documentation for cmdscale for more information on classical multidimensional scaling.
Thanks,
Best Regards
Aditya Kaloji
1 Comment
Denys Romanenko, 2023-9-25
Hi Aditya,
thank you very much for your input.
After your reply I am still not sure whether what you suggest is possible. The dimensionality obtained from classical MDS is not all I need to carry out the same projection on another dataset, is it?
Regards,
Denys

Release: R2022b