How to automatically select the number of latent variables - PLS-DA script
Hi,
I have a question. When I build a PLS-DA model in PLS_Toolbox and load the calibration datasets X and Y, the toolbox can automatically select the number of latent variables. Is there a way to "translate" this into a MATLAB script?
After the cross-validation, is there any way to make the script select the number of components automatically? Is there a function for this? My Y dataset has 3 columns, and each column contains 1 or 0 depending on whether or not the sample belongs to that class.
% PLS-DA model before cross-validation
modello_in=evrimodel('plsda');
% Calibration of the PLS-DA model before cross-validation
modello_in.x=Xcal;
modello_in.y=Ycal;
modello_in.ncomp=5; % 5 LVs
modello_in.options.preprocessing={'autoscale' 'autoscale'};
modello_in.options.display='off';
modello_in=modello_in.crossvalidate({'vet' 10}, 15);
Answers (1)
Simar
2024-6-12
Hi Pietro,
As I understand it, you want to extend your script so that it automatically selects the optimal number of latent variables from the cross-validation results, and you are looking for a function or method in PLS_Toolbox or MATLAB that does this.
In PLS_Toolbox for MATLAB, selecting the number of latent variables (LVs) for a Partial Least Squares Discriminant Analysis (PLS-DA) model can indeed be automated, especially during cross-validation. The goal is to find the number of LVs that minimizes the cross-validated prediction error, which is crucial for building a robust and accurate model.
While PLS_Toolbox provides a user-friendly interface for these tasks, translating them into a MATLAB script offers more flexibility and automation. The "crossvalidate" method in PLS_Toolbox performs the cross-validation, and its results can then be used to determine the optimal number of latent variables.
Here is a conceptual approach to automatically selecting the number of components after cross-validation, adapted for a PLS-DA model in a MATLAB script. Note that specific function names and options might require adjustments based on the exact version of PLS_Toolbox in use:
% Define the PLS-DA model
model = evrimodel('plsda');
% Set the calibration data
model.x = Xcal; % Predictor variables
model.y = Ycal; % Response variables (classes encoded as 0 or 1)
% Set initial number of components
model.ncomp = 10; % Example: starting with 10 LVs
% Preprocessing options
model.options.preprocessing = {'autoscale', 'autoscale'};
model.options.display = 'off';
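% Perform cross-validation: venetian blinds with 10 splits, testing up to 15 LVs
% (call form taken from the question's script; check it against your toolbox version)
model = model.crossvalidate({'vet', 10}, 15);
% Extract the optimal number of LVs from the cross-validation results.
% NOTE: 'model.cv.statistics.error' is a placeholder field name; substitute the
% field that holds the cross-validation error in your version of PLS_Toolbox.
% If the error has one row per Y column, average over the classes first,
% e.g. min(mean(err, 1)), so that a single number of LVs is returned.
[~, optimalLV] = min(model.cv.statistics.error); % index of the minimum error
% Update the model with the optimal number of LVs
model.ncomp = optimalLV;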
Note: The exact way to extract the optimal number of LVs might differ based on the structure of the 'model' object and the version of PLS_Toolbox.
The script sets up a PLS-DA model, performs cross-validation, and then selects the number of latent variables based on the cross-validation results. The key is to analyze the "model.cv.statistics.error" array (or its equivalent in your version of PLS_Toolbox) and find the minimum error, which corresponds to the optimal number of components.
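To make the selection step concrete, here is a minimal sketch in plain MATLAB that does not call PLS_Toolbox at all. It assumes the cross-validation errors have already been copied into a matrix with one row per class (Y column) and one column per number of LVs. The matrix cvErr, its values, and the 10 % tolerance are all made up for illustration and are not taken from any toolbox output.

```matlab
% Illustrative cross-validation errors (hypothetical numbers):
% rows = classes (Y columns), columns = number of LVs (1 to 6).
cvErr = [0.30 0.22 0.15 0.14 0.14 0.16;
         0.35 0.25 0.15 0.15 0.16 0.17;
         0.28 0.20 0.14 0.13 0.14 0.15];

% Average over the classes to get one error value per number of LVs.
meanErr = mean(cvErr, 1);

% Option 1: the number of LVs with the lowest mean error.
[minErr, lvMin] = min(meanErr);

% Option 2: the smallest number of LVs whose mean error is within a
% tolerance of the minimum. The 10 % tolerance is an arbitrary choice.
tol = 0.10;
lvParsimonious = find(meanErr <= (1 + tol) * minErr, 1, 'first');

fprintf('Minimum-error LVs: %d, parsimonious LVs: %d\n', lvMin, lvParsimonious);
```

With these example numbers the minimum mean error is at 4 LVs, while the tolerance rule picks 3 LVs. Preferring the smaller model when the errors are nearly equal is a common way to reduce the risk of overfitting, but which rule to use is a modelling decision.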
Please refer to the evrimodel documentation for the exact structure of the model object after cross-validation, since the way to access the cross-validation statistics and errors may vary between versions of the toolbox.
Hope it helps!
Best Regards,
Simar