SVM perfectly classified, is this correct?
Hi All,
Please find attached my SVM model for predicting actuator failure in an aircraft model. Six features are used, sim_out = [ax ay ax p q r]; this data is obtained from an aircraft Simulink model and all looks correct. The data is normalised and then used in the SVM model, where the outcome is binary: 'nominal' or 'fault'.
I am achieving perfect separation, and I am not sure whether this is due to an error or the model just working well. I am new to this field and would appreciate it if someone could review the code and let me know whether I am making an error, as it seems too good to be true!
Thanks
% From 120 seconds on there is a fault.
faultLabel = vertcat(cellstr(repmat('nominal', 2505, 1)), cellstr(repmat('fault', 1350, 1)));
classNum = 2;
load sim_out.mat
load simulation_time.mat
feature_vec = sim_out;
output_vec = faultLabel;
%% Arrange training/test sets
% Training set (around 70% of the whole data set)
trainingDataExNum = ceil(70 / 100 * size(feature_vec, 1));
% Randomly select 70% of the rows for training and leave the rest for testing
randTrainRowIdx = randperm(size(feature_vec, 1), trainingDataExNum);
% Training set for features and output
feature_vec_training = feature_vec(randTrainRowIdx, :);
output_vec_training = output_vec(randTrainRowIdx, :);
% Test set for features and output
feature_vec_test = feature_vec;
feature_vec_test(randTrainRowIdx, :) = [];
output_vec_test = output_vec;
output_vec_test(randTrainRowIdx, :) = [];
test_set_time = simulation_time;
test_set_time(randTrainRowIdx) = [];
%% TRAINING PHASE
% Gaussian-kernel SVM
SVMModel = fitcsvm(feature_vec_training, output_vec_training, ...
    'KernelFunction', 'gaussian', 'Standardize', true, ...
    'ClassNames', {'nominal','fault'});
% Support vectors
sv = SVMModel.SupportVectors;
%% Cross-validation
CVSVMModel = crossval(SVMModel);              % 10-fold by default
crossValClassificErr = kfoldLoss(CVSVMModel); % cross-validated classification error
%% Prediction and evaluation
[label, score] = predict(SVMModel, feature_vec_test);
% confusionmat expects the known labels first, then the predictions
CM = confusionmat(output_vec_test, label);
Answers (1)
Harimurali
2023-9-8
Hi Julianne,
Perfect separation in an SVM may indicate overfitting rather than a desirable result. An overfit model learns the training data too well, including its noise and outliers, and therefore fails to generalize to unseen data.
Since you have performed k-fold cross-validation and used held-out test data, the perfect separation could also be genuine. However, if the training data and test data are highly correlated, perfect separation again points towards overfitting. In your script this is likely: the rows come from one continuous simulation, so the random "randperm" split places nearly identical, time-adjacent samples in both the training and test sets.
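A quick way to check this correlation (a hypothetical diagnostic, reusing the variable names from your script) is to measure the distance from each test sample to its nearest training sample. With a random split of slowly varying time-series data, these distances are typically close to zero, meaning the test set is almost a copy of the training set:
% Distance from every test sample to every training sample
D = pdist2(feature_vec_test, feature_vec_training);
% Nearest training neighbour for each test sample
nearestDist = min(D, [], 2);
fprintf('Median nearest-neighbour distance: %g\n', median(nearestDist));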
To handle these problems, it is vital to make sure that the test data is representative of the real-world scenarios the model would encounter.
Consider the following steps:
- Diverse and unbiased test data: Make sure the test data covers all the scenarios and variations the model is required to handle. Avoid introducing biases that may limit the model's generalization ability.
- Increase the dataset: Collect more data, if you can, to make your dataset bigger. A larger dataset provides more representative samples and can help the model better identify underlying trends.
- External validation: Try to collect data that is independent of the training data and closely represents real-world scenarios. This can provide a more reliable estimate of the model's performance and generalization ability (see the sketch below).
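As a concrete form of external validation for this script (a minimal sketch, assuming the rows of sim_out are time-ordered with the fault beginning at row 2506, as in your faultLabel construction): hold out the last 30% of each class segment as contiguous test blocks instead of using randperm, so the test set is temporally separated from the training set. If accuracy drops noticeably under this split, the perfect separation was driven by train/test correlation rather than true generalization:
% Sketch: hold out the last 30% of each class segment as the test set
% (assumes rows are time-ordered; fault begins at row 2506)
nNominal = 2505;
nFault = 1350;
trainIdx = [1:ceil(0.7*nNominal), nNominal + (1:ceil(0.7*nFault))];
testIdx = setdiff(1:(nNominal + nFault), trainIdx);
feature_vec_training = feature_vec(trainIdx, :);
output_vec_training = output_vec(trainIdx, :);
feature_vec_test = feature_vec(testIdx, :);
output_vec_test = output_vec(testIdx, :);
% Retrain and evaluate exactly as before on this time-separated split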
Hope this helps.