Feature forward selection with SVM not giving the correct criterion

Dear all,
I want to use forward feature selection with an SVM, but the criterion values are very small and very different from the loss I get when I evaluate an SVM trained on the selected features with the loss function. The selected features are fine for my model, but I do not understand the output.
classifierfun = @(train_data,train_labels,test_data,test_labels) ...
    loss(fitcsvm(train_data,train_labels,'KernelFunction','gaussian', ...
    'Standardize',true),test_data,test_labels,'LossFun','ClassifError');
[fs,history] = sequentialfs(classifierfun,table2array(TableFeaturesNormalized),Y, ...
    'cv',c,'nfeatures',min(size(TableFeaturesNormalized,2),max_its_fs),'options',opts)
Step 1, added column 5, criterion value 0.00873988
Step 2, added column 9, criterion value 0.00812571
Step 3, added column 1, criterion value 0.00839142
Step 4, added column 2, criterion value 0.00785281
Step 5, added column 3, criterion value 0.00792138
Step 6, added column 4, criterion value 0.00827403
Step 7, added column 7, criterion value 0.00872569
Step 8, added column 6, criterion value 0.00859294
Step 9, added column 8, criterion value 0.00879047
If I replace it with
classifierfun = @(train_data,train_labels,test_data,test_labels) ...
    sum(predict(fitcsvm(train_data,train_labels,'KernelFunction','gaussian', ...
    'Standardize',true),test_data) ~= test_labels);
The criterion makes sense (around 0.30), but the selected features are not as good as the ones chosen with the loss function. Any help?
Thanks

Answer (1)

Shubham 2024-2-26
It seems like you're experiencing a discrepancy between the criterion values during feature selection and the actual loss when you use the selected features to train and evaluate an SVM model. This can happen for several reasons, and I'll try to address some potential causes and solutions.
Firstly, the sequentialfs function in MATLAB performs feature selection by trying to minimize a criterion, which in your case is the classification error rate given by the loss function of the SVM. The criterion values you see during the feature selection process are based on cross-validation on your training set.
Here are some points to consider:
  • The criterion values during feature selection are likely based on cross-validation and may not directly translate to the absolute performance of the model. They are relative values that help determine which features to add or remove during the selection process.
  • In the first classifierfun you're using loss with 'ClassifError', which returns the classification error rate (a proportion); in the second you're counting the number of misclassifications. The difference matters because sequentialfs sums the values returned by the criterion function over all cross-validation test sets and divides that sum by the total number of test observations, so it expects the function to return a count of misclassified observations, not a rate. With the rate-based version the error is effectively divided by the number of test samples a second time, which is why the reported criterion is so small (roughly the true error rate divided by the fold size). See the sketch after this list.
  • The cross-validation process used within sequentialfs can lead to variability in the criterion values, especially if your dataset is small or if there's a high variance in your model's performance across different folds.
  • Feature selection algorithms can be unstable, meaning that small changes in the data can lead to different features being selected. This is especially true for algorithms like forward selection.
  • It's possible that the feature selection process is overfitting to the training data, especially if the number of features is high relative to the number of samples. This could lead to low error rates during feature selection that don't generalize well to new data.
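As noted in the second point above, the count-based criterion is the form sequentialfs expects. Here is a minimal sketch, reusing the variables from your question (TableFeaturesNormalized, Y, c, opts, max_its_fs); with it, the printed criterion values are directly the cross-validated error rates:
% Return the COUNT of misclassified test observations; sequentialfs sums
% these over the folds and divides by the total number of test
% observations, so the reported criterion is the error rate itself.
classifierfun = @(train_data,train_labels,test_data,test_labels) ...
    sum(predict(fitcsvm(train_data,train_labels,'KernelFunction','gaussian', ...
    'Standardize',true),test_data) ~= test_labels);
[fs,history] = sequentialfs(classifierfun,table2array(TableFeaturesNormalized),Y, ...
    'cv',c,'nfeatures',min(size(TableFeaturesNormalized,2),max_its_fs),'options',opts);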
To address these issues, you could:
  • Make sure you have a separate test set that was not used during feature selection to evaluate the final model's performance.
  • Use regularization techniques in your SVM to prevent overfitting.
  • Use methods to increase the stability of feature selection, such as running the selection multiple times on different subsamples of the data and choosing features that are consistently selected (a sketch follows this list).
  • Ensure that the performance metric you use during feature selection aligns well with your model's intended use case. Sometimes, accuracy is not the best measure, and you might want to consider other metrics like F1-score, precision, recall, or AUC-ROC, depending on your problem.
  • Experiment with different criteria for feature selection. For instance, you might want to use a different loss function or a different measure of model performance.
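For the stability suggestion above, here is a rough sketch that reruns sequentialfs on random subsamples and keeps the features chosen in most runs. The repetition count (20), the 80% subsample size, and the 70% threshold are arbitrary choices, and classifierfun is the count-based criterion from the earlier sketch:
% Stability check: repeat forward selection on random subsamples and
% count how often each feature is selected.
X = table2array(TableFeaturesNormalized);
nReps = 20;
selCount = zeros(1,size(X,2));
for r = 1:nReps
    idx = randsample(size(X,1),round(0.8*size(X,1)));  % random 80% of the rows
    cr  = cvpartition(Y(idx),'KFold',5);               % fresh CV partition per run
    fsr = sequentialfs(classifierfun,X(idx,:),Y(idx),'cv',cr);
    selCount = selCount + fsr;                         % fsr is a logical row vector
end
stableFeatures = find(selCount >= 0.7*nReps)           % kept in >= 70% of the runs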
Remember that feature selection is just one part of the model-building process, and it's important to consider the overall workflow, including data preprocessing, model selection, and evaluation.
