Understanding MATLAB's built-in SVM cross-validation on fitcsvm

I have a dataset of 53 trials and I want to do leave-one-out cross-validation of a binary classifier. I tried to explicitly do the cross-validation of an SVM, with this code:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'BoxConstraint', 0.046125, 'ClassNames', class_names};
n_trials = 53;
SVMModel = cell(n_trials, 1);
estimated_labels = cell(size(true_labels));   % preallocate, same shape as true_labels
for i_trial = 1:n_trials
    %% Train on all trials except the current one
    train_set_indices = [1:i_trial-1, i_trial+1:n_trials];
    SVMModel{i_trial} = fitcsvm(input_data(train_set_indices, :), ...
        true_labels(train_set_indices), SVM_params{:});
    %% Predict the left-out trial
    [estimated_labels(i_trial), score] = predict(SVMModel{i_trial}, ...
        input_data(i_trial, :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
which gives me class_error equal to 0.4151.
However, if I try MATLAB's built-in SVM cross-validation
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names};
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
then CSVM.kfoldLoss comes out to 0.3208. Why the difference? What am I doing wrong in my explicit cross-validation?
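To make the comparison concrete, my plan is to pull out the per-trial predictions behind kfoldLoss with kfoldPredict and see on which trials the two approaches disagree (just a sketch, assuming true_labels and my estimated_labels from above are cell arrays of char vectors):
cv_labels = kfoldPredict(CSVM);                            % cross-validated label for each trial
disagree = find(~strcmp(estimated_labels(:), cv_labels));  % trials where the two approaches differ
cv_error = mean(~strcmp(true_labels(:), cv_labels));       % should reproduce CSVM.kfoldLoss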
I did the same exercise with 'Standardize', false and 'KernelScale', 987.8107 (optimized hyperparameters), and the difference is more dramatic: class_error = 0.4528, while CSVM.kfoldLoss = 0.
Finally, I would also like to know what the training and validation sets were for each of the trained models in CSVM.Trained. I would like to call predict on each trained model with its left-out sample (trial) and compare the result with CSVM.kfoldPredict.
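From the documentation, my understanding is that the partition used for the leave-one-out split is stored on the cross-validated model itself, so something like this sketch should expose each fold (the variable names part, train_mask and test_mask are just mine):
part = CSVM.Partition;           % cvpartition object behind the leave-one-out split
train_mask = training(part, 1);  % logical mask of the trials used to fit CSVM.Trained{1}
test_mask = test(part, 1);       % logical mask of the single trial held out in fold 1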
Update 1: I found that c.training and c.test return the indices of the training and test sets. However, this code
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, 'CVPartition', c, ...
    'BoxConstraint', BoxConstraint, 'ClassNames', class_names};
estimated_labels = cell(1, 53);
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
for ii = 1:53
    % predict the test sample of fold ii with the model trained without it
    estimated_labels(ii) = predict(CSVM.Trained{ii}, input_data(c.test(ii), :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
gives me class_error = 0.5849, which is different from CSVM.kfoldLoss (0.3208). Why the difference? Is this the right way to double-check the cross-validation?
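My current guess at a safer version of this check (only a sketch; I am assuming that fold ii does not necessarily hold out trial ii, so I place each prediction at the index of its held-out trial instead of at position ii):
estimated_labels = cell(size(true_labels));
for ii = 1:53
    test_idx = test(c, ii);      % logical mask of the trial held out in fold ii
    estimated_labels(test_idx) = predict(CSVM.Trained{ii}, input_data(test_idx, :));
end
class_error = mean(~strcmp(true_labels, estimated_labels));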
Update 2: I attached the data.
Thanks!

Answers (1)

Xingwang Yong 2020-9-29
Maybe kfoldLoss uses a different definition of loss than yours. Your definition is 1-accuracy.
https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedkernel.kfoldloss.html?s_tid=srchtitle
2 Comments
Xingwang Yong 2020-10-3
class_error = error_count / n_trials;
= (n_trials - correct_count) / n_trials
= 1 - correct_count / n_trials
= 1 - accuracy
That is your definition of loss.
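One quick way to rule that in or out (just a sketch, using the CSVM and true_labels from your question): ask kfoldLoss for the plain misclassification rate explicitly and compare it with the 1 - accuracy you compute by hand.
builtin_error = kfoldLoss(CSVM, 'LossFun', 'classiferror');  % explicitly request the misclassification rate
If this still returns 0.3208, the loss definition is not the source of the discrepancy.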
