Why is loss() different from calculating misclassification error using predict()?

I am trying to fit an ECOC model to my data, but the misclassification error calculated by loss() is different from the misclassification error I get by comparing the labels from predict() with the true labels. The same thing happens with a different model, e.g. KNN.
Even though the test dataset has only 10 observations, so the misclassification error should, to my knowledge, be a multiple of 0.1, loss() outputs 0.8293.
Could someone please help me understand why these are different, i.e. what is going on in the loss() function? And which one is more appropriate for evaluating/reporting test set accuracy?
rng(1234)
% define variables
xtrain = rand(100,4); % random numbers, n = 100
xtest = rand(10,4); % random numbers, n = 10
ytrain = ceil(4*rand(100,1)); % 4 classes, n = 100
ytest = ceil(4*rand(10,1)); % 4 classes, n = 10
% train model
mdl1 = fitcecoc(xtrain,ytrain,'Coding','onevsall','Learners','svm');
mdl2 = fitcknn(xtrain,ytrain);
% calculate loss from loss()
loss1mdl1 = loss(mdl1,xtest,ytest);
loss1mdl2 = loss(mdl2,xtest,ytest);
% calculate loss from predict()
loss2mdl1 = 1-mean(predict(mdl1,xtest)==ytest);
loss2mdl2 = 1-mean(predict(mdl2,xtest)==ytest);

Answers (2)

Sulaymon Eshkabilov
There is a small difference between the loss() and predict() functions: loss() takes observation weights into account when computing its value, whereas comparing the output of predict() with the true labels gives the plain, unweighted error rate. Otherwise, everything is working as expected (a sketch reproducing this weighting follows the output below):
rng(1234)
% define variables
xtrain = rand(100,4); % random numbers, n = 100
xtest = rand(10,4); % random numbers, n = 10
ytrain = ceil(4*rand(100,1)); % 4 classes, n = 100
ytest = ceil(4*rand(10,1)); % 4 classes, n = 10
% train model
mdl1 = fitcecoc(xtrain,ytrain,'Coding','onevsall','Learners','svm');
mdl2 = fitcknn(xtrain,ytrain);
% calculate loss from loss()
loss1mdl1 = loss(mdl1,xtest,ytest)
loss1mdl1 = 0.8293
loss1mdl2 = loss(mdl2,xtest,ytest)
loss1mdl2 = 0.7440
Y1 = predict(mdl1,xtest);
Y2 = predict(mdl2,xtest);
YC1 = [ytest,Y1] % Two correct answers out of 10, i.e., accuracy is 20%
YC1 = 10×2
     1     3
     1     4
     2     1
     2     4
     1     1
     1     4
     2     4
     1     1
     4     1
     4     1
YC2 = [ytest,Y2] % Three correct answers out of 10, i.e., accuracy 30%
YC2 = 10×2
     1     2
     1     1
     2     4
     2     3
     1     1
     1     1
     2     3
     1     2
     4     2
     4     1
% calculate loss from predict()
loss2mdl1 = 1-mean(predict(mdl1,xtest)==ytest)
loss2mdl1 = 0.8000
loss2mdl2 = 1-mean(predict(mdl2,xtest)==ytest)
loss2mdl2 = 0.7000
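
To make the weighting concrete, here is a minimal sketch of what loss() does under its defaults. It is an illustration under assumptions, not the internal implementation: it assumes unit observation weights and the empirical class priors stored in the model (mdl1.Prior), renormalizes the test weights so that each class's weights sum to that class's training prior, and then computes the weighted error:

% Sketch: reproduce the default weighted classification error of loss()
% (assumes unit observation weights and empirical training priors in mdl1.Prior)
w = ones(numel(ytest),1);                       % default unit observation weights
for k = 1:numel(mdl1.ClassNames)
    idx = (ytest == mdl1.ClassNames(k));        % test observations of class k
    w(idx) = w(idx)/sum(w(idx))*mdl1.Prior(k);  % class-k weights sum to Prior(k)
end
w = w/sum(w);                                   % normalize so all weights sum to 1
yhat = predict(mdl1,xtest);
weightedErr = sum(w(yhat ~= ytest))             % should match loss1mdl1 (0.8293 above)

Because the weights are renormalized per class (and class 3 happens to be absent from ytest), the weighted error is generally not a multiple of 0.1.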

Drew
This is because the classreg loss function normalizes the observation weights so that, within each class, they sum to that class's prior probability. This can be avoided by providing a custom loss function, as shown in this answer: https://www.mathworks.com/matlabcentral/answers/492062-loss-the-classification-error
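
For reference, a minimal sketch of such a custom loss function, using the function-handle form of the 'LossFun' argument (the name rawError is just illustrative; C is the n-by-K true-class indicator matrix and S the n-by-K score matrix that loss() passes in):

function L = rawError(C,S,~,~)
% rawError  Unweighted misclassification rate for loss(...,'LossFun',@rawError).
%   C : n-by-K logical matrix, C(j,k) = 1 if observation j belongs to class k
%   S : n-by-K score matrix (negated binary losses for ECOC, posteriors for KNN)
% Observation weights and the cost matrix are deliberately ignored.
[~, yhat]  = max(S,[],2);   % predicted class index (ties aside, matches predict)
[~, ytrue] = max(C,[],2);   % true class index
L = mean(yhat ~= ytrue);
end

Calling loss(mdl1,xtest,ytest,'LossFun',@rawError) and loss(mdl2,xtest,ytest,'LossFun',@rawError) should then agree with the predict()-based errors above (0.8000 and 0.7000).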
