Classifier not working properly on test set

Question

Warid Islam 2020-7-14

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/564740-classifier-not-working-properly-on-test-set

回答： Nipun Katyal 2020-7-21

new7.xlsx

Hello,

I have trained a SVM classifier on a breast cancer feature set. I get a validation accuracy of 83% on the training set but the accuracy is very poor on the test set. The data set has 1999 observations and 9 features.The training set to test set ratio is 0.6:0.4. Any suggestions would be very much appreciated. Thank you.

X_train=table2array(new7(1:1200,1:9));
y_train=table2array(new7(1:1200,10));
X_test=table2array(new7(1201:1999,1:9));
y_test=table2array(new7(1201:1999,10));
    
Mdl = fitcsvm(...
    X_train, ...
    y_train, ...
    'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName',...
    'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % chuyen tu categorical sang dang double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100;
%% save model
saveCompactModel(Mdl,'mySVM');

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Nipun Katyal 2020-7-21

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/564740-classifier-not-working-properly-on-test-set#answer_468813

在 MATLAB Online 中打开

  clc
clear all
rawData = xlsread('new7.xlsx');
[m,n] = size(rawData);
new7 = rawData(randperm(m),:);
X_train=new7(1:1200,1:9);
y_train=new7(1:1200,10);
X_test=new7(1201:1999,1:9);
y_test=new7(1201:1999,10);
    
Mdl = fitcsvm(...
    X_train, ...
    y_train, ...
    'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName',...
    'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % chuyen tu categorical sang dang double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100
%% save model
saveCompactModel(Mdl,'mySVM');

On observing your data set you will find that the labels for class 1 and 2 are clubbed together which leaves a discrepancy in the validation set and test set in which the validation set contained a majority of class 1 and some class 2 features while you test set had the opposite which resulted in poor accuracy, so before splitting the data into validation and test you should jumble the rows, so as to provide an equal distribution of classes across the two sets.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Classifier not working properly on test set

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

回答（1 个）

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

Classifier not working properly on test set

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

回答（1 个）

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论