Why is the accuracy reported in the Classification Learner app much higher than the accuracy of the exported model on the held-out validation set?
Problem description
I used the default 5-fold cross-validation (CV) scheme in the Classification Learner app and trained all of the available models. The best model (a quadratic SVM) achieved 74.2% accuracy. I then exported it via
Export Model => Generate Code
and ran the generated code, again examining the 5-fold CV accuracy. Surprisingly, the validation accuracy reported by the generated code was 66.8%, which is much lower than the accuracy reported by the app.
This has happened repeatedly: every time I follow this procedure (with different datasets and models), the exported model shows reduced accuracy. A sketch of the kind of comparison I am making follows.
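For context, here is a minimal self-contained sketch of roughly what the exported code's validation step does, using the ionosphere example data rather than my dataset, and a quadratic-kernel SVM as an assumed stand-in for the app's quadratic SVM preset (the exact training settings are not copied from the app). Each call to crossval draws a fresh random 5-fold partition.

% Minimal sketch (example data, assumed model settings): roughly what the
% exported code does when it reports a 5-fold validation accuracy.
load ionosphere                              % X: 351x34 features, Y: class labels
mdl = fitcsvm(X, Y, ...
    'KernelFunction', 'polynomial', ...
    'PolynomialOrder', 2, ...                % "quadratic SVM" stand-in
    'KernelScale', 'auto', ...
    'Standardize', true);

cvMdl = crossval(mdl, 'KFold', 5);           % a new random partition on each call
validationAccuracy = 1 - kfoldLoss(cvMdl, 'LossFun', 'classiferror');
fprintf('5-fold CV accuracy: %.1f%%\n', 100*validationAccuracy);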
Potential explanations
I understand that not resetting the random number generator seed can cause some variability between runs, but the effect seems too large and too consistent to be explained by chance alone. I also understand that selecting the empirically best model on the validation set, without an additional test set, can lead to an overestimate of its performance. If the exported code uses a different CV partition than the app, that could explain part of the drop in accuracy, but I wonder whether there is some other explanation that I am missing. A sketch of how I would probe the partition question is below.
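One way to probe this (a sketch continuing from the example above; the seed values and the explicit partition are arbitrary assumptions, since the app's internal partition cannot be reproduced directly): fix the random number generator, evaluate on one explicit partition, and then repeat the 5-fold estimate over several seeds to see how much of a 74.2% vs. 66.8% style gap plain partition noise can plausibly account for.

% Sketch: quantify how much partition noise alone moves the CV estimate.
% Reuses mdl and Y from the sketch above; seed values are arbitrary.
rng(1, 'twister');
cvp = cvpartition(Y, 'KFold', 5);            % one explicit stratified partition
accFixed = 1 - kfoldLoss(crossval(mdl, 'CVPartition', cvp), ...
                         'LossFun', 'classiferror');

accs = zeros(10, 1);
for k = 1:10
    rng(k);                                  % new seed -> new random folds
    accs(k) = 1 - kfoldLoss(crossval(mdl, 'KFold', 5), ...
                            'LossFun', 'classiferror');
end
fprintf('Fixed partition: %.3f; over 10 seeds: mean %.3f, std %.3f\n', ...
        accFixed, mean(accs), std(accs));

If the spread over seeds is small relative to the observed gap, that would suggest the difference is not just partition variability.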