I obtain different test success than predicted from SVM training on similar datasets.

I used the quadratic SVM or the Ensemble to train a classifier using MATLAB's app. To train, I used a 250 x 15 dataset (Case_Num_Train) with the default validation settings.
The Ensemble gave me 76.3% accuracy on the test set it extracted from the training data.
I then produced a smaller set (124 cases) of slightly different test cases (Case_Num_Real) using the same technique I had used to produce the training set. Using the "export compact model" tool, I obtained a trained model that can run in a script (Test_ANN). This script feeds the test data into the trained model and compares each prediction with the real case.
This gave 38 errors out of 124 test cases, or about 31% error. That is close to the roughly 24% error I expected, but something is wrong, as repeating the training gives a fairly consistent 76% accuracy.
No obvious difference is seen when looking at the data from the training and test sets.
The problem worsens when I use a larger training set of 2500 cases. There, a quadratic SVM gives a training accuracy of 94.6%, but the test with 250 cases produces 102 errors, or about 41%. Not good enough!
I considered overfitting and incrementally reduced the training set down to the 250 cases presented above. While the training accuracy and the test accuracy do converge with smaller sets, it is mostly at the cost of degraded precision.
I cannot believe that this is the best we can achieve. What is wrong?

Answers (1)

Shishir Singhal 2020-7-28
Hi,
There can be multiple reasons behind low test accuracy when using an SVM.
In your case,
Please check that you are splitting the data correctly.
Since you are using an SVM as a classifier, use a stratified split to divide your data. A stratified split helps you maintain the same class distribution across the training, validation, and test sets.
Please refer to the documentation here: https://in.mathworks.com/help/stats/cvpartition.html to learn more about partitioning data in MATLAB.
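As a minimal sketch of such a stratified hold-out split, assuming a numeric predictor matrix X and a label vector Y: the data below is a synthetic placeholder, not the data from the question, and the two-class setup and the 25% hold-out fraction are assumptions chosen only for illustration. When cvpartition is given the class labels, it stratifies the hold-out split by default. Comparing the cross-validated accuracy with the accuracy on the held-out set is one way to check whether the two estimates agree.

```matlab
% Synthetic stand-in data (assumption): 250 cases, 15 features, 2 classes.
rng(0);                                        % make the example reproducible
X = randn(250, 15);
Y = double(X(:,1) + 0.5*randn(250,1) > 0.8);   % deliberately imbalanced labels

% Passing the labels to cvpartition gives a stratified hold-out split.
c = cvpartition(Y, 'HoldOut', 0.25);
idxTrain = training(c);                        % logical index of training rows
idxTest  = test(c);                            % logical index of held-out rows

% The fraction of class 1 should be nearly the same in both subsets.
fracTrain = mean(Y(idxTrain) == 1);
fracTest  = mean(Y(idxTest) == 1);
fprintf('Class 1 fraction: train %.3f, test %.3f\n', fracTrain, fracTest);

% Quadratic SVM for the two-class case. For more than two classes,
% fitcecoc with a templateSVM learner would be used instead.
mdl = fitcsvm(X(idxTrain,:), Y(idxTrain), ...
    'KernelFunction', 'polynomial', 'PolynomialOrder', 2, ...
    'Standardize', true);

% 5-fold cross-validated accuracy on the training part ...
cvAcc = 1 - kfoldLoss(crossval(mdl, 'KFold', 5));
% ... versus accuracy on the untouched held-out part.
testAcc = 1 - loss(mdl, X(idxTest,:), Y(idxTest));
fprintf('CV accuracy %.3f, hold-out accuracy %.3f\n', cvAcc, testAcc);
```

If the two accuracies differ by much more than repeated runs with different seeds would explain, the test data probably does not follow the same distribution as the training data.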
Moreover, overfitting can have various causes:
  • Too little training data.
  • Bad feature selection.
  • Redundant features.
  • Noise in data.
In this case, I would recommend doing some feature analysis of the data before modelling.
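One simple form of feature analysis is to look for redundant features through pairwise correlation. The sketch below is only an illustration under stated assumptions: the data is synthetic, feature 15 is deliberately built as a near copy of feature 3, and the 0.95 threshold is an arbitrary choice rather than a recommended value.

```matlab
% Synthetic stand-in data (assumption): 250 cases, 15 features.
rng(1);                                     % make the example reproducible
X = randn(250, 15);
X(:, 15) = X(:, 3) + 0.01*randn(250, 1);    % deliberately redundant feature

% Pairwise correlation between all feature columns.
R = corrcoef(X);                            % 15-by-15 correlation matrix
R = abs(triu(R, 1));                        % keep each pair once, drop the diagonal

% Flag pairs whose absolute correlation exceeds an (arbitrary) threshold.
threshold = 0.95;
[rowIdx, colIdx] = find(R > threshold);
for k = 1:numel(rowIdx)
    fprintf('Features %d and %d are highly correlated (|r| = %.3f)\n', ...
        rowIdx(k), colIdx(k), R(rowIdx(k), colIdx(k)));
end
```

In this synthetic example the only flagged pair is features 3 and 15. On real data, one feature of each flagged pair is a candidate for removal before retraining.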
I hope the points above help!
Thanks


Release

R2019a
