I obtain different test success than predicted from SVM training on similar datasets.

Question

Marco Tremblay 2019-9-22

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/481567-i-obtain-different-test-success-than-predicted-from-svm-training-on-similar-datasets

Answered by: Shishir Singhal 2020-7-28

I trained a classifier (a quadratic SVM or an Ensemble) using MATLAB's app. For training, I used a 250 x 15 dataset with the default settings for validation etc. (Case_Num_Train)

The Ensemble gave me a 76.3% accuracy on the test set it extracted from the training data.

I then produced a smaller set (124 cases) of slightly different test cases, using the same technique I had used to produce the training set (Case_Num_Real). Using the "export compact model" tool, I obtained a trained model that can run in a script (Test_ANN). This script feeds the test data into the trained model and compares each prediction with the real case.

This gave 38 errors out of 124 test cases, an error rate of about 31% (about 69% accuracy). That is close, but something seems wrong, since repeating the training gives a fairly consistent 76%.

No obvious difference is seen when looking at the data from the training and test sets.

The problem worsens when I use a larger training set of 2500 cases. There, a quadratic SVM gives a training accuracy of 94.6%, but the test with 250 cases produces 102 errors, or about 41%. Not good enough!

I considered overfitting and incrementally reduced the training set down to the 250 cases presented above. While the training accuracy and the test accuracy do converge with smaller sets, it is mostly at the cost of degraded precision.

I cannot believe that this is the best we can achieve. What is wrong?


Answer 1

Shishir Singhal 2020-7-28

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/481567-i-obtain-different-test-success-than-predicted-from-svm-training-on-similar-datasets#answer_471706

Hi,

There can be multiple reasons behind low test accuracy when using an SVM.

In your case, please check whether you are splitting the data correctly.

Since you are using an SVM as a classifier, use a stratified split to divide your data. A stratified split helps maintain the class distribution across the training, validation and test sets.

Please refer to the documentation here: https://in.mathworks.com/help/stats/cvpartition.html to learn more about partitioning data in MATLAB.
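To make the idea of a stratified split concrete, here is a minimal, language-neutral sketch in Python (standard library only). It is not the MATLAB `cvpartition` API; the function name `stratified_split`, the class labels "A" and "B", and the 200/50 class balance are all hypothetical, chosen only for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that each class keeps roughly
    the same proportion in both subsets."""
    rng = random.Random(seed)
    # Group the case indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    # Shuffle within each class and hold out the same fraction of every class.
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical, imbalanced label vector: 250 cases, 200 of class A, 50 of class B.
labels = ["A"] * 200 + ["B"] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

With a purely random split, a small test set can by chance end up with a different class mix than the training set, which alone can shift the measured accuracy. Holding out the same fraction of each class avoids that.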

Moreover, overfitting can also occur for various reasons:

Less training data.
Bad feature selection.
Redundant features.
Noise in data.

In this case, I would recommend doing some feature analysis of the data before modelling.

I hope the points above help!
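One simple form of feature analysis is to look for redundant features, for example pairs of columns that are almost perfectly correlated. The sketch below is a rough illustration in Python (standard library only), not a MATLAB recipe. The helper names `pearson` and `redundant_pairs`, the feature names `f1` to `f3`, the toy values and the 0.95 threshold are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation is undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)

def redundant_pairs(columns, threshold=0.95):
    """List (name_i, name_j, r) for feature pairs with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs

# Hypothetical toy data: f2 is nearly 2 * f1, f3 is unrelated to both.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 4.0, 6.2, 8.1, 9.9],
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = redundant_pairs(features)
for a, b, r in flagged:
    print(f"{a} and {b} look redundant (r = {r:.3f})")
```

A flagged pair is only a candidate for removal. Whether dropping one of the two features actually helps should be checked by retraining and comparing accuracy on held-out data.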

Thanks
