Why is SVM classifying a larger dataset with better accuracy than a smaller dataset?

3 views (last 30 days)
I have applied SVM and KNN to sound datasets to detect diseases. I applied both classifiers to two datasets individually, and there is a noticeable gap between the accuracy of SVM and KNN, with KNN performing better. But when I combined both datasets and ran KNN and SVM again, their accuracies were almost equal. Why is this? Can anyone please explain the reasons for this behaviour? Any reference supporting the answer is appreciated. Thank you.

Answers (2)

Samay Sagar 2023-7-17
When two datasets are combined, the change in accuracy can be attributed to factors such as a more balanced class distribution, complementary information in the datasets, a reduced risk of overfitting or underfitting, and a more consistent feature representation. To gain a deeper understanding, analyse the characteristics of the datasets, the feature representation, and the decision boundaries, and consider running statistical tests or cross-validation experiments.
Larger datasets provide more samples for training, allowing the model to capture a more comprehensive representation of the underlying patterns and relationships in the data. This can lead to better model performance, higher accuracy, and better generalization to unseen data. With a smaller dataset, the model is more likely to overfit and will not generalize as well to unseen data.
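A minimal cross-validation sketch along those lines (not the original poster's code; the feature matrix X and label vector y are assumed variable names). It estimates the accuracy of both classifiers on the same stratified folds, so the SVM/KNN gap can be judged together with its variability rather than from a single train/test split. fitcecoc with an SVM template is used so the example also covers more than two disease classes.

% Minimal sketch, assuming a feature matrix X (rows = sound samples) and a
% label vector y (disease classes). Not the poster's original code.
rng(1);                                   % reproducible folds
cv = cvpartition(y, 'KFold', 10);         % stratified 10-fold partition

% SVM (wrapped in ECOC so it also handles more than two classes)
svmTemplate = templateSVM('KernelFunction', 'rbf', 'Standardize', true);
svmCV = fitcecoc(X, y, 'Learners', svmTemplate, 'CVPartition', cv);

% KNN with 5 neighbours on standardized features
knnCV = fitcknn(X, y, 'NumNeighbors', 5, 'Standardize', true, ...
                'CVPartition', cv);

svmAcc = 1 - kfoldLoss(svmCV);            % cross-validated accuracy, SVM
knnAcc = 1 - kfoldLoss(knnCV);            % cross-validated accuracy, KNN
fprintf('SVM accuracy: %.3f   KNN accuracy: %.3f\n', svmAcc, knnAcc);

Running this once on each individual dataset and once on the combined data makes it easier to see whether the accuracy gap is a real effect or within the fold-to-fold variation.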

Milan Bansal 2023-8-31
Hi,
I understand that you want to know why there is a difference in accuracy when the KNN (K-Nearest Neighbours) and SVM (Support Vector Machine) classifiers are applied to the individual datasets versus the combined dataset.
Here are a few possible reasons for the observed behaviour:
  1. Data Distribution: Individual datasets may have imbalanced distributions, leading to biased predictions. Combining datasets creates a more balanced distribution, allowing both classifiers to perform better and achieve similar accuracies.
  2. Complementary Information: Each dataset may contain unique information relevant to the classification task. Combining datasets provides more diverse information to both classifiers, improving their performance and resulting in similar accuracies.
  3. Feature Compatibility: It is important to ensure that the features are represented consistently when combining datasets. If the combined feature representation no longer favours one classifier over the other as strongly as the individual datasets did, the two classifiers can end up with similar accuracies.
Note that the specific behaviour observed may vary depending on the characteristics of the datasets, the nature of the disease detection problem, and the specific implementations of SVM and KNN.
To gain a better understanding of the observed behaviour, it is recommended to analyse the combined dataset, evaluate the feature distributions, and perform further experiments or analysis to investigate the impact of combining datasets on the classification results.
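As a concrete starting point for that analysis, here is a minimal sketch with assumed variable names (X1, y1 and X2, y2 for the features and labels of the two sound datasets). It compares the class balance of each individual dataset with the combined one and plots one feature's distribution across the two datasets to check for a shift.

% Minimal sketch; X1, X2 are the feature matrices of the two sound
% datasets and y1, y2 the corresponding label vectors (assumed names).
XAll = [X1; X2];
yAll = [y1; y2];

disp('Class balance, dataset 1:');  tabulate(y1)
disp('Class balance, dataset 2:');  tabulate(y2)
disp('Class balance, combined:');   tabulate(yAll)

% Check whether a feature's distribution shifts between the two datasets
% (feature 1 is chosen arbitrarily for illustration).
figure
histogram(X1(:,1), 'Normalization', 'probability'); hold on
histogram(X2(:,1), 'Normalization', 'probability'); hold off
legend('Dataset 1', 'Dataset 2')
xlabel('Feature 1'); ylabel('Relative frequency')

If the combined labels yAll are noticeably more balanced than y1 or y2, that alone can explain why the two classifiers converge to similar accuracies; XAll and yAll can then be fed into the same cross-validation comparison shown in the first answer.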

Categories: Statistics and Machine Learning Toolbox
Release: R2021a
