How to improve machine learning classification testing result?

Question

Sandy Winardi 2022-5-19

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/1722545-how-to-improve-machine-learning-classification-testing-result

Answered: Gagan Agarwal 2023-10-4

I want to do machine learning classification of data with inputs (1000x720) and targets (2x720).

I've tried using the neural network functions with the following parameters:

net.divideParam.trainRatio = 0.65; % ratio of data used for the training
net.divideParam.valRatio = 0.0;
net.divideParam.testRatio = 0.35; % ratio of data used for the testing

However, the test accuracy was only around 54%.

I've tried changing the training and testing ratios, increasing the number of training epochs, and changing the hiddenLayerSize value:

hiddenLayerSize = 20;
net.trainParam.epochs = 5000;

But I saw no significant improvement.

What can I do to improve the test accuracy (to at least 70%)?

Thank you in advance.

2 Comments

the cyclist 2022-5-19

It's not really possible to help without seeing the data or your code.

Also, how do you know that the inputs predict the target? Predictability isn't guaranteed. Just hoping for more accuracy doesn't mean it can be achieved.

Sandy Winardi 2022-5-19

inputs_and_outputs.mat

Sorry for the confusion, I'm fairly new to this topic.

The "targets" are the known outputs for the inputs.

The input data consists of 720 observations and 1000 features.

The classification has 2 classes (0 or 1).

I've attached the inputs and outputs if it's still not clear.

Answer 1

Gagan Agarwal 2023-10-4

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/1722545-how-to-improve-machine-learning-classification-testing-result#answer_1325459

Hi Sandy Winardi,

I understand that you are trying to classify data with 720 observations and 1000 features. To improve the test accuracy of your machine learning model, you can try the following approaches:

Feature Selection/Dimensionality Reduction: With only 720 observations, it is unlikely that all 1000 features are relevant for classification, and so many inputs make overfitting easy. You can use Principal Component Analysis (PCA) to project the data onto a smaller number of components, or a feature selection method to keep only the most informative features.
Data Preprocessing: Normalize the input features to have zero mean and unit variance. This often helps the model converge faster and can improve performance.
Hyperparameter Tuning: Experiment with different hyperparameters such as the learning rate, the number of layers, and the number of nodes in each layer.
Cross-Validation: Instead of relying solely on a fixed train-test split, consider using a cross-validation technique such as k-fold cross-validation.
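
The approaches above can be combined into one script. What follows is only a minimal sketch, not a tuned solution: it standardizes the features, reduces them with PCA, trains a patternnet, and scores it with 5-fold cross-validation. It assumes the Deep Learning Toolbox and the Statistics and Machine Learning Toolbox are installed. The data here is a synthetic placeholder with the same shapes as in the question (1000x720 inputs, 2x720 targets), and numFolds, numComponents, and hiddenLayerSize are assumed starting values, not recommendations. The normalization and PCA are fitted on the training fold only, so that no information from the test fold leaks into training.

```matlab
% Sketch: standardize -> PCA -> patternnet, scored with k-fold cross-validation.
rng(0);                                   % reproducible folds and initial weights

% Placeholder data with the shapes from the question; replace with your own.
numFeatures = 1000;  numObs = 720;
labels = randi([0 1], 1, numObs);         % hypothetical class labels (0 or 1)
X = randn(numFeatures, numObs);           % features x observations
X(1:10, :) = X(1:10, :) + 1.5 * labels;   % make 10 features informative
T = full(ind2vec(labels + 1));            % 2 x 720 one-hot targets

numFolds = 5;            % assumed value
numComponents = 20;      % assumed value, worth tuning
hiddenLayerSize = 10;    % assumed value, worth tuning

cvp = cvpartition(labels, 'KFold', numFolds);   % stratified folds
foldAccuracy = zeros(1, numFolds);

for k = 1:numFolds
    trainIdx = training(cvp, k);
    testIdx  = test(cvp, k);

    % Fit the preprocessing on the training fold only to avoid leakage.
    Xtrain = X(:, trainIdx)';             % observations x features
    Xtest  = X(:, testIdx)';
    mu = mean(Xtrain, 1);
    sigma = std(Xtrain, 0, 1);
    sigma(sigma == 0) = 1;                % guard against constant features
    Ztrain = (Xtrain - mu) ./ sigma;
    Ztest  = (Xtest  - mu) ./ sigma;

    % PCA on the standardized training data, then project both sets.
    coeff = pca(Ztrain, 'NumComponents', numComponents);
    Ptrain = (Ztrain * coeff)';           % components x observations
    Ptest  = (Ztest  * coeff)';

    net = patternnet(hiddenLayerSize);
    net.divideParam.trainRatio = 0.85;    % keep a validation subset so that
    net.divideParam.valRatio   = 0.15;    % early stopping can take effect
    net.divideParam.testRatio  = 0;       % testing is done on the held-out fold
    net.trainParam.showWindow  = false;
    net = train(net, Ptrain, T(:, trainIdx));

    predicted = vec2ind(net(Ptest));      % predicted class index per observation
    actual    = vec2ind(T(:, testIdx));
    foldAccuracy(k) = mean(predicted == actual);
end

fprintf('Mean CV accuracy: %.1f%% (std %.1f%%)\n', ...
    100 * mean(foldAccuracy), 100 * std(foldAccuracy));
```

To try this on your own data, replace the placeholder X and T with your inputs and targets, and derive labels from the targets, for example with vec2ind(T) - 1. If the cross-validated accuracy stays near 50% across a range of numComponents and hiddenLayerSize values, that suggests the features may carry little information about the classes, as noted in the comments.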