How to get the probability of each class from a RUSBoost classifier trained on an imbalanced data set

I have a dataset with 7 classes and 3 features. The data set is hugely imbalanced, so I followed https://www.mathworks.com/help/stats/classification-with-imbalanced-data.html to classify the data. I get a prediction accuracy of 94%, but I also need the probability of each class for a given feature or set of features. How can I get the probability of each class for a given feature vector?
Nt = size(y,1); % Number of observations in the training sample
t = templateTree('MaxNumSplits',Nt); % Allow deep trees, up to Nt splits
rusTree = fitcensemble(X,y,'Method','RUSBoost','NumLearningCycles',1000,'Learners',t,'LearnRate',0.1,'NPrint',100);
[~,scores] = predict(rusTree,[1 16 3 5])
I get the following scores from the code above: 0.7345, 3.5105, 1.1893, 0, 0, 0, 0.0082
These scores are not probabilities. How can I get values between 0 and 1 such that the probabilities over all classes sum to 1?

Accepted Answer

Raunak Gupta 2020-4-29
Edited: Raunak Gupta 2020-4-29
Hi,
predict does not return probability estimates because the RUSBoost algorithm used in the model does not treat scores as probabilities. Instead, a score represents the confidence of classification into a class, with a higher score meaning more confidence, as explained in the fitcensemble documentation.
If you would like probability estimates for the scores, you can set 'ScoreTransform' to 'logit' in fitcensemble. This name-value pair transforms the scores to probability estimates. This is explained here. Calling predict on the model then returns the scores as probability values for each class.
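As a rough illustration of the suggestion above, here is a minimal sketch that sets 'ScoreTransform' to 'logit' on a RUSBoost ensemble. It is not the asker's model: it assumes the Statistics and Machine Learning Toolbox is installed and uses the bundled fisheriris data with a small number of learning cycles purely as a stand-in. The 'logit' transform applies 1/(1+exp(-x)) to each class score separately, so each entry falls between 0 and 1; the sketch also computes the row sum so it can be inspected.

```matlab
% Stand-in data (assumption): the asker's X and y are not available here.
load fisheriris
X = meas;                                   % 150-by-4 numeric features
y = species;                                % 150-by-1 cell array of class names

% Small RUSBoost ensemble, kept short so the sketch runs quickly.
t = templateTree('MaxNumSplits', size(X,1));
mdl = fitcensemble(X, y, 'Method', 'RUSBoost', ...
    'NumLearningCycles', 50, 'Learners', t, 'LearnRate', 0.1);

% Apply the logistic function 1/(1+exp(-x)) to every raw class score.
mdl.ScoreTransform = 'logit';

[label, scores] = predict(mdl, X(1,:));     % one transformed score per class
rowSum = sum(scores, 2);                    % inspect: the transform is per class
```

Because the transform acts on each class score independently, it is worth checking rowSum before treating the output as a probability distribution over classes.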
  2 Comments
Siddharth Arora 2022-2-27
Hi Raunak,
I have tried the suggested approach of setting 'ScoreTransform' to 'logit' in fitcensemble (for a binary classification problem), and the scores are still not probability estimates. I tried specifying 'ScoreTransform' as 'logit' in fitcensemble, and also tried setting Mdl.ScoreTransform = 'logit' before calling predict, but the scores in any given row do not add up to 1. I have tried 'doublelogit' with AdaBoost and that works fine, but it does not work with RUSBoost. How else could I convert scores from RUSBoost to probability estimates? Is it right to use the scores from RUSBoost as inputs to perfcurve to get AUC values, or should the scores be transformed first? Thank you
Louis 2023-11-6
I am experiencing exactly the same issue as Siddharth Arora above. Setting 'ScoreTransform' to 'logit' ensures that the score outputs are below 1, but the score outputs do not sum to 1.
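The behavior described in these comments can be reproduced by hand on the raw scores posted in the question. The sketch below applies the logistic function to each score, which keeps every value below 1 but does not make the row sum to 1. It then shows one possible workaround that is not proposed anywhere in this thread: a softmax over the raw scores, which does sum to 1 by construction. The names rawScores, logitScores and pseudoProb are illustrative, and the softmax output is only a normalized ranking of the scores, not a calibrated probability.

```matlab
% Raw RUSBoost scores copied from the question (one row, seven classes).
rawScores = [0.7345 3.5105 1.1893 0 0 0 0.0082];

% What 'logit' does: 1/(1+exp(-x)) per element, with no coupling across classes.
logitScores = 1 ./ (1 + exp(-rawScores));
logitSum = sum(logitScores, 2);             % generally not 1

% Assumed workaround (not from the thread): softmax over the raw scores.
% Subtracting the row maximum avoids overflow in exp for large scores.
shifted = rawScores - max(rawScores, [], 2);
expScores = exp(shifted);
pseudoProb = expScores ./ sum(expScores, 2); % nonnegative, sums to 1 per row
```

Softmax preserves the ordering of the classes, so the predicted label is unchanged. Whether the resulting values are meaningful as probabilities would need to be checked against held-out data.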


More Answers (0)
