Issue with cvpartition warning message
I have a data set with a feature that is 10% true and 90% false. I tried partitioning it for cross/holdout validation, and a warning pops up saying that one or more class values in the group is not present in the test set, even though I've used the cvpartition(group, "holdout", 0.2) format. This doesn't happen with the format where n is used instead of group.
partition1 = cvpartition(cartData.CARTSTATUS, "holdout", 0.2, 'Stratify', true);
For this code I get the warning message, even though, after cross-checking the tables with the test and training data, both class values are represented.
Why am I getting this warning message?
1 Comment
Del Bohnenstiehl
2024-2-3
If you change the class label names in a categorical variable (for example, replacing class '1' with 'truck', etc.), the categorical variable still treats '1' as a class label.
You can confirm this by running TABULATE on the group label variable, where you'd see zero samples for the class '1'.
Then you just need to remove the old label names.
This will stop the warning. Or you could just ignore it, because (as you indicated) the test and training datasets are properly stratified.
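A minimal sketch of that check and cleanup, using made-up labels (the variable `labels` and the category names are illustrative, not taken from the original data). It uses `countcats` in place of TABULATE so that it runs without the Statistics and Machine Learning Toolbox:

```matlab
% Illustrative labels: two samples of class '1', three of class '2'
labels = categorical({'1'; '2'; '2'; '1'; '2'});

% Relabel class '1' as 'truck' by assignment
labels = addcats(labels, 'truck');
labels(labels == '1') = 'truck';

% The old name '1' is still a category, now with zero samples
catsBefore = categories(labels);
countsBefore = countcats(labels);

% Drop categories that no longer have any samples
labels = removecats(labels);
catsAfter = categories(labels);
```

After `removecats`, only categories that still have samples remain, so a grouping variable built from `labels` should no longer carry the empty class.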
Answers (1)
Adam Danz
2020-9-30
Edited: Adam Danz
2020-9-30
The warning message indicates that at least one of your group values is not represented within the test set.
Stratified sampling aims to give each class in the group the same proportional representation in the training and test sets. If there is only 1 sample of a particular class, it cannot be in both the training and test sets if you want to comply with stratified cross-validation methods. That's why the warning recommends a non-stratified approach.
For a workaround, look into "cross validation with unbalanced data" to see what others have done to overcome this problem.
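The non-stratified form that the warning mentions (passing the number of observations N instead of GROUP) can be sketched as follows. This assumes the Statistics and Machine Learning Toolbox; the variable names are illustrative.

```matlab
% Non-stratified holdout: partition by observation count only
n = 10;                              % number of observations (illustrative)
c = cvpartition(n, 'HoldOut', 0.2);

% Logical masks selecting the training and test observations
trainIdx = training(c);
testIdx  = test(c);
```

Because class membership is ignored here, a rare class may land entirely in one of the two sets, so it is worth checking the class counts in each set afterwards.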
This warning is reproduced in the following example, where 10% of the group (1 value) is true and 90% (9 values) are false.
% r2020b
TF = false(10,1);
TF(randi(numel(TF),ceil(numel(TF)*.1),1)) = true;
CVO = cvpartition(TF,'holdout',.2,'Stratify',true);
% Warning: One or more of the unique class values in GROUP is not present in the training set. For classification problems, either remove this class from the
% data or use N instead of GROUP to obtain nonstratified partitions. For regression problems with continuous response, use N.
% > In internal.stats.cvpartitionImpl>stra_holdoutcv (line 425)
% In internal.stats/cvpartitionImpl/rerandom (line 339)
% In internal.stats.cvpartitionInMemoryImpl (line 229)
% In cvpartition (line 175)
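To see which set the single true value actually ended up in, one option is to index the group variable with the logical masks returned by `training` and `test`. This is a sketch under the same assumptions as the example above (Statistics and Machine Learning Toolbox, one true value out of ten); the variable names are illustrative, and the call is expected to raise the same warning.

```matlab
% One true value and nine false values
TF = [true; false(9,1)];
CVO = cvpartition(TF, 'HoldOut', 0.2, 'Stratify', true);

% Count the true samples that fall in each set
nTrueTrain = sum(TF(training(CVO)));
nTrueTest  = sum(TF(test(CVO)));
fprintf('true in training: %d, true in test: %d\n', nTrueTrain, nTrueTest);
```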
2 Comments
Adam Danz
2020-10-1
Edited: Adam Danz
2020-10-5
To make sure I understand, you're getting a warning that states,
One or more of the unique class values in GROUP is not present in the training (or test) set.
but you've checked the training/test sets and they all contain all classes. Is that correct?
If so, could you save the inputs to a .mat file and attach it so we can reproduce the warning?