Issue with cvpartition warning message

I have a data set with a feature that is 10% true and 90% false. When I partition it for cross-validation or holdout validation, I get a warning saying that one or more class values in the group is not present in the test set, even though I used the cvpartition(group, "holdout", 0.2) form. This doesn't happen with the form that takes n instead of group.
partition1 = cvpartition(cartData.CARTSTATUS, "holdout", 0.2, 'Stratify', true);
For this code I get the warning message, even though after cross checking the tables with the test and training data, there is representation for both class values.
Why am I getting this warning message?
  1 Comment
Del Bohnenstiehl 2024-2-3
If you change the class label names in a categorical variable (for example, replacing class '1' with 'truck'), the categorical variable can still treat '1' as a class label.
You can confirm this by running tabulate on the group label variable, where you'd see zero samples for the class '1'.
Then you just need to remove the old label names, for example with removecats.
This will stop the warning. Or you could just ignore it, because (as you indicated) the test and training data sets are properly stratified.
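A minimal sketch of the situation described above, using made-up labels rather than the original cartData. It assumes the relabeling was done by assignment, which leaves the old category behind with zero members; removecats then drops it. Whether this is what triggers the warning for a given data set is something to check, for example with countcats or tabulate.

```matlab
% Hypothetical labels, not the poster's data
labels = categorical({'1'; '0'; '0'; '1'; '0'});

% Relabel class '1' by assignment; 'truck' is added as a new category,
% but '1' stays in the category list with no members
labels(labels == '1') = 'truck';

catsBefore   = categories(labels);   % still includes the unused '1'
countsBefore = countcats(labels);    % the count for '1' is zero

% Drop categories that no element uses
labels = removecats(labels);
catsAfter = categories(labels);
```

After removecats, only the categories that actually occur in the data remain, so cvpartition no longer sees an empty class.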


Answers (1)

Adam Danz 2020-9-30
Edited: Adam Danz 2020-9-30
The warning message indicates that at least one of your group values is not represented within the test set.
Stratified sampling aims to give each class in the group the same proportional representation in the training and test sets. If a class has only one sample, it cannot appear in both the training and test sets, so the stratified split cannot be satisfied. That's why the warning recommends a nonstratified approach.
For a workaround, look into "cross validation with unbalanced data" to see what others have done to overcome this problem.
This warning is reproduced in the following example, where 10% of the group (1 value) is true and 90% (9 values) is false.
% r2020b
TF = false(10,1);
TF(randi(numel(TF),ceil(numel(TF)*.1),1)) = true;
CVO = cvpartition(TF,'holdout',.2,'Stratify',true);
% Warning: One or more of the unique class values in GROUP is not present in the training set. For classification problems, either remove this class from the
% data or use N instead of GROUP to obtain nonstratified partitions. For regression problems with continuous response, use N.
% > In internal.stats.cvpartitionImpl>stra_holdoutcv (line 425)
% In internal.stats/cvpartitionImpl/rerandom (line 339)
% In internal.stats.cvpartitionInMemoryImpl (line 229)
% In cvpartition (line 175)
  2 Comments
Ashrit Tayade 2020-10-1
Hi! Thanks for your response. My issue here is that I'm getting this warning even though my test and training data have representation of both classes.
I have 3200 observations, out of which, close to 90% is false and the rest is true. Therefore, when I use the cvpartition function with the group as an argument instead of table height, both classes are present in the test and training data (which I've gone through to verify), but I still get a warning.
I don't get this warning, however, when I use the function like this:
partition1 = cvpartition(height(cartData), "holdout", 0.2);
As far as I know, passing a number to the function is more likely to produce a split in which a class is missing from the test or training set, and passing the group as the input avoids this problem.
However, the warning message says differently. Please correct me if I'm wrong. Thanks.
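The two call forms discussed above can be compared with a sketch like the following. The label vector is synthetic (3200 observations, 10% true) and only stands in for cartData.CARTSTATUS; the variable names are illustrative. It assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
rng(0);  % fixed seed so the split is repeatable
y = [true(320, 1); false(2880, 1)];   % synthetic labels, not the poster's data

cGroup = cvpartition(y, 'HoldOut', 0.2);          % group input: stratified by class
cN     = cvpartition(numel(y), 'HoldOut', 0.2);   % n input: plain random split

% Fraction of true labels in each test set
fracGroup = mean(y(test(cGroup)));
fracN     = mean(y(test(cN)));

% Check that both classes appear in both sets for the stratified split
bothInTest  = numel(unique(y(test(cGroup))))     == 2;
bothInTrain = numel(unique(y(training(cGroup)))) == 2;
```

With the group input, the test-set fraction should stay close to 0.1. With n it can drift from split to split, although with 3200 rows a class going missing entirely is very unlikely.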
Adam Danz 2020-10-1
Edited: Adam Danz 2020-10-5
To make sure I understand, you're getting a warning that states,
One or more of the unique class values in GROUP is not present in the training (or test) set.
but you've checked the training/test sets and they all contain all classes. Is that correct?
If so, could you save the inputs to a mat file and attach it so we can reproduce the error?

