Classification problem parsed as a regression problem when SplitCriterion is supplied to fitcensemble

Hi
I ran a hyperparameter optimization to find the best parameters for a two-class classification problem using fitcensemble. But when I try to use these parameters I get a strange warning:
Warning: You must pass 'SplitCriterion' as a character vector 'mse' for regression.
What is wrong with my code? The warning appears when I use a boosting ensemble as 'Method'. When I remove the 'SplitCriterion' everything works fine, but I cannot understand why MATLAB somewhere along the line thinks this is a regression problem when I use fit"c"ensemble. Here is a toy example with arbitrarily chosen settings that you can run to reproduce the warning/error.
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
t = templateTree('MaxNumSplits', 30, ...
    'MinLeafSize', 10, ...
    'SplitCriterion', 'gdi');
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 12, ...
    'Learners', t, ...
    'KFold', 7, ...
    'LearnRate', 0.1);
4 Comments
Don Mathis on 6 Apr 2017
When I run an optimization I never see successes for LogitBoost+gdi. Nor GentleBoost+gdi. They fail and eventually are not tried any more. Could you post a reproducible example of an optimization that shows successes for those combinations? That would be very helpful.
In any case, there is a problem in that LogitBoost is never run with 'mse', which is the only SplitCriterion it can use. As a workaround, you might try running a separate optimization without optimizing SplitCriterion. Then you could take the best result from the two optimizations (a sketch of the comparison step follows the code below). Something like this:
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'NumLearningCycles', 12, ...
    'Learners', 'Tree', ...
    'OptimizeHyperparameters', {'Method', 'LearnRate', 'MinLeafSize', 'MaxNumSplits', 'NumVariablesToSample'})
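To compare the two runs afterwards, here is a minimal sketch of the comparison step, assuming the fit from the full optimization (with 'OptimizeHyperparameters','all') was stored in a variable named ensAll and the fit from the call above in ensNoSplit; both names are only placeholders, not part of the original code. A model returned by fitcensemble with hyperparameter optimization carries a HyperparameterOptimizationResults object whose MinObjective and XAtMinObjective properties hold the best cross-validated loss and the hyperparameter values that achieved it:
% Compare the two optimization runs and keep the better one.
% ensAll     : result of the run that optimized 'all' (including SplitCriterion)
% ensNoSplit : result of the run above (SplitCriterion left out of the search)
lossAll     = ensAll.HyperparameterOptimizationResults.MinObjective;
lossNoSplit = ensNoSplit.HyperparameterOptimizationResults.MinObjective;
if lossNoSplit <= lossAll
    best = ensNoSplit.HyperparameterOptimizationResults;
else
    best = ensAll.HyperparameterOptimizationResults;
end
disp(best.XAtMinObjective) % table of the winning hyperparameter values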
Tobias Pahlberg on 10 Apr 2017
If I run the toy example like this, with 'OptimizeHyperparameters', 'all':
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
strct = struct('KFold', 7, 'Verbose', 1, 'MaxObjectiveEvaluations', 1000, ...
    'SaveIntermediateResults', true, 'Repartition', false);
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', 'bag', ...
    'Learners', 'Tree', ...
    'NumLearningCycles', 300, ...
    'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', strct);
I cannot see that they fail, but I don't know what a failure would look like.


Accepted Answer

Don Mathis on 10 Apr 2017
Thanks, I now see cases succeeding with 'GentleBoost' and 'gdi' together.
This is a bug in the search space for hyperparameter optimization in fitcensemble, which causes it to evaluate some unnecessary points. During the optimization, whenever 'Method' is LogitBoost or GentleBoost, 'SplitCriterion' is always internally set to 'mse'. But the search space is defined in a way that doesn't acknowledge that, so it unnecessarily passes SplitCriterion values of 'gdi' and 'deviance' during the optimization.
So all points that you see with LogitBoost/gdi and LogitBoost/deviance are really just LogitBoost/mse.
Oddly, you are not allowed to explicitly specify 'SplitCriterion','mse' with fitcensemble. Instead, when LogitBoost or GentleBoost is used, you need to omit the SplitCriterion argument entirely.
The optimization results are still valid. You just need to adjust what arguments you pass to fitcensemble at the end. The simple fix in your case is to remove the 'SplitCriterion','gdi' argument from the templateTree call in your original code sample. Here's the result:
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
t = templateTree('MaxNumSplits', 30, ...
    'MinLeafSize', 10);
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 12, ...
    'Learners', t, ...
    'KFold', 7, ...
    'LearnRate', 0.1);
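If you would rather have the final refit adapt automatically to whichever 'Method' the optimization picks, a minimal sketch along these lines should work; bestMethod and bestSplitCriterion are illustrative names standing in for values read from the optimization results, not part of the original code. The idea is simply to pass 'SplitCriterion' to templateTree only when the method is not LogitBoost or GentleBoost:
load carsmall
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,Model_Year,Weight,MPG);
X.Cylinders(X.Cylinders < 8) = 0; % Create two classes in the Cylinders variable
% Illustrative values; in practice, take these from the optimization results.
bestMethod = 'LogitBoost';
bestSplitCriterion = 'gdi';
if any(strcmpi(bestMethod, {'LogitBoost','GentleBoost'}))
    % These boosting methods fit regression trees internally, so omit SplitCriterion.
    t = templateTree('MaxNumSplits', 30, 'MinLeafSize', 10);
else
    t = templateTree('MaxNumSplits', 30, 'MinLeafSize', 10, ...
        'SplitCriterion', bestSplitCriterion);
end
classificationEnsemble = fitcensemble(X, 'Cylinders', ...
    'Method', bestMethod, ...
    'NumLearningCycles', 12, ...
    'Learners', t, ...
    'KFold', 7, ...
    'LearnRate', 0.1); % LearnRate applies only to the boosting methods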
This bug will be fixed. Thanks very much for reporting this. I hope the workaround isn't too much of a nuisance for you.

More Answers (0)
