How to include all variables in each decision tree of an ensemble?
Hi everyone. I am fitting the following 10-tree ensemble.
X = rand(1000,50);
Y = rand(1000,1);
N = size(X,2);
Ntrees=10;
t = templateTree('NumVariablesToSample','all');
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);
Below I extract the number of variables that are included in each of the 10 trees.
z = false(N,Ntrees);
for i = 1:Ntrees
idx = unique(Mdl.Trained{i}.CutPredictorIndex);
idx(idx==0)=[];
z(idx,i) = 1;
end
sum(z)
>> ans =
8 10 8 10 9 9 10 8 9 9
Despite setting 'NumVariablesToSample' to 'all', when I extract the variables used in each tree, only 8-10 of the 50 features appear in any one tree. Does anyone have a suggestion on how to force all variables to be included in every tree? Thanks.
Answers (1)
Aditya Patil
2021-2-16
'NumVariablesToSample' defines the number of variables (predictors) that are considered at each split, not the number that end up in the tree. At every split, the decision tree algorithm picks a random set of that many predictors (all of them when the value is 'all') and then selects one of them based on the split criterion.
It might not be necessary, or sometimes even possible, to use a specific variable in a tree. For example, consider a prior split that leaves a node with samples of only one class. In that case, no decision boundary can be selected for that variable in that node.
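A tree can use at most one predictor per split, so the number of distinct predictors in a tree is bounded by its number of branch nodes. As far as I know from the documentation, boosted ensembles default to 'MaxNumSplits' of 10, which would match the 8-10 variables reported in the question. The sketch below reuses the question's setup and raises that limit to 200 (an arbitrary value chosen for illustration). It should let more predictors appear, but it still does not guarantee that all 50 are used in every tree.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
X = rand(1000,50);
Y = rand(1000,1);
Ntrees = 10;

% Assumption: 200 is an illustrative split limit, not a recommended value
t = templateTree('NumVariablesToSample','all','MaxNumSplits',200);
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);

nBranch = zeros(1,Ntrees);                % number of splits in each tree
nUsed   = zeros(1,Ntrees);                % number of distinct predictors in each tree
for i = 1:Ntrees
    tree = Mdl.Trained{i};
    idx  = tree.CutPredictorIndex;        % 0 marks leaf nodes
    nBranch(i) = sum(tree.IsBranchNode);
    nUsed(i)   = numel(unique(idx(idx > 0)));
end
disp([nBranch; nUsed])                    % row 1: splits, row 2: predictors used
```

Deeper trees also change how the boosted ensemble fits the data, so the split limit is a modeling choice and not just a way to include more variables.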
If you need a model that uses all variables, you can look at some of the other algorithms available in MATLAB, such as SVM.
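As a minimal sketch of that suggestion for the regression data in the question, a linear SVM fitted with fitrsvm stores one coefficient per predictor in its Beta property, so every variable enters the model. The 'Standardize' setting here is my own assumption and is not taken from the question.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
X = rand(1000,50);
Y = rand(1000,1);

% Linear-kernel SVM regression (the default kernel for fitrsvm)
MdlSVM = fitrsvm(X,Y,'Standardize',true);

% One linear coefficient per predictor
numel(MdlSVM.Beta)
```

Note that a coefficient can still be close to zero, so having every variable in the model does not mean every variable is influential.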