How does TreeBagger handle missing values?

I've seen bits and pieces of an answer, namely that NaNs get ignored in TreeBagger, but nothing explicit. How are the NaNs being ignored? Does the entire row or column containing a NaN get removed? Or, if an observation in the training data for an individual tree is missing a variable, is that variable simply not used in that tree but still used in other trees in the random forest? Or do the missing values get imputed? If so, with what?
If anyone could give me a definitive answer on what the TreeBagger function does with them, that would be amazing.

Answers (1)

Matlab 2017-11-25
A random forest is made up of decision trees, so I think the question comes down to how the tree-fitting functions (fitctree / fitrtree) resolve missing values. The question can be divided into two parts: training and prediction.

Training: by default, when a node is being split, samples whose value for the candidate split variable is missing are ignored in the impurity computation. Alternatively, surrogate decision splits can be used to deal with missing values. The details are explained in the help documentation.

Prediction: this is the case where a sample is missing the attribute being tested at a node. I'm not sure about this part. I think it produces several copies of the sample, and each copy is sent down one branch with the corresponding probability. The main idea is from the paper "Induction of Decision Trees".
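As a rough illustration of the two training options mentioned above, here is a minimal sketch that fits one forest with the default settings and one with surrogate splits turned on, then predicts on rows containing NaN. It assumes the Statistics and Machine Learning Toolbox is installed; the fisheriris sample data, the 10% missing rate, the 50 trees, and the variable names are arbitrary choices for the example, not anything from the original question.

```matlab
% Minimal sketch (assumed setup): bagged trees on data with missing predictors.
load fisheriris                  % meas: 150x4 predictors, species: class labels
rng(1);                          % fix the seed so the run is repeatable

X = meas;
X(rand(size(X)) < 0.1) = NaN;    % blank out roughly 10% of the entries

% Default behaviour: no surrogate splits.
mdlDefault = TreeBagger(50, X, species, 'Method', 'classification');

% Same forest, but each tree also stores surrogate splits for use when
% the best split variable is missing.
mdlSurr = TreeBagger(50, X, species, 'Method', 'classification', ...
                     'Surrogate', 'on');

% New observations that are themselves missing some predictors.
Xnew = [5.1 NaN 1.4 0.2;
        NaN 3.0 NaN 1.8];

labelsDefault = predict(mdlDefault, Xnew);   % cell array of class names
labelsSurr    = predict(mdlSurr, Xnew);
disp([labelsDefault labelsSurr])
```

Neither call drops the rows that contain NaN, so comparing the two label columns is one way to see whether surrogate splits change the outcome on your own data.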
