How does TreeBagger handle missing values?
I've seen bits and pieces of an answer to this, for example that NaNs get ignored in TreeBagger, but never an explicit one. How exactly are the NaNs ignored? Does the entire row or column containing a NaN get removed? Or, if an observation in the training data for an individual tree is missing that variable, is the variable simply not used in that tree but still used in the other trees of the random forest? Or are the missing values imputed? If so, with what?
If anyone could give me a definitive answer on what the TreeBagger function does with them, that would be amazing.
Answers (1)
Matlab
2017-11-25
A random forest consists of decision trees, so I think the question really comes down to how fitctree (or fitrtree, for regression) handles missing values. The question can be divided into two parts: training and prediction.

Training: by default, when splitting a node, the tree ignores observations whose value for the candidate split variable is missing when it computes the impurity of that split. Alternatively, the tree can use surrogate decision splits (the 'Surrogate' option) to deal with missing values. The details are explained in the documentation.

Prediction: I'm not sure about this part. When an observation is missing the value of the attribute tested at a node, it is split into copies, and each copy travels down a branch weighted by the corresponding probability. The main idea comes from Quinlan's paper "Induction of Decision Trees".
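As a rough illustration of the two ideas in this answer, here is a minimal Python sketch of a single tree. It is not how MATLAB's fitctree or TreeBagger is implemented. The dict-based tree layout, the hypothetical p_left field, and the use of None for a missing value are all assumptions made for this example. split_gain leaves observations with a missing value out when scoring a split, and predict_proba uses the fractional-copies approach attributed to Quinlan.

```python
"""Sketch: missing-value handling in a single decision tree (illustrative only)."""


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gain(xs, ys, threshold):
    """Impurity decrease for the split x < threshold.

    Training idea: observations whose x is missing (None) are simply left
    out of the impurity computation for this candidate split.
    """
    present = [(x, y) for x, y in zip(xs, ys) if x is not None]
    if not present:
        return 0.0
    n = len(present)
    left = [y for x, y in present if x < threshold]
    right = [y for x, y in present if x >= threshold]
    weighted_child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini([y for _, y in present]) - weighted_child


def predict_proba(node, x):
    """Class probabilities for x, a dict mapping feature name -> value or None.

    Prediction idea: if the tested feature is missing, send a weighted copy
    of x down every branch and combine the results. "p_left" is a
    hypothetical field holding the fraction of training observations that
    went left at this node.
    """
    if "leaf" in node:
        return dict(node["leaf"])
    value = x.get(node["feature"])
    if value is not None:
        branch = node["left"] if value < node["threshold"] else node["right"]
        return predict_proba(branch, x)
    combined = {}
    for branch, weight in ((node["left"], node["p_left"]),
                           (node["right"], 1.0 - node["p_left"])):
        for label, p in predict_proba(branch, x).items():
            combined[label] = combined.get(label, 0.0) + weight * p
    return combined


# Usage: a one-split tree where 75% of training observations went left.
tree = {"feature": "x1", "threshold": 2.5, "p_left": 0.75,
        "left": {"leaf": {"a": 1.0}}, "right": {"leaf": {"b": 1.0}}}
print(split_gain([1, 2, None, 3, 4], ["a", "a", "b", "b", "b"], 2.5))  # 0.5
print(predict_proba(tree, {"x1": 1.0}))   # {'a': 1.0}
print(predict_proba(tree, {"x1": None}))  # {'a': 0.75, 'b': 0.25}
```

In the usage lines, the observation with the missing x1 is dropped when scoring the split, so the four remaining observations split perfectly. At prediction time, an observation missing x1 receives a blend of both leaves. Surrogate splits, the other training option mentioned, are not shown.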