In a multiclass classification problem using Random Forest/TreeBagger, how would I determine the most important features for each particular class?
Is there a quick and easy way to do this or will it require modification of the code?
Answers (1)
Ilya
2014-7-10
You would need to specify more precisely what you mean by "features important for each class". Features are important (or not) for separating classes from each other.
For example, you can recast your question as "what features are important for separating this class from all other classes?" Then you can solve this binary problem. That is, you label observations of this class as "positive" and observations of all other classes as "negative". Then you run TreeBagger to separate the two formed classes and get estimates of feature importance.
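A minimal sketch of this one-vs-rest approach, assuming the Fisher iris data shipped with Statistics and Machine Learning Toolbox as a stand-in for your own `X` and `Y`. The number of trees and the variable names (`imp`, `Ybin`) are illustrative choices. The option `'OOBPredictorImportance'` and the property `OOBPermutedPredictorDeltaError` are the names used in recent releases; older releases call them `'OOBVarImp'` and `OOBPermutedVarDeltaError`.

```matlab
% Example data (assumption): replace with your own predictors and labels.
load fisheriris                 % meas: 150-by-4 numeric, species: 150-by-1 cellstr
X = meas;
Y = species;

rng(1);                         % make the bootstrap samples reproducible
classes = unique(Y);
nTrees  = 100;                  % illustrative value
imp     = zeros(numel(classes), size(X, 2));

for k = 1:numel(classes)
    % Relabel: this class is "positive", every other class is "negative".
    Ybin = repmat({'negative'}, size(Y));
    Ybin(strcmp(Y, classes{k})) = {'positive'};

    % Grow a binary ensemble and request out-of-bag permutation importance.
    B = TreeBagger(nTrees, X, Ybin, ...
        'Method', 'classification', ...
        'OOBPredictorImportance', 'on');

    % One importance estimate per predictor for "class k vs. the rest".
    imp(k, :) = B.OOBPermutedPredictorDeltaError;
end
```

Row `k` of `imp` then holds the importance estimates for separating `classes{k}` from all other classes; larger values indicate predictors that matter more for that particular binary problem.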
3 comments
Ilya
2014-7-10
I don't know what "Gini for each class" is. The Gini index is a measure of class separation defined for several (at least two) classes. You might have a clever idea how to modify that definition, but it's fair to say this is not mainstream practice.
In MATLAB, you have access to all trees through the Trees property of the TreeBagger object. Each tree exposes class probabilities in each node and the variable chosen for splitting this node. This should be enough for you to compute the gain in some criterion due to each decision split, provided you choose a criterion that can be expressed in terms of class probabilities before and after the split (that is, in the parent and two child nodes). You can then see how much each variable contributed to that gain.
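The computation described above could be sketched roughly as follows. This assumes the Gini index as the criterion (any criterion expressible through node class probabilities could be swapped in), the Fisher iris data as placeholder input, and a recent release in which the trees expose `CutPredictorIndex` and the ensemble exposes `NumTrees`. It is one possible reading of the idea, not a built-in TreeBagger feature.

```matlab
% Example data (assumption): replace with your own predictors and labels.
load fisheriris
X = meas;
Y = species;

rng(1);                                   % reproducible bootstrap samples
B = TreeBagger(50, X, Y, 'Method', 'classification');

gain = zeros(1, size(X, 2));              % accumulated gain per predictor

for t = 1:B.NumTrees
    tree   = B.Trees{t};
    P      = tree.ClassProbability;       % nodes-by-classes class probabilities
    w      = tree.NodeProbability;        % fraction of observations in each node
    kids   = tree.Children;               % nodes-by-2 child indices (0 for leaves)
    cutIdx = tree.CutPredictorIndex;      % predictor used at each node (0 for leaves)

    gini = 1 - sum(P.^2, 2);              % Gini index of every node

    for n = find(tree.IsBranchNode)'      % loop over split nodes only
        left  = kids(n, 1);
        right = kids(n, 2);
        % Weighted impurity of the parent minus that of its two children.
        delta = w(n)*gini(n) - w(left)*gini(left) - w(right)*gini(right);
        gain(cutIdx(n)) = gain(cutIdx(n)) + delta;
    end
end

gain = gain / B.NumTrees;                 % average over trees
```

Here `gain(j)` is the average Gini reduction attributed to predictor `j` across the ensemble. To use a different criterion, replace the line that computes `gini` with another function of the rows of `P`.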