In a multiclass classification problem using Random Forest (TreeBagger), how would I determine the most important features for each particular class?

Is there a quick and easy way to do this, or will it require modifying the code?

Answers (1)

Ilya 2014-7-10
You would need to specify more precisely what you mean by "features important for each class". Features are important (or not) for separating classes from each other.
For example, you can recast your question as "what features are important for separating this class from all other classes?" and then solve the resulting binary problem. That is, you label observations of this class as "positive" and observations of all other classes as "negative", run TreeBagger to separate these two classes, and get estimates of feature importance. Repeating this for every class gives one set of importance estimates per class.
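The recast problem amounts to relabeling the data once per class and fitting one binary model each time. The sketch below is written in Python rather than MATLAB, and the names `one_vs_rest_labels`, `per_class_importance` and the `importance_fn` hook are hypothetical: `importance_fn` stands for whatever step trains TreeBagger (or another bagged-tree ensemble) on the binary labels and returns one importance score per feature. It only shows the relabeling and the loop over classes, not a particular importance measure.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def one_vs_rest_labels(y: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """For each class c, mark observations of c as 1 ("positive") and all others as 0 ("negative")."""
    # dict.fromkeys keeps the classes in order of first appearance.
    return {c: [1 if label == c else 0 for label in y] for c in dict.fromkeys(y)}


def per_class_importance(
    X: Sequence[Sequence[float]],
    y: Sequence[Hashable],
    importance_fn: Callable[[Sequence[Sequence[float]], List[int]], List[float]],
) -> Dict[Hashable, List[float]]:
    """Solve one binary 'this class vs. all other classes' problem per class.

    importance_fn is a hypothetical hook: train a bagged-tree ensemble on
    (X, binary_labels) and return one importance score per feature.
    """
    return {c: importance_fn(X, labels) for c, labels in one_vs_rest_labels(y).items()}


if __name__ == "__main__":
    y = ["a", "b", "c", "a"]
    print(one_vs_rest_labels(y))
    # {'a': [1, 0, 0, 1], 'b': [0, 1, 0, 0], 'c': [0, 0, 1, 0]}
```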
  3 Comments
Ilya 2014-7-10
I don't know what "Gini for each class" is. The Gini index is a measure of class separation defined for several (at least two) classes. You might have a clever idea of how to modify that definition, but it's fair to say this is not mainstream practice.
In MATLAB, you have access to all trees through the Trees property of the TreeBagger object. Each tree exposes class probabilities in each node and the variable chosen for splitting this node. This should be enough for you to compute the gain in some criterion due to each decision split, provided you choose a criterion that can be expressed in terms of class probabilities before and after the split (that is, in the parent and two child nodes). You can then see how much each variable contributed to that gain.
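That bookkeeping can be sketched as follows, again in Python and with a hypothetical node layout rather than MATLAB's actual tree objects. Each `Node` is assumed to carry the fraction of training observations that reach it (`weight`), its class probabilities, the index of the split variable (or `None` for a leaf), and the indices of its two children. For every split, the gain is the weighted criterion in the parent minus the weighted criterion in the two children, and it is credited to the split variable; gains are then averaged over the trees. `gini` is the usual multiclass Gini index. `one_vs_rest_gini(k)` is one possible class-specific criterion, offered as an assumption rather than standard practice: it is the Gini index of the binary problem "class k vs. all others", 2*p_k*(1 - p_k).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Criterion = Callable[[Sequence[float]], float]


@dataclass
class Node:
    weight: float                               # fraction of training observations reaching this node (assumed available)
    class_probs: Sequence[float]                # class probabilities in this node
    split_var: Optional[int] = None             # variable used to split this node; None for a leaf
    children: Optional[Tuple[int, int]] = None  # indices of the two child nodes in the tree's node list


def gini(p: Sequence[float]) -> float:
    """Multiclass Gini index: 1 - sum_k p_k^2."""
    return 1.0 - sum(q * q for q in p)


def one_vs_rest_gini(k: int) -> Criterion:
    """Gini index of the binary problem 'class k vs. all others' (hypothetical per-class criterion)."""
    return lambda p: 2.0 * p[k] * (1.0 - p[k])


def split_gains(tree: List[Node], n_vars: int, criterion: Criterion = gini) -> List[float]:
    """Sum, per variable, the weighted decrease in `criterion` over all splits of one tree."""
    gains = [0.0] * n_vars
    for node in tree:
        if node.split_var is None:
            continue  # leaves do not split, so they contribute no gain
        left, right = (tree[i] for i in node.children)
        gains[node.split_var] += (
            node.weight * criterion(node.class_probs)
            - left.weight * criterion(left.class_probs)
            - right.weight * criterion(right.class_probs)
        )
    return gains


def ensemble_gains(trees: List[List[Node]], n_vars: int, criterion: Criterion = gini) -> List[float]:
    """Average the per-variable gains over all trees of the ensemble."""
    totals = [0.0] * n_vars
    for tree in trees:
        for j, g in enumerate(split_gains(tree, n_vars, criterion)):
            totals[j] += g / len(trees)
    return totals


if __name__ == "__main__":
    # Toy tree with 3 classes: variable 0 isolates class 0, variable 1 then separates classes 1 and 2.
    tree = [
        Node(1.0, (1 / 3, 1 / 3, 1 / 3), split_var=0, children=(1, 2)),
        Node(2 / 6, (1.0, 0.0, 0.0)),
        Node(4 / 6, (0.0, 0.5, 0.5), split_var=1, children=(3, 4)),
        Node(2 / 6, (0.0, 1.0, 0.0)),
        Node(2 / 6, (0.0, 0.0, 1.0)),
    ]
    print(ensemble_gains([tree], n_vars=3))                                 # approximately [1/3, 1/3, 0]
    print(ensemble_gains([tree], n_vars=3, criterion=one_vs_rest_gini(0)))  # approximately [4/9, 0, 0]
```

With `one_vs_rest_gini(1)`, the same toy tree credits variable 1 with about 1/3 and variable 0 with about 1/9, so each class can end up with a different ranking; whether such a class-specific criterion is meaningful is the caveat raised above.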
