Proper use of ClassificationTree.fit for categorical variables?
The documentation for fitting classification trees states that X needs to be a floating-point array, but also indicates that X can represent categorical variables (using the 'CategoricalPredictors' Name-Value argument).
Is the proper way to handle this to:
(1) take the categorical variables, e.g.
category1 = {'duck','duck','goose','squash','quartz'}';
category2 = {'animal','animal','animal','vegetable','mineral'}';
(2) run those through grp2idx()
numcat1 = grp2idx(category1);
numcat2 = grp2idx(category2);
(3) Embed those in my X:
X = [numcat1 numcat2 otherTrulyNumericalVariables]
(4) Identify those as categorical
tree = ClassificationTree.fit(X,Y,'CategoricalPredictors',[1 2])
Seems like that's probably right, but I'd love an expert to vet that idea. The documentation doesn't have a categorical example.
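For context, a minimal runnable version of those four steps might look like the following (Y and the third numeric column are invented purely so the example runs end to end; they are not part of the original data):
% Made-up response and numeric predictor, only so the example runs
Y = {'bird';'bird';'bird';'plant';'rock'};
otherTrulyNumericalVariables = [1.2; 3.4; 5.6; 7.8; 9.0];
category1 = {'duck';'duck';'goose';'squash';'quartz'};
category2 = {'animal';'animal';'animal';'vegetable';'mineral'};
% Convert the group labels to numeric indices
numcat1 = grp2idx(category1);
numcat2 = grp2idx(category2);
% Assemble the predictor matrix and flag columns 1 and 2 as categorical
X = [numcat1 numcat2 otherTrulyNumericalVariables];
tree = ClassificationTree.fit(X, Y, 'CategoricalPredictors', [1 2]);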
Accepted Answer
Ilya
2013-11-8
Yes, that would be one way to accomplish this. You'd have to be careful when you convert new data to numeric for prediction: if the new data are missing a level (for example, 'goose' does not appear in the value set), grp2idx can return different indices for the same categorical values. One way to avoid this pitfall is to use the nominal type and specify the level order explicitly, for example:
category1 = nominal({'duck','duck','goose','squash','quartz'},...
    [],{'goose','squash','quartz','duck'})
numcat1 = double(category1)
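To make that concrete, a sketch of reusing the same explicit level order on new prediction data (the new-data values here are invented for illustration) could look like:
% Fix the level order once and reuse it for training and new data
levels1 = {'goose','squash','quartz','duck'};
category1 = nominal({'duck','duck','goose','squash','quartz'}, [], levels1);
numcat1 = double(category1);
% New data at prediction time: 'goose' is absent, but the indices still
% match training because the level order is fixed
newCategory1 = nominal({'duck','quartz','squash'}, [], levels1);
newNumcat1 = double(newCategory1);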
Depending on how you get your data, you might find it easier to put all of your data (numeric and categorical variables) into a table or, if you are not on R2013b yet, into a dataset object, and then extract the numeric and categorical variables from that object.
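As a rough sketch of that table-based suggestion (the variable names and response values below are invented; this assumes R2013b or later for table and categorical):
% Reusing the example data from the question; the response is made up
category1 = {'duck';'duck';'goose';'squash';'quartz'};
category2 = {'animal';'animal';'animal';'vegetable';'mineral'};
otherTrulyNumericalVariables = [1.2; 3.4; 5.6; 7.8; 9.0];
Y = {'bird';'bird';'bird';'plant';'rock'};
% Keep everything together in one table
T = table(categorical(category1), categorical(category2), ...
    otherTrulyNumericalVariables, Y, ...
    'VariableNames', {'cat1','cat2','num1','response'});
% Extract numeric codes and fit, flagging the categorical columns
X = [double(T.cat1) double(T.cat2) T.num1];
tree = ClassificationTree.fit(X, T.response, 'CategoricalPredictors', [1 2]);
The same level-consistency caveat applies here: when converting prediction data, pass the same value set to categorical (its second argument) so the numeric codes line up with training.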