How is the root node value chosen in a regression decision tree?
I understand the criteria for node splitting and how the root node variable is chosen, but I do not understand how the actual value for the inequality at the root node is chosen. Is it just local optimization of the numbers? For example, I have a variety of whole-number values ranging from 3 to 25, and the root node is choosing 9.5. This is not the median or the mean, so why is this number chosen? Is it because the decision tree analyzed all potential values to see which had the lowest MSE to start with? If so, why did it choose a decimal number when all my data points are whole numbers?
Thank you for your help!
Answers (1)
Ayush Aniket
2025-6-4
The split value at the root node in a decision tree is chosen by an optimization criterion, so it is not necessarily the median or the mean. Decision trees aim to minimize impurity (for classification) or reduce variance/MSE (for regression). The algorithm evaluates the candidate split points and selects the one that maximizes information gain or minimizes error.
Why a Decimal Value Instead of Whole Numbers?
- Even if your dataset contains only whole numbers, the tree considers midpoints between consecutive sorted values as potential split points.
- For example, if your sorted values are {3, 5, 7, 9, 11, 13, ...}, the tree might evaluate splits at {4, 6, 8, 10, 12, ...}.
- A split at 9.5 is the midpoint between the adjacent data values 9 and 10. It means the algorithm found that separating values below 9.5 from those above 9.5 gave the best reduction in impurity or error.
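As a rough illustration of how a fractional threshold can come out of whole-number data, here is a minimal brute-force sketch of a midpoint search in base MATLAB. The data values and the variable names (x, y, candidates, sse, bestSplit) are made up for this example, and the loop is only a simplified stand-in for the idea, not the toolbox's actual implementation.

```matlab
% Hypothetical whole-number predictor and response, invented for illustration.
x = [3 5 7 9 10 12 15 18 21 25]';
y = [1 1 1 1  5  5  5  5  5  5]';

% Candidate thresholds: midpoints between consecutive distinct sorted values.
xs = unique(x);                               % sorted, duplicates removed
candidates = (xs(1:end-1) + xs(2:end)) / 2;   % 4, 6, 8, 9.5, 11, ...

% Score each candidate by the summed squared error of the two child nodes,
% where each child predicts the mean of its own responses.
sse = zeros(size(candidates));
for k = 1:numel(candidates)
    left  = y(x <  candidates(k));
    right = y(x >= candidates(k));
    sse(k) = sum((left - mean(left)).^2) + sum((right - mean(right)).^2);
end

% The candidate with the smallest error becomes the split value.
[~, best] = min(sse);
bestSplit = candidates(best);
fprintf('Best split: x < %.1f\n', bestSplit);   % prints "Best split: x < 9.5"
```

In this made-up data the response jumps between x = 9 and x = 10, so the lowest-error candidate is their midpoint, 9.5, even though no observation has that value.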
In MATLAB, you can visualize the tree using:
view(SVModelTree, 'Mode', 'graph');
Refer to the following documentation to learn more about the viewing options: https://www.mathworks.com/help/stats/view-decision-tree.html
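Besides viewing the tree, the root threshold can also be read programmatically. The sketch below assumes a regression tree fitted with fitrtree from Statistics and Machine Learning Toolbox; the data and the names tree, rootCut, and rootVar are hypothetical and stand in for your own model.

```matlab
% Hypothetical data; fitrtree requires Statistics and Machine Learning Toolbox.
x = [3 5 7 9 10 12 15 18 21 25]';
y = [1 1 1 1  5  5  5  5  5  5]';

% Fit a regression tree. MinParentSize is lowered here only because the
% example data set is tiny; this setting is an assumption of the sketch.
tree = fitrtree(x, y, 'MinParentSize', 2);

% Node 1 is the root. CutPoint holds the threshold used at each branch node
% (NaN for leaves), and CutPredictor holds the name of the split variable.
rootCut = tree.CutPoint(1);
rootVar = tree.CutPredictor{1};
fprintf('Root split: %s < %g\n', rootVar, rootCut);
```

Comparing rootCut against your sorted predictor values is a quick way to check whether the threshold falls halfway between two adjacent observations.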