Hi Keyan Li,
MATLAB has a built-in function, “proximity”, that computes the proximity matrix, but it is only available for “CompactTreeBagger” objects. You can refer to the following link: https://www.mathworks.com/help/stats/compacttreebagger.proximity.html
However, there isn’t direct built-in support for obtaining the proximity matrix from a random forest model built with “fitcensemble” using the “Bag” method.
That said, you can calculate the proximity matrix for a random forest manually. Here’s a workaround:
- For each tree in the ensemble, predict the leaf indices for each data point.
- For each tree, if two data points end up in the same leaf, increment their corresponding entry in the proximity matrix.
- Optionally, divide the proximity matrix by the total number of trees, so that each entry is the fraction of trees in which the two data points share a leaf.
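
The steps above can be sketched roughly as follows. This is only a minimal example under a few assumptions, not a definitive implementation: it uses the “fisheriris” sample data as a stand-in for your own data, the names “Mdl”, “X” and “prox” are just illustrative, and it relies on the third output of “predict” for a single tree being the node number each observation lands in. It also assumes R2016b or later for the implicit expansion in “node == node'”.

```matlab
% Stand-in data; replace meas/species with your own predictors and labels.
load fisheriris
rng(1);  % for reproducibility

% Bagged ensemble of classification trees, as in the question.
Mdl = fitcensemble(meas, species, 'Method', 'Bag', 'NumLearningCycles', 50);

X = meas;                   % points for which proximities are wanted
n = size(X, 1);
numTrees = Mdl.NumTrained;  % number of trained trees in the ensemble
prox = zeros(n);

for t = 1:numTrees
    % Third output of predict for a tree: the node (leaf) number reached
    % by each row of X.
    [~, ~, node] = predict(Mdl.Trained{t}, X);

    % n-by-n logical matrix, true where two points share the same leaf.
    prox = prox + (node == node');
end

% Optional: average over trees so every entry lies in [0, 1].
prox = prox / numTrees;
```

With the averaging step, “prox(i,j)” is the fraction of trees in which points i and j fall into the same leaf, so the diagonal is all ones and the matrix is symmetric. If you want a distance for clustering or MDS, “1 - prox” is a common choice. Note that the full matrix needs n-by-n memory, so for very large data sets you may have to process the points in blocks.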