Mahalanobis distance in matlab: pdist2() vs. mahal() function

Question

babi psylon 2013-11-12


Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/105829-mahalanobis-distance-in-matlab-pdist2-vs-mahal-function

Answered: babi psylon 2013-11-12

I have two matrices X and Y. Both represent a number of positions in 3D-space. X is a 50*3 matrix, Y is a 60*3 matrix. My question is why applying the mean-function over the output of pdist2() in combination with 'Mahalanobis' does not give the result obtained with mahal(). More details on what I'm trying to do below, as well as the code I used to test this.

Let's suppose the 60 observations in matrix Y are obtained after an experimental manipulation of some kind. What I'm trying to do is to assess whether this manipulation had a significant effect on the positions observed in Y. Therefore, I used pdist2(X,X,'Mahalanobis') to compare X to X to obtain a baseline, and then pdist2(X,Y,'Mahalanobis') to compare X to Y (with X as the reference matrix), and I plotted both distributions to have a look at the overlap. Subsequently, I calculated the mean Mahalanobis distance and the 95% CI for both distributions, and did a t-test and a Kolmogorov-Smirnov test to assess whether the difference between the distributions was significant. This seemed very intuitive to me. However, when testing with mahal(), I get different values, although the reference matrix is the same. I don't understand what exactly the difference is between these two ways of calculating the Mahalanobis distance.

% test pdist2 vs. mahal in matlab

% the purpose of this script is to see whether the average over the rows of E equals the values in d...

% data
X = []; % 50*3 matrix, data omitted
Y = []; % 60*3 matrix, data omitted

% calculations
S = nancov(X);

% mahal()
d = mahal(Y,X); % gives a 60*1 vector with one value for each row (3D position) of Y (second matrix is always the reference matrix)

% pairwise mahalanobis distance with pdist2()
E = pdist2(X,Y,'mahalanobis',S); % outputs a 50*60 matrix with each ij-th element the pairwise distance between element X(i,:) and Y(j,:) based on the covariance matrix of X: nancov(X)
%{
so this is harder to interpret than mahal(), as elements of Y are not just compared to the "mahalanobis-centroid" based on X,
but to each individual element of X
so the purpose of this script is to see whether the average over the rows of E equals the values in d...
%}

F = mean(E); % now I averaged over the rows, which means, over all values of X, the reference matrix

mean(d)
mean(E(:)) % not equal to mean(d)
d-F' % not zero

% plot output
figure(1)
plot(d,'bo'), hold on
plot(mean(E),'ro')
legend('mahal()','averaged over all x values pdist2()')
ylabel('Mahalanobis distance')

figure(2)
plot(d,'bo'), hold on
plot(E','ro') % transposed so that the x-axis indexes the rows of Y, as it does for d
plot(d,'bo','MarkerFaceColor','b')
xlabel('values in matrix Y (Yi) ... or ... pairwise comparison Yi. (Yi vs. all Xi values)')
ylabel('Mahalanobis distance')
legend('mahal()','pdist2()')


Answer 1

babi psylon 2013-11-12


Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/105829-mahalanobis-distance-in-matlab-pdist2-vs-mahal-function#answer_114986

http://stackoverflow.com/questions/19933883/mahalanobis-distance-in-matlab-pdist2-vs-mahal-function/19936086#19936086

An attempt to answer my own question, while adding a new question:

Well, I guess there are two different ways to calculate the Mahalanobis distance between two clusters of data, as explained in the Stack Overflow answer linked above:

1) You compare each data point from your sample set to the mu and sigma calculated from your reference distribution (although labeling one cluster the sample set and the other the reference distribution may be arbitrary), thereby calculating the distance from each point to the so-called Mahalanobis centroid of the reference distribution.

2) You compare each data point from matrix Y to each data point of matrix X, with X the reference distribution (mu and sigma are calculated from X only).
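
As a rough sketch of the two methods, the snippet below reproduces each built-in by hand. The random X and Y are hypothetical stand-ins, since the original data were omitted, and cov() is used in place of nancov() on the assumption that X contains no NaNs. The names mu, Yc, d_hand, dxy and E_hand are introduced here for illustration only.

```matlab
% Hypothetical stand-in data: the original X and Y were omitted from the post.
rng(0);                            % fixed seed, for reproducibility only
X = randn(50,3);                   % reference sample, 50 points in 3-D
Y = randn(60,3) + 1;               % second sample, 60 points in 3-D

mu = mean(X,1);                    % centroid of the reference sample
S  = cov(X);                       % sample covariance (assumes no NaNs in X)

% Method 1: each row of Y against the centroid of X.
d      = mahal(Y,X);               % 60x1 vector of SQUARED distances
Yc     = Y - repmat(mu,size(Y,1),1);
d_hand = sum((Yc / S) .* Yc, 2);   % (y - mu) * inv(S) * (y - mu)' for each row

% Method 2: each row of X against each row of Y.
E      = pdist2(X,Y,'mahalanobis',S);  % 50x60 matrix of UNSQUARED distances
dxy    = X(1,:) - Y(1,:);          % one pair, computed by hand
E_hand = sqrt(dxy / S * dxy');     % sqrt((x - y) * inv(S) * (x - y)')

fprintf('max |mahal - by hand|   = %g\n', max(abs(d - d_hand)));
fprintf('|pdist2(1,1) - by hand| = %g\n', abs(E(1,1) - E_hand));
```

Two things follow from this. mahal() returns squared distances while pdist2() returns unsquared ones, so sqrt(d) is the quantity on the same scale as the entries of E. And even on the same scale, the average of the distances from a point to every row of X is a different quantity from the distance to the centroid of X, so mean(E) should not be expected to reproduce d.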

The values of the distances will be different, but I guess the rank order of dissimilarity between clusters is preserved when using either method 1 or 2? I actually wonder whether, when comparing 10 different clusters to a reference matrix X or to each other, the order of the dissimilarities would differ between method 1 and method 2. Also, I can't imagine a situation where one method would be wrong and the other not, although method 1 seems more intuitive in some situations, like mine.
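
One relation that may help with the rank-order question is worked out below. The notation is introduced here and is not part of the original post: n is the number of rows of X, p the number of columns, \mu the mean of X, S the sample covariance of X (normalized by n-1, as cov() does by default, and assuming no NaNs), and d_M the unsquared Mahalanobis distance. Writing x_i - y = (x_i - \mu) + (\mu - y) and expanding the quadratic form gives:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} d_M^2(x_i, y)
  &= \frac{1}{n}\sum_{i=1}^{n} (x_i-\mu)^\top S^{-1}(x_i-\mu)
   + (\mu-y)^\top S^{-1}(\mu-y)
   + \frac{2}{n}\sum_{i=1}^{n} (x_i-\mu)^\top S^{-1}(\mu-y) \\
  &= \frac{1}{n}\,\operatorname{tr}\!\bigl(S^{-1}(n-1)S\bigr)
   + d_M^2(\mu, y) + 0 \\
  &= \frac{(n-1)\,p}{n} + d_M^2(\mu, y).
\end{aligned}
```

The cross term vanishes because the deviations x_i - \mu sum to zero. In terms of the script in the question, this suggests that mean(E.^2,1)' equals d plus a constant (3*49/50 = 2.94 for a 50*3 reference matrix), so ranking the points of Y by the mean of the squared pairwise distances should reproduce the ranking from mahal(). The mean of the unsquared distances, mean(E), is not covered by this identity, so its ranking is not guaranteed to agree.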
