I have two matrices, X and Y. Both contain positions in 3D space: X is a 50*3 matrix and Y is a 60*3 matrix. My question is why averaging the output of pdist2() with the 'Mahalanobis' option does not give the same result as mahal(). More details on what I'm trying to do are below, along with the code I used to test this.
Suppose the 60 observations in Y were obtained after an experimental manipulation of some kind. I want to assess whether this manipulation had a significant effect on the positions observed in Y. To do so, I used pdist2(X,X,'Mahalanobis') to compare X with itself as a baseline, and then pdist2(X,Y,'Mahalanobis') to compare X with Y (with X as the reference matrix), and plotted both distributions to look at the overlap. I then calculated the mean Mahalanobis distance and its 95% CI for both distributions, and ran a t-test and a Kolmogorov-Smirnov test to assess whether the difference between the distributions was significant. This seemed very intuitive to me. However, when testing with mahal(), I get different values, although the reference matrix is the same. I don't understand what exactly the difference is between these two ways of calculating the Mahalanobis distance.
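For reference, this is how I read the definitions in the documentation of the two functions. The notation is my own and not taken from the docs: x_i and y_j are rows of X and Y, \bar{x} is the column-wise mean of X, and S is the sample covariance of X. If this reading is right, mahal(Y,X) returns a squared distance to the centroid of X, whereas pdist2(X,Y,'mahalanobis',S) returns an unsquared distance between individual pairs of points.

```latex
% d_j    : j-th element of mahal(Y,X)
% E_{ij} : (i,j)-th element of pdist2(X,Y,'mahalanobis',S)
d_j = (y_j - \bar{x})\, S^{-1}\, (y_j - \bar{x})^{\top},
\qquad
E_{ij} = \sqrt{(x_i - y_j)\, S^{-1}\, (x_i - y_j)^{\top}}
```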
% test pdist2 vs. mahal in MATLAB
% the purpose of this script is to see whether the average over the rows of E equals the values in d...

% data
X = []; % 50*3 matrix, data omitted
Y = []; % 60*3 matrix, data omitted

% calculations
S = nancov(X);

% mahal()
d = mahal(Y,X); % gives a 60*1 matrix with a value for each Cartesian element in Y (second matrix is always the reference matrix)

% pairwise Mahalanobis distance with pdist2()
E = pdist2(X,Y,'mahalanobis',S); % outputs a 50*60 matrix with each ij-th element the pairwise distance between element X(i,:) and Y(j,:) based on the covariance matrix of X: nancov(X)
%{
so this is harder to interpret than mahal(), as elements of Y are not just compared to the "mahalanobis-centroid" based on X,
but to each individual element of X
so the purpose of this script is to see whether the average over the rows of E equals the values in d...
%}

F = mean(E); % now I averaged over the rows, which means, over all values of X, the reference matrix

mean(d)
mean(E(:)) % not equal to mean(d)
d-F' % not zero

% plot output
figure(1)
plot(d,'bo'), hold on
plot(mean(E),'ro')
legend('mahal()','averaged over all x values pdist2()')
ylabel('Mahalanobis distance')

figure(2)
plot(d,'bo'), hold on
plot(E,'ro')
plot(d,'bo','MarkerFaceColor','b')
xlabel('values in matrix Y (Yi) ... or ... pairwise comparison Yi. (Yi vs. all Xi values)')
ylabel('Mahalanobis distance')
legend('mahal()','pdist2()')
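A further check that might narrow this down is to compare mahal() against pdist2() evaluated at the centroid of X only, instead of at every row of X. This is just a sketch. It assumes that mahal() returns the squared distance to the mean of the reference sample, and it uses random stand-in data (the real X and Y are omitted above), so the variable names mu and dc and the data itself are hypothetical. It needs the Statistics Toolbox, like the script above.

```matlab
% Hypothetical stand-in data with the same shapes as the real X and Y
rng(0);                  % fixed seed so the check is repeatable
X = randn(50,3);         % reference sample, 50 observations
Y = randn(60,3) + 0.5;   % "manipulated" sample, 60 observations

S  = cov(X);             % same as nancov(X) when X contains no NaNs
mu = mean(X,1);          % 1*3 centroid of the reference sample

d  = mahal(Y,X);                      % 60*1, as in the script above
dc = pdist2(mu,Y,'mahalanobis',S)';   % 60*1, distance of each Y(j,:) to the centroid only

% If the assumption holds, this difference should be at rounding-error level
max(abs(d - dc.^2))
```

If that difference is negligible, the two functions agree once the comparison is made against the centroid and the pdist2() output is squared, and the mismatch in the original script would come from averaging unsquared pairwise distances over all rows of X.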