The calculated R-squared is not equal to the square of the correlation coefficient computed by the MATLAB function corr
Given model predictions and true values, R2 (the coefficient of determination) can be readily calculated using the standard formula:
Rsq = 1 - sum((ytrue - ypred).^2)/sum((ytrue - mean(ytrue)).^2)
Alternatively, R-squared can be obtained by squaring the correlation coefficient, computed with built-in functions such as corr or corrcoef:
Rsq = (corr(ytrue,ypred))^2
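To make the two definitions concrete outside MATLAB, here is a minimal stdlib-only Python sketch of both formulas. It is a hand-written port, not MathWorks code; the helper names r_squared and pearson_r and the data are made up for illustration. When every prediction carries the same offset, the squared correlation ignores it, while 1 - SS_res/SS_tot penalizes it:

```python
import math


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, the same formula as Rsq above."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    """Pearson's linear correlation coefficient (the default type for corr)."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data: every prediction is off by the same +0.5.
ytrue = [1.0, 2.0, 3.0, 4.0, 5.0]
ypred = [1.5, 2.5, 3.5, 4.5, 5.5]

rsq_formula = r_squared(ytrue, ypred)    # penalizes the offset
rsq_corr = pearson_r(ytrue, ypred) ** 2  # unchanged by any offset or nonzero rescaling
print(rsq_formula, rsq_corr)  # 0.875 1.0
```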
However, I found that the latter value is slightly larger than the former. Why does the built-in function give a higher value?
Answers (2)
Ameer Hamza
2020-4-23
You are trying to find the coefficient of determination (R-squared). However, as shown in the documentation of corr() (https://www.mathworks.com/help/releases/R2020a/stats/corr.html#d120e195813), it calculates Pearson's linear correlation coefficient. I am not sure whether any built-in MATLAB function calculates R-squared directly; however, I found this submission on the File Exchange (FEX): https://www.mathworks.com/matlabcentral/fileexchange/34492-r-square-the-coefficient-of-determination. Internally, it implements the same formula you are already using.
John D'Errico
2020-4-24
Edited: John D'Errico, 2020-4-24
What I do not see is the actual model you used. Did you use a linear model? Was there a constant term in the model? The problem is that the claim you make, that R^2 equals the square of the correlation coefficient, is only valid for specific models.
>> x = rand(10,1);
>> y = rand(10,1);
>> p2 = polyfit(x,y,2);
>> pred = polyval(p2,x);
>> Rsq = 1 - sum((y - pred).^2)/sum((y - mean(y)).^2)
Rsq =
0.140274350649466
>> corr(y,pred).^2
ans =
0.140274350649466
So the square of the correlation coefficient is the same as the value your formula computes. It matches down to the last digit, which is what I expected.
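The same check can be sketched in stdlib-only Python, assuming a hand-written normal-equations solver (the hypothetical helper lstsq below, which is not polyfit's algorithm) and a seeded random generator in place of rand. Because the design matrix includes a constant column, the two values should agree up to floating-point rounding:

```python
import math
import random


def lstsq(cols, y):
    """Least-squares coefficients for y ~ sum_j coef[j] * cols[j], solved via the
    normal equations with Gaussian elimination (adequate for this tiny example)."""
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    aug = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
           + [sum(u * v for u, v in zip(cols[i], y))] for i in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(aug[r][p]))  # partial pivoting
        aug[p], aug[piv] = aug[piv], aug[p]
        for r in range(p + 1, k):
            f = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= f * aug[p][c]
    coef = [0.0] * k
    for p in reversed(range(k)):  # back substitution
        coef[p] = (aug[p][k] - sum(aug[p][c] * coef[c] for c in range(p + 1, k))) / aug[p][p]
    return coef


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    """Pearson's linear correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / math.sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))


rng = random.Random(0)  # fixed seed; the data are arbitrary, as with rand(10,1)
x = [rng.random() for _ in range(10)]
y = [rng.random() for _ in range(10)]

# Quadratic WITH a constant column, analogous to polyfit(x, y, 2)
cols = [[1.0] * len(x), x, [xi ** 2 for xi in x]]
coef = lstsq(cols, y)
pred = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(x))]

rsq_formula = r_squared(y, pred)
rsq_corr = pearson_r(y, pred) ** 2
print(rsq_formula, rsq_corr)  # equal up to floating-point rounding
```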
However, now try the same thing, but using a model that has no constant term in it. In this case, I'll use a cubic polynomial fit, but one that has no constant term. We can do that using backslash, though I could have done the fit using any number of tools.
>> mdl = [x,x.^2,x.^3]\y
mdl =
0.552026949387604
3.2235169295382
-3.50451900695301
>> pred = [x,x.^2,x.^3]*mdl;
>> Rsq = 1 - sum((y - pred).^2)/sum((y - mean(y)).^2)
Rsq =
0.195980323024559
>> corr(y,pred).^2
ans =
0.200698709640219
What went wrong? The error is in assuming that the two approaches compute the same thing even for models that have no estimated constant term.
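For reference, a short derivation of where the equality comes from and why it can fail. This is standard least-squares algebra, with residuals e_i = y_i - \hat{y}_i and \hat{y} assumed non-constant. Part (1) needs the residuals to sum to zero, which least squares guarantees only when a constant term is estimated; part (2) holds for any predictions and shows that the squared correlation can never be the smaller of the two:

```latex
% (1) With a constant term, the normal equations give both conditions below.
%     Without one, \sum_i e_i \hat{y}_i = 0 still holds, but \sum_i e_i = 0 generally fails.
\sum_i e_i = 0, \qquad \sum_i e_i \hat{y}_i = 0
\;\Rightarrow\;
\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i e_i^2

1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}
  = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
  = r(y, \hat{y})^2

% (2) For any predictions, r^2 is the R^2 of the best offset-and-scale refit a + b\hat{y};
%     choosing a = 0, b = 1 recovers the direct formula, so it cannot exceed r^2.
r(y, \hat{y})^2
  = \max_{a,b} \left[ 1 - \frac{\sum_i (y_i - a - b\,\hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \right]
  \ge 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}
```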
There are adjusted R^2 computations that can be more appropriate in these cases, but even so, once the model lacks a constant term there is no longer any reason to expect the two formulas to give the same result.
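A stdlib-only Python sketch of the no-constant case follows. lstsq, r_squared and pearson_r are hypothetical helpers written for this illustration, and the data come from a seeded generator rather than MATLAB's rand, so the printed numbers will not match the MATLAB session. The sketch also refits y on [1, pred], letting an offset and a scale be re-estimated on top of the predictions; the R^2 of that refit reproduces corr(y,pred)^2, which is why the squared correlation can come out larger than the direct formula but not smaller:

```python
import math
import random


def lstsq(cols, y):
    """Least-squares coefficients for y ~ sum_j coef[j] * cols[j] (normal equations)."""
    k = len(cols)
    aug = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)]
           + [sum(u * v for u, v in zip(cols[i], y))] for i in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(aug[r][p]))  # partial pivoting
        aug[p], aug[piv] = aug[piv], aug[p]
        for r in range(p + 1, k):
            f = aug[r][p] / aug[p][p]
            for c in range(p, k + 1):
                aug[r][c] -= f * aug[p][c]
    coef = [0.0] * k
    for p in reversed(range(k)):  # back substitution
        coef[p] = (aug[p][k] - sum(aug[p][c] * coef[c] for c in range(p + 1, k))) / aug[p][p]
    return coef


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(a, b):
    """Pearson's linear correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / math.sqrt(sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b))


rng = random.Random(0)  # fixed seed; arbitrary data
x = [rng.random() for _ in range(10)]
y = [rng.random() for _ in range(10)]

# Cubic WITHOUT a constant column, analogous to [x,x.^2,x.^3]\y
cols = [x, [xi ** 2 for xi in x], [xi ** 3 for xi in x]]
coef = lstsq(cols, y)
pred = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(x))]

rsq_formula = r_squared(y, pred)
rsq_corr = pearson_r(y, pred) ** 2

# Refit y on [1, pred]: the best offset-and-scale correction of pred
ab = lstsq([[1.0] * len(pred), pred], y)
pred_affine = [ab[0] + ab[1] * p for p in pred]
rsq_affine = r_squared(y, pred_affine)

print(rsq_formula, rsq_corr, rsq_affine)
```

The design choice here is deliberate: comparing against the refit makes the gap between the two numbers visible as exactly the improvement an intercept and a rescaling would buy.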