polyfit vs mldivide upon inversion of arrays

25 views (last 30 days)
OK, this is probably a stupid question, but I'm seeing some unexpected output when trying to do a linear fit to two sets of data. When I invert the x/y axes, the two fits are very different:
linearfitcoeff=polyfit(x,y,1)
linearfitcoeff =
0.9600 6.1664
>> linearfitcoeff=polyfit(y,x,1)
linearfitcoeff =
0.7659 177.7362
Data sets are large (over 75000 data points each). This is not what I would expect from a least-squares fit. Can someone explain this behaviour to me?
Update: mldivide works as expected, but I'm still curious about why polyfit doesn't.
Thanks,
Matthew

Accepted Answer

John D'Errico 2017-7-25
Edited: John D'Errico 2017-7-25
Sigh. This is, in fact, EXTREMELY common. It is exactly as expected. If there is any noise at all, swapping x and y will NOT produce a fit with an inverse slope. NOT. NOT. In fact, you should expect to see slope estimates with a bias when you swap the variables.
Let's make up some data.
x = rand(20,1);
y = x + randn(size(x))/5;
polyfit(x,y,1)
ans =
1.1032 -0.10593
polyfit(y,x,1)
ans =
0.67926 0.1779
So the first fit had a slope of 1.1. If the second fit had the inverse slope, it would be roughly 0.9 (1/1.1). But it was around 0.68.
Hmm. WHY? Why does this make complete sense? Why is it exactly the expected result? Here is the linear fit, essentially polyfit(x,y,1).
plot(x,y,'o')
The reason is leverage.
Look at the points I generated, completely at random. The errors were purely in y.
Leverage tells us that points near the center of the linear fit have very little impact on the estimate of the slope. It is the points near the extremes that sway the fit one way or another.
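To make "leverage" concrete, here is a small sketch (not part of the original answer) that computes the hat-matrix diagonal for the straight-line design matrix; points near the middle of the x range get the smallest values, the extremes the largest:
% Leverage sketch (illustrative): the hat-matrix diagonal for the design
% matrix [x, 1] measures how strongly each point can pull the fitted line.
X = [x(:), ones(numel(x),1)];
h = diag(X * ((X'*X) \ X'));       % leverage of each observation
[~,imid] = min(abs(x - mean(x)));  % point closest to the middle of the x range
[~,iext] = max(abs(x - mean(x)));  % most extreme point
[h(imid), h(iext)]                 % the extreme point has much higher leverage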
Now, suppose I have two points near the very end of the curve, at the bottom. To one of those points, I add some noise that is +delta (for some fixed delta). To the other point, I subtract delta. I'll add two points in red so you can see the idea.
Since I added the same absolute noise, +delta and -delta to each y value, they will have relatively little influence on the estimated slope, when fit as polyfit(x,y,1).
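(The two red points, xe and ye, are not defined explicitly in the text; the following is an illustrative construction matching the description, with made-up values for delta and the x locations:)
% Illustrative construction of the two extra points described above
% (delta and the x locations are made up, not the original values):
delta = 0.3;
xe = [0.02 0.05];           % two x values near the low end of the data
ye = xe + [delta, -delta];  % one perturbed up by delta, the other down by delta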
Now, what if I were to swap the axes and perform a straight-line fit?
plot(y,x,'bo')
hold on
plot(ye,xe,'rs')
Look at the red point where I added delta. That point is now near the very middle of the data.
polyfit([y;ye'],[x;xe'],1)
ans =
0.63971 0.1755
Note that I added two points to the data before fitting, one perturbed high in y, the other low, and the resulting slope was biased low compared to the previous fit.
What did I say before? Points near the middle have no impact on the slope. The point whose noise pushed it low, however, is now a very high-leverage point. It has a great deal of impact on the slope.
So what does this tell us? When you do a linear fit with data that has noise in y, then swap the axes and fit again, the noise is now in the wrong variable, and that tends to produce a biased estimate of the slope. The new slope will tend to be biased LOW compared to the inverse of the original slope, because the high-leverage points drag the fitted line toward a shallower slope than if you had fit it the normal way.
So you saw EXACTLY what you should have expected to see. No surprise at all.
Of course, if you had no noise at all, then the line will be perfect, with the points falling exactly on the line. So no difference would now be expected. As the noise variance increases, the expected bias in the slope will grow, in a very predictable way.
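(A quick way to see how predictable that bias is, not stated in the answer above: the product of the two fitted slopes equals the squared correlation coefficient, so the noisier the data, the more the swapped-axes slope is attenuated. A small check with made-up data:)
% The product of the two least-squares slopes equals r^2, so the
% swapped-axes slope shrinks as the noise grows and r^2 falls.
x = rand(10000,1);
y = x + randn(size(x))/5;
pxy = polyfit(x,y,1);
pyx = polyfit(y,x,1);
R = corrcoef(x,y);
[pxy(1)*pyx(1), R(1,2)^2]   % these two numbers agree to floating point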
  5 comments
Matthew Lawrence 2017-7-25
Got it - it's clear that polyfit is not what I'm looking for. I am just looking for the line with the minimal distance from all points. Hence my confusion. Thanks!
John D'Errico 2017-7-26
The orthogonal regression case CAN be solved. In fact, there are surely tools on the FEX that solve it. I took a look through my unpublished tools, and I wrote one long ago. Nothing fancy, just an orthogonal fit for a straight line. I never saw a reason to post it on the FEX since the method is pretty commonly used.
In the code below, the results will essentially always be consistent if you flip x and y, in the sense that the new slope will be the reciprocal of the old one. So if you start with
y = m*x+b
then flip x and y, we will have:
x = (1/m)*y + (-b/m)
Again, those results will be the same, except for floating point trash in the least significant bits. (Never test for exact equality of floating point numbers.)
Here is my code:
function [P1,meandist] = orthogonalRegress(x,y)
% Simple orthogonal regression between x and y
% usage: P1 = orthogonalRegress(x,y)
%
% x,y - vector input data
%
% P1 - linear (first order) polynomial fit to the data,
% in the form of [slope,intercept].
%
% meandist - measure of the residual error, as computed
% in the form of the mean of the orthogonal distances
% to the computed line.
% mean subtract
mux = mean(x);
muy = mean(y);
xhat = x - mux;
yhat = y - muy;
% use svd to do the regression
A = [xhat(:),yhat(:)];
[U,S,V] = svd(A,0);
% slope of the line is just the negative of the ratio of the components
% in the second singular vector, as found in V.
slope = -V(1,2)/V(2,2);
% recover the y-intercept
yintercept = muy - slope*mux;
% return the line in standard coefficient form
P1 = [slope,yintercept];
% the singular value from the svd itself will give us an error measure,
% but a better measure might be just the mean of the orthogonal
% distances to the line
OD = abs(y - P1(1)*x - P1(2))/sqrt(1 + P1(1)^2);
meandist = mean(OD);
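A quick usage sketch (the data here is made up, just to check the reciprocal-slope property described above):
% Usage sketch with illustrative data:
x = rand(1000,1);
y = 2*x + 1 + randn(size(x))/10;
Pxy = orthogonalRegress(x,y)   % slope near 2, intercept near 1
Pyx = orthogonalRegress(y,x)   % slope near 1/2, intercept near -1/2
Pxy(1)*Pyx(1)                  % essentially 1, up to floating point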

More Answers (1)

Walter Roberson 2017-7-25
Edited: Walter Roberson 2017-7-25
There is no reason to expect that the coefficients should be nearly the same.
Take the simple case of a straight line, y = m*x + b. Solve for x: x = y/m - b/m. Therefore, even in the simple case of a straight line, the coefficients would not be expected to be the same: the swapped fit would look like x = (1/m)*y + (-1/m)*b.
Note: polyfit works internally by forming the Vandermonde matrix V = [x.^n ... x.^2 x ones(size(x))] and using mldivide (except that it takes one extra step to increase precision).
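A minimal sketch of that idea for the degree-1 case (up to floating-point differences, it reproduces polyfit's coefficients):
% Degree-1 fit via an explicit Vandermonde matrix and mldivide:
V = [x(:), ones(numel(x),1)];   % columns: x, 1
p = (V \ y(:))'                 % [slope, intercept], same as polyfit(x,y,1)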
  5 comments
Matthew Lawrence 2017-7-25
But if I have a line that has the minimum sum of distances to all my data points, and I flip all of the data points AND the line about x = y, won't the distance to the line remain the same for every point, and therefore won't the sum of distances remain a minimum?
John D'Errico 2017-7-26
Edited: John D'Errico 2017-7-26
Draw a picture. As I said, polyfit does NOT minimize the distances to the line.
polyfit minimizes the sum of squares of VERTICAL distances to the line. This is very different.
When you swap x and y, polyfit still essentially minimizes vertical distances, but now they are what would before have been HORIZONTAL distances.
The point is that polyfit assumes the noise all lies in the second variable passed in, thus the dependent variable. y is the dependent variable in a regression. Polyfit assumes the independent variable (x) has no noise at all in it. Therefore the distance to the line is computed as a vertical distance.
Yes, I know that some may feel this is not fully intuitive. But that is how linear regression works. In these methods, it is assumed that you can control the value of x that is used. Thus an experiment. Then stuff happens, and you get out a value of y for each x. y is assumed to have some noise in that value. As I said, stuff happens.
You are thinking about the orthogonal regression case. In orthogonal regression, it is assumed that you measure the value of TWO variables. Call one x, the other y. There, both variables may be assumed to have some noise in them, some uncertainty in their value. In that case, it makes some amount of sense to minimize the orthogonal distances to the line, assuming both variables had noise in them. That is VERY different from the polyfit style of regression.
The two cases are really very different. So why does polyfit solve the vertical error problem? It does so because the problem becomes MUCH more difficult to solve for even a quadratic polynomial. Standard linear regression is trivial to compute for general polynomials if we assume the noise occurs only in y. It is difficult as hell for the orthogonal case on a quadratic or above.
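(To make the distinction concrete, here is a small sketch with made-up data, comparing vertical residuals and orthogonal distances for the same fitted line:)
% Vertical residuals (what polyfit minimizes) vs orthogonal distances
% to the same line, for illustrative data:
x = rand(100,1);
y = x + randn(size(x))/5;
p = polyfit(x,y,1);
vert = y - polyval(p,x);          % vertical distances to the fitted line
orth = vert / sqrt(1 + p(1)^2);   % orthogonal distances to the same line
[sum(vert.^2), sum(orth.^2)]      % polyfit minimizes the first, not the second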
