polyfit vs mldivide upon inversion of arrays

25 views (last 30 days)
OK, this is probably a stupid question, but I'm seeing some unexpected output when trying to do a linear fit to two sets of data. When I invert the x/y axes, the two fits are very different:
linearfitcoeff=polyfit(x,y,1)
linearfitcoeff =
0.9600 6.1664
>> linearfitcoeff=polyfit(y,x,1)
linearfitcoeff =
0.7659 177.7362
Data sets are large (over 75000 data points each). This is not what I would expect from a least-squares fit. Can someone explain this behaviour to me?
Update: mldivide works as expected, but I'm still curious about why polyfit doesn't.
Thanks,
Matthew

Accepted Answer

John D'Errico 2017-7-25
Edited: John D'Errico 2017-7-25
Sigh. This is, in fact, EXTREMELY common. It is exactly as expected. If there is any noise at all, swapping x and y will NOT produce a fit with an inverse slope. NOT. NOT. In fact, you should expect to see slope estimates with a bias when you swap the variables.
Let's make up some data.
x = rand(20,1);
y = x + randn(size(x))/5;
polyfit(x,y,1)
ans =
1.1032 -0.10593
polyfit(y,x,1)
ans =
0.67926 0.1779
So the first fit had a slope of 1.1. If the second fit had the inverse slope, it would be roughly 0.9 (1/1.1). But it was around 0.68.
Hmm. WHY? Why does this make complete sense? Why is it exactly the expected result? Here is the linear fit, essentially polyfit(x,y,1).
plot(x,y,'o')
The reason is leverage.
Look at the points I generated, completely at random. The errors were purely in y.
Leverage tells us that points near the center of the linear fit have very little impact on the estimate of the slope. It is the points near the extremes that sway the fit one way or another.
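To make "leverage" concrete, here is a small sketch (not part of the original answer) that computes the hat-matrix diagonal for the straight-line design matrix; points near the middle of the x range get the smallest values, the extremes the largest:
% Leverage sketch (illustrative): the hat-matrix diagonal for the design
% matrix [x, 1] measures how strongly each point can pull the fitted line.
X = [x(:), ones(numel(x),1)];
h = diag(X * ((X'*X) \ X'));       % leverage of each observation
[~,imid] = min(abs(x - mean(x)));  % point closest to the middle of the x range
[~,iext] = max(abs(x - mean(x)));  % most extreme point
[h(imid), h(iext)]                 % the extreme point has much higher leverage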
Now, suppose I have two points near the very end of the curve, at the bottom. To one of those points, I add some noise that is +delta (for some fixed delta). To the other point, I subtract delta. I'll add two points in red so you can see the idea.
Since I added the same absolute noise, +delta and -delta to each y value, they will have relatively little influence on the estimated slope, when fit as polyfit(x,y,1).
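(The two red points, xe and ye, are not defined explicitly in the text; the following is an illustrative construction matching the description, with made-up values for delta and the x locations:)
% Illustrative construction of the two extra points described above
% (delta and the x locations are made up, not the original values):
delta = 0.3;
xe = [0.02 0.05];           % two x values near the low end of the data
ye = xe + [delta, -delta];  % one perturbed up by delta, the other down by delta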
Now, what if I were to swap the axes and perform a straight-line fit?
plot(y,x,'bo')
hold on
plot(ye,xe,'rs')
Look at the red point where I added delta. That point is now near the very middle of the data.
polyfit([y;ye'],[x;xe'],1)
ans =
0.63971 0.1755
Note that I added two points to the data before fitting, one perturbed high in y, the other low, and the resulting slope was biased low compared to the previous fit.
What did I say before? Points near the middle have no impact on the slope. The point whose noise pushed it low, however, is now a very high-leverage point. It has a great deal of impact on the slope.
So what does this tell us? When you do a linear fit with data that has noise in y, then swap the axes and fit again, the noise is now in the wrong variable, and that tends to produce a biased estimate of the slope. The new slope will tend to be biased LOW compared to the inverse of the original slope, because the high-leverage points drag the fitted line toward a shallower slope than if you had fit it the normal way.
So you saw EXACTLY what you should have expected to see. No surprise at all.
Of course, if you had no noise at all, then the line will be perfect, with the points falling exactly on the line. So no difference would now be expected. As the noise variance increases, the expected bias in the slope will grow, in a very predictable way.
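(A quick way to see how predictable that bias is, not stated in the answer above: the product of the two fitted slopes equals the squared correlation coefficient, so the noisier the data, the more the swapped-axes slope is attenuated. A small check with made-up data:)
% The product of the two least-squares slopes equals r^2, so the
% swapped-axes slope shrinks as the noise grows and r^2 falls.
x = rand(10000,1);
y = x + randn(size(x))/5;
pxy = polyfit(x,y,1);
pyx = polyfit(y,x,1);
R = corrcoef(x,y);
[pxy(1)*pyx(1), R(1,2)^2]   % these two numbers agree to floating point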
  5 comments
Matthew Lawrence 2017-7-25
Got it - it's clear that polyfit is not what I'm looking for. I am just looking for the line with the minimal distance from all points. Hence my confusion. Thanks!
John D'Errico 2017-7-26
The orthogonal regression case CAN be solved. In fact, there are surely tools on the FEX that solve it. I took a look through my unpublished tools, and I wrote one long ago. Nothing fancy, just an orthogonal fit for a straight line. I never saw a reason to post it on the FEX since the method is pretty commonly used.
In the code below, the results will essentially always be consistent if you flip x and y, in the sense that the new slope will be the reciprocal of the old one. So if you start with
y = m*x+b
then flip x and y, we will have:
x = (1/m)*y + (-b/m)
Again, those results will be the same, except for floating point trash in the least significant bits. (Never test for exact equality of floating point numbers.)
Here is my code:
function [P1,meandist] = orthogonalRegress(x,y)
% Simple orthogonal regression between x and y
% usage: P1 = orthogonalRegress(x,y)
%
% x,y - vector input data
%
% P1 - linear (first order) polynomial fit to the data,
% in the form of [slope,intercept].
%
% meandist - measure of the residual error, as computed
% in the form of the mean of the orthogonal distances
% to the computed line.
% mean subtract
mux = mean(x);
muy = mean(y);
xhat = x - mux;
yhat = y - muy;
% use svd to do the regression
A = [xhat(:),yhat(:)];
[U,S,V] = svd(A,0);
% slope of the line is just the negative of the ratio of the components
% in the second singular vector, as found in V.
slope = -V(1,2)/V(2,2);
% recover the y-intercept
yintercept = muy - slope*mux;
% return the line in standard coefficient form
P1 = [slope,yintercept];
% the singular value from the svd itself will give us an error measure,
% but a better measure might be just the mean of the orthogonal
% distances to the line
OD = abs(y - P1(1)*x - P1(2))/sqrt(1 + P1(1)^2);
meandist = mean(OD);
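A quick usage sketch (the data here is made up, just to check the reciprocal-slope property described above):
% Usage sketch with illustrative data:
x = rand(1000,1);
y = 2*x + 1 + randn(size(x))/10;
Pxy = orthogonalRegress(x,y)   % slope near 2, intercept near 1
Pyx = orthogonalRegress(y,x)   % slope near 1/2, intercept near -1/2
Pxy(1)*Pyx(1)                  % essentially 1, up to floating point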

More Answers (1)

Walter Roberson 2017-7-25
Edited: Walter Roberson 2017-7-25
There is no reason to expect that the coefficients should be nearly the same.
Take the simple case of a straight line, y = m*x + b. Solve for x: x = y/m - b/m. Therefore, even in the simple case of a straight line, the coefficients would not be expected to be the same: the swapped fit would look like x = (1/m)*y + (-1/m)*b.
Note: polyfit works internally by forming the Vandermonde matrix V = [x.^n ... x.^2 x ones(size(x))] and using mldivide (except that it takes one extra step to increase precision).
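A minimal sketch of that idea for the degree-1 case (up to floating-point differences, it reproduces polyfit's coefficients):
% Degree-1 fit via an explicit Vandermonde matrix and mldivide:
V = [x(:), ones(numel(x),1)];   % columns: x, 1
p = (V \ y(:))'                 % [slope, intercept], same as polyfit(x,y,1)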
  5 comments
Matthew Lawrence 2017-7-25
But if I have a line that has the minimum sum of distances to all my data points, and I flip all of the data points AND the line about x = y, won't the distance to the line remain the same for every point, and therefore won't the sum of distances remain a minimum?
John D'Errico 2017-7-26
Edited: John D'Errico 2017-7-26
Draw a picture. As I said, polyfit does NOT minimize the distances to the line.
polyfit minimizes the sum of squares of VERTICAL distances to the line. This is very different.
When you swap x and y, polyfit still essentially minimizes vertical distances, but now they are what would before have been HORIZONTAL distances.
The point is that polyfit assumes the noise all lies in the second variable passed in, thus the dependent variable. y is the dependent variable in a regression. Polyfit assumes the independent variable (x) has no noise at all in it. Therefore the distance to the line is computed as a vertical distance.
Yes, I know that some may feel this is not fully intuitive. But that is how linear regression works. In these methods, it is assumed that you can control the value of x that is used. Thus an experiment. Then stuff happens, and you get out a value of y for each x. y is assumed to have some noise in that value. As I said, stuff happens.
You are thinking about the orthogonal regression case. In orthogonal regression, it is assumed that you measure the value of TWO variables. Call one x, the other y. There, both variables may be assumed to have some noise in them, some uncertainty in their value. In that case, it makes some amount of sense to minimize the orthogonal distances to the line, assuming both variables had noise in them. That is VERY different from the polyfit style of regression.
The two cases are really very different. So why does polyfit solve the vertical error problem? It does so because the problem becomes MUCH more difficult to solve for even a quadratic polynomial. Standard linear regression is trivial to compute for general polynomials if we assume the noise occurs only in y. It is difficult as hell for the orthogonal case on a quadratic or above.
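(To make the distinction concrete, here is a small sketch with made-up data, comparing vertical residuals and orthogonal distances for the same fitted line:)
% Vertical residuals (what polyfit minimizes) vs orthogonal distances
% to the same line, for illustrative data:
x = rand(100,1);
y = x + randn(size(x))/5;
p = polyfit(x,y,1);
vert = y - polyval(p,x);          % vertical distances to the fitted line
orth = vert / sqrt(1 + p(1)^2);   % orthogonal distances to the same line
[sum(vert.^2), sum(orth.^2)]      % polyfit minimizes the first, not the second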
