How to perform linear curve fitting based on distance from line instead of residuals

1 次查看(过去 30 天)
I already know how to perform a classic 1st order curve fir using the polyval function. However, I find that the polyval curve fit does accurately follow my current data because it has a high slope and a round trend. Based on the sum of square errors approach used by polyval, it seems to be over-weighing certain points. Is there a way to do a 1st order curve fit that is guided by the total distance of the points from the line, instead of based on residuals?

采纳的回答

John D'Errico
John D'Errico 2022-7-5
编辑:John D'Errico 2022-7-5
This is the classic total least square problem. Sometimes it is called the errors in variables problem, sometimes known as orthogonal regression.
Lacking your data, I can't show how to do it using your data. But it is not difficult to do. Assume you have data in the form of vectors x and y. I'll make some up, here:
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
plot(x,y,'o')
axis equal
The actual slope of the curve should have been 50. What happens when we use polyfit?
P1 = polyfit(x,y,1)
P1 = 1×2
2.9081 2.3460
The problem is, polyfit always produces biased estimates in these problems, where the slope will be biased towards zero. Do you see the polyfit extimated slope is very small? Instead of 50, we got a number around 2.9081 from polyfit.
The trick is to use total least squares. We can get the coefficients of the line from:
mux = mean(x);
muy = mean(y);
[~,~,V] = svd([x(:)-mux,y(:)-muy],0);
coef = V(:,end)
coef = 2×1
-0.9998 0.0176
AEst = -coef(1)/coef(2)
AEst = 56.7846
So despite the rather high presence of noise on our data, we get a slope estimate 56.7846, instead of the 2.9 value that polyfit predicted. That seems pretty reasonable.
The constant term in a regression model will be:
BEst = muy - AEst*mux
BEst = 0.0527
The line of best orthogonal fit is:
y = 0.0527 + 56.7846*x

更多回答(5 个)

Sulaymon Eshkabilov
You can try a scaling factor here. Your data seems to be within [355 365]; thus, take out 355 from x and then perform the linear fit.
You can also try fitlm() that is very similar to polyfit().

Torsten
Torsten 2022-7-5
编辑:Torsten 2022-7-5
The residuals are the sum of the vertical distances of the points from the line.
If you want to consider the orthogonal distances of the points from the line, you can proceed as follows:
Calculate the distance d_i of a point P = (x_i,y_i) from a line y = a*x+b. If you don't know how to do this: google is your friend.
Once you have this distance, you can formulate the optimization problem as
min_{a,b} sum_i d_i

Torsten
Torsten 2022-7-6
编辑:Torsten 2022-7-6
rng("default")
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
plot(x,y,'o')
sol = fminsearch(@(p)fun(p,x,y),[1 1])
sol = 1×2
54.4774 0.2832
fun(sol,x,y)
ans = 1.0309
hold on
plot(x,sol(1)*x+sol(2))
function res = fun(p,x,y)
a = p(1);
b = p(2);
for i = 1:numel(x)
xproj = (x(i)+a*(y(i)-b))/(1+a^2);
yproj = a*xproj+b;
dsquared(i) = (xproj-x(i))^2+(yproj-y(i))^2;
end
res = sum(dsquared);
end
  1 个评论
Torsten
Torsten 2022-7-6
编辑:Torsten 2022-7-6
rng("default")
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
mux = mean(x);
muy = mean(y);
[~,~,V] = svd([x(:)-mux,y(:)-muy],0);
coef = V(:,end)
coef = 2×1
-0.9998 0.0184
AEst = -coef(1)/coef(2)
AEst = 54.4774
BEst = muy - AEst*mux
BEst = 0.2832

请先登录,再进行评论。


Matt J
Matt J 2022-7-18

Image Analyst
Image Analyst 2022-7-18
In cases like this, with mostly vertical data, I just swap x and y and it seems to provide a good fit. Of course the residuals are horizontal rather than vertical but the fitted line seems reasonable. Of course you would only do this in certain situations where it physically makes sense, like fitting a line to the side of some blob in a binary image. You wouldn't do it for something like time series measurements. Of course you "orthogonal" residuals also would not have any theoretical reason for being right in that case either.

类别

Help CenterFile Exchange 中查找有关 Statistics and Machine Learning Toolbox 的更多信息

标签

产品


版本

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by