How to fit multiple lines of best fit in one scatter plot

I have a large amount of data. I want to automatically split it into subsets and fit a separate line of best fit (trendline) to each subset, so that the lines intersect and I can get the intersection points. I then need to automatically calculate the slope of each line. This is the code I have so far.
The picture shows the result I want (which I drew manually, just to explain) and the result I was able to get from MATLAB. The code is attached here so you can see what I have so far. Is it even possible to do this using MATLAB?
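clc
clear all
x = [5705 5690 5671 5667 5604 5585 5555 5542 5502 5501 5495];
y = [12644 12612 12570 12560 12420 12361 12278 12240 12098 12078 12005];
p = polyfit(x, y, 1);         % p(1) is the slope, p(2) the intercept of a single line through all points
px = [min(x) max(x)];
py = polyval(p, px);          % fitted y values at the two ends of the x range
scatter(x, y, "filled")
set(gca, 'YDir', 'reverse')   % y axis increases downward
lsline                        % adds one least-squares line (Statistics and Machine Learning Toolbox)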

Answers (2)

John D'Errico 2025-2-26
Edited: John D'Errico 2025-2-26
Is it possible to do this in MATLAB? Of course it is. How well it works depends on the skill of the person writing the code, and on the signal-to-noise ratio: high-noise problems will be difficult for any code, and any heuristic you devise will fail on SOME carefully chosen set of data. It also depends on your requirements for doing this automatically. Should the code determine on its own how many lines to fit? If you say yes to that, then I can easily devise a set of data that will cause unsolvable problems. Sometimes the answer will be simple, but not always.
And whether this can be done in MATLAB is a silly question. (Sorry, but it is.) MATLAB is just a programming language. It is the skill of the programmer that matters, NOT the language. How robust the algorithm is depends on the programmer: their understanding of the problem, and their knowledge of statistics, numerical methods, etc. And of course, it depends on how well the programmer understands the data they will be seeing. How much noise should they expect?
x=[5705 5690 5671 5667 5604 5585 5555 5542 5502 5501 5495];
y=[12644 12612 12570 12560 12420 12361 12278 12240 12098 12078 12005];
plot(x,y,'o')
When I look at that plot, I might see one line that can be fit. With a little more care, I might decide the bottom three points seem to follow a different slope from the rest. But is there a third segment up high? Perhaps. How noisy is the data? I don't know. Only you know that.
Anyway, what might I do?
Just looking at the plot of the data, we might make the decision from that picture that the first three points belong to one group. Then the next 4 points seem to form a cluster with a common slope. Finally, the last three points might have a slightly lower slope, and they MIGHT fall on a different line.
One simple trick is to simply compute the slope of the line segments between each consecutive pair of points. If we do so, we see this:
[xsort,tags] = sort(x);
ysort = y(tags);
segslopes = diff(ysort)./diff(xsort) % slope between each consecutive pair of points
segslopes = 1×10
12.1667 20.0000 3.5500 2.9231 2.7667 3.1053 2.2222 2.5000 2.2105 2.1333
xmid = conv(xsort,[1/2,1/2],'valid') % midpoints of each segment in x
xmid = 1×10
1.0e+03 * 5.4980 5.5015 5.5220 5.5485 5.5700 5.5945 5.6355 5.6690 5.6805 5.6975
plot(xmid,segslopes,'o')
However, if I look at this plot and compare the variability of the slopes in blocks 2 and 3 with the variability of the slopes in that first "block", I might decide that we actually have 4 lines to consider, NOT 3. Which is it?
Any heuristic you write will suffer from exactly these issues. How many lines are there to be found?
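To make that sensitivity concrete, here is a minimal sketch of one such heuristic: flag a new line wherever the slope changes by more than some relative threshold. The thresholds `tolLoose` and `tolTight` and the variable names are assumptions chosen only for illustration; nothing here is a recommended setting.

```matlab
x = [5705 5690 5671 5667 5604 5585 5555 5542 5502 5501 5495];
y = [12644 12612 12570 12560 12420 12361 12278 12240 12098 12078 12005];

[xsort, tags] = sort(x);
ysort = y(tags);
segslopes = diff(ysort) ./ diff(xsort);   % slope between each consecutive pair of points

% Relative change between each pair of consecutive segment slopes;
% relChange(k) compares segslopes(k) with segslopes(k+1)
relChange = abs(diff(segslopes)) ./ abs(segslopes(1:end-1));

tolLoose = 0.5;    % assumed threshold: flag slope changes larger than 50%
tolTight = 0.25;   % assumed threshold: flag slope changes larger than 25%
breaksLoose = find(relChange > tolLoose);
breaksTight = find(relChange > tolTight);
```

On this data the looser threshold flags two slope changes and the tighter one flags three, so the number of lines the heuristic "finds" depends on a number you had to pick by hand.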
The common tool for estimating such a model is called broken stick regression. But even there, we can find issues. For example, I used a tool of my own creation to fit that data. You can find it on the File Exchange, as my SLM toolbox. Here is what it did, though:
Even though I told it to break the curve into three segments, then decide where the breaks should go based on your data, do you see it made a decision that is not the same as what I first guessed? The problem is, those first three points just don't fall on a straight line very well.
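For readers without SLM, a continuous three-segment broken stick fit can be sketched in base MATLAB. This is not how SLM works internally; it is a brute-force illustration that tries pairs of knots from a grid and solves a linear least-squares problem for each pair. The model is y ≈ b1 + b2*xc + b3*max(0, xc - k1) + b4*max(0, xc - k2), where xc is the centred x. The grid size (201), `minPts`, and all variable names are assumptions made for this sketch.

```matlab
x = [5705 5690 5671 5667 5604 5585 5555 5542 5502 5501 5495];
y = [12644 12612 12570 12560 12420 12361 12278 12240 12098 12078 12005];

[xs, idx] = sort(x(:));          % column vector, ascending in x
ys = reshape(y(idx), [], 1);     % matching y values as a column
xbar = mean(xs);
xc = xs - xbar;                  % centre x to keep the design matrix well conditioned

cand = linspace(xc(1), xc(end), 201);   % assumed grid of candidate knot positions
minPts = 3;                             % assumed minimum number of points per segment
bestSSE = Inf;
for i = 1:numel(cand) - 1
    for j = i + 1:numel(cand)
        k1 = cand(i);
        k2 = cand(j);
        counts = [sum(xc <= k1), sum(xc > k1 & xc <= k2), sum(xc > k2)];
        if min(counts) < minPts
            continue                    % skip knot pairs that leave a segment too short
        end
        % Hinge basis: the slope may change at k1 and k2, but the curve stays continuous
        A = [ones(size(xc)), xc, max(0, xc - k1), max(0, xc - k2)];
        b = A \ ys;                     % linear least squares for these fixed knots
        sse = sum((ys - A*b).^2);
        if sse < bestSSE
            bestSSE = sse;
            bestB = b;
            bestK = [k1; k2];
        end
    end
end

slopes = cumsum(bestB(2:4)).';          % slope of each of the three segments
Ak = [ones(2,1), bestK, max(0, bestK - bestK(1)), max(0, bestK - bestK(2))];
xBreak = bestK + xbar;                  % break locations in the original x units
yBreak = Ak * bestB;                    % fitted y value at each break
```

Here `slopes` holds the three segment slopes, and `[xBreak yBreak]` are the points where adjacent segments meet. The chosen breaks can move if you change `minPts` or the grid, and the number of segments is still something you have to supply.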
  1 Comment
Reem 2025-2-26

Thank you for your answer and effort; it was very informative. I want to get a result like the attached graph automatically, but I am still unsure how to achieve it. I will try to work with the information you provided. I think it could work if we reverse the y-axis.



Image Analyst 2025-2-27
See my attached demo. It does a piecewise linear fit over two sections, finding the best splitting point. You can adapt it to work with 3 or more sections if you want.
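The attached demo is not reproduced here, so the following is only a sketch of the general idea under stated assumptions, not the demo itself: try every split of the sorted data into a left and a right group, fit a separate line to each with `polyfit`, and keep the split with the smallest total squared error. `minPts` and the variable names are assumptions made for this sketch.

```matlab
x = [5705 5690 5671 5667 5604 5585 5555 5542 5502 5501 5495];
y = [12644 12612 12570 12560 12420 12361 12278 12240 12098 12078 12005];

[xs, idx] = sort(x(:));          % column vector, ascending in x
ys = reshape(y(idx), [], 1);     % matching y values as a column
xbar = mean(xs);
xc = xs - xbar;                  % centre x so polyfit stays well conditioned
n = numel(xc);
minPts = 3;                      % assumed minimum number of points per section

bestSSE = Inf;
for s = minPts:(n - minPts)      % s = index of the last point in the left section
    left = 1:s;
    right = s+1:n;
    pL = polyfit(xc(left), ys(left), 1);
    pR = polyfit(xc(right), ys(right), 1);
    sse = sum((ys(left) - polyval(pL, xc(left))).^2) + ...
          sum((ys(right) - polyval(pR, xc(right))).^2);
    if sse < bestSSE
        bestSSE = sse;
        bestSplit = s;
        bestL = pL;
        bestR = pR;
    end
end

slopes = [bestL(1), bestR(1)];   % slope of the left and right line

% Intersection: solve bestL(1)*t + bestL(2) = bestR(1)*t + bestR(2) for t
% (assumes the two slopes differ; parallel lines have no intersection)
xi = (bestR(2) - bestL(2)) / (bestL(1) - bestR(1));
yInt = polyval(bestL, xi);
xInt = xi + xbar;                % shift back to the original x units
```

Because the two lines are fitted independently, their intersection `(xInt, yInt)` need not fall between the two groups of points. Extending this to three sections means searching over two split indices instead of one.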

Release: R2021a