Line of Best Fit through Scattered Data
3 次查看(过去 30 天)
显示 更早的评论
I need to find the line of best fit through my scatterplot. I have attached my text file, and my code is the following.
clear all
fid = fopen( 'oo20.txt');
data = textscan(fid, '%f%f', 'Delimiter', '|', 'TreatAsEmpty','~');
fclose(fid);
GalList.year = data{1};
D = data{2};
X1 = GalList.year;
Y1 = D;
scatter(X1,Y1);
ylim([0 20])
3 个评论
Star Strider
2015-10-11
I just peeked at it and it seems reasonable to delete this one:
~|3.4028886
Does it have any specific significance?
采纳的回答
Star Strider
2015-10-11
I don’t see anything strange about the data, but the regression is failing. The data are both column vectors. With the full set of data, the parameters I estimate are both zero for the biparametric regression, and for the uniparametric (origin intercept) regression, the single parameter is zero. Trying it with polyfit results in both parameters being NaN, so it’s not my code. (I deleted the polyfit calls in the posted code.) It’s not obvious to me what problems there may be, but with three attempts failing, something is very wrong somewhere.
Of interest, everything works fine with a random sample of 280 data pairs, giving an intercept of 31 BCE (so that’s when astronomy began!), and a slope of +0.022 (is that Galaxies Discovered/Year?). Any more than 280 breaks the code for some reason.
I don’t believe the duplicated years should cause problems, since linear regression is usually robust to such. If you have any insights as to what the problem may be with your full data set, please share them. You know them better than I do, and what they should look like.
I plotted a linear fit tonight. Any others that might be more descriptive of whatever you’re observing that you’d like to try?
This is the end of my day, so I’ll come back to this in the morning.
My code:
fidi = fopen('jgillis16 oo20.txt', 'rt');
D = textscan(fidi, '%f|%f', 'CollectOutput',1, 'TreatAsEmpty','~');
X1 = D{:}(:,1);
Y1 = D{:}(:,2);
RandRows = randi(length(X1), 280, 1);
X1 = X1(RandRows); % Hypothesis: Works With Random Subset -> Accepted
Y1 = Y1(RandRows);
DesignMtx = [ones(size(X1)) X1]; % MODEL: X1*B = Y1
B2 = DesignMtx\Y1; % Linear Biparametric Regression — Estimate Parameters
Yhat2 = DesignMtx*B2; % Linear Biparametric Regression — Generate Line
B1 = X1\Y1; % Linear Uniparametric Regression — Estimate Parameter
Yhat1 = X1*B1; % Linear Uniparametric Regression — Generate Line
XTX = (DesignMtx'*DesignMtx); % X'X
figure(1)
scatter(X1, Y1, 'bp')
hold on
plot(X1, Yhat2, '-r')
hold off
grid
xlabel('Year')
ylabel('Distance (Parsecs)')
3 个评论
Star Strider
2015-10-12
My pleasure!
Extrapolating back to the x-intercept, the first galaxy discovered was in January 1409. It was undoubtedly the Milky Way, because it has a distance of zero (we’re in it).
Having fun with the numbers...
更多回答(2 个)
Image Analyst
2015-10-11
There are other more sophisticated methods, but try polyfit() and polyval(). See attached demo.
0 个评论
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!