AUC between different size curves
How can I calculate the area between 2 curves of unequal data size, as in the photo?
Accepted Answer
Julius Muschaweck
2021-9-7
I would use interp1 to interpolate both functions onto the same high-resolution x grid. In the code below, you can easily replace the 0.01 step with a finer one, such as 1e-6.
Look at the interp1 options to control the interpolation method (you might prefer 'linear' to avoid overshoot) and the extrapolation behavior (in your probability distribution case, you might want to set the extrapolation value to 1 if one of your distributions covers a shorter x range).
Then use trapz to compute either the signed integral of the difference or the unsigned area between the curves. See the code.
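% Two example curves sampled on grids of different size
x1 = 0:0.2:1.4;
y1 = x1.^2;
x2 = 0:0.14:1.4;
y2 = sqrt(x2);
figure();
hold on;
plot(x1,y1,'-x');
plot(x2,y2,'-x');
% Common fine grid and both curves interpolated onto it
xq = 0:0.01:1.4;
y1q = interp1(x1,y1,xq);
y2q = interp1(x2,y2,xq);
% Signed integral of the difference, and unsigned area between the curves
my_signed_integral = trapz(xq,y2q-y1q)
my_unsigned_area = trapz(xq,abs(y2q-y1q))
% Reference values from the exact functions on the same grid
test_signed = trapz(xq,sqrt(xq) - xq.^2)
test_unsigned= trapz(xq,abs(sqrt(xq) - xq.^2))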
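The extrapolation options mentioned above can be compared in a small sketch. The sample values here are made up for illustration only. Note that a scalar extrapolation value is applied on both sides of the data range, so on its own it only suits the case where the curve is cut short on the right.

```matlab
% Illustrative data only: samples covering [0, 1], queried partly outside it
x  = 0:0.5:1;
v  = x;
xq = [-0.5 0.25 1 1.5];

vNaN = interp1(x, v, xq, 'linear');            % default: NaN outside [0, 1]
vOne = interp1(x, v, xq, 'linear', 1);         % scalar: 1 outside, on BOTH sides
vExt = interp1(x, v, xq, 'linear', 'extrap');  % linear continuation outside
```

With these inputs, vNaN is NaN at the first and last query point, vOne is 1 there, and vExt continues the straight line to -0.5 and 1.5.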
11 Comments
Julius Muschaweck
2021-9-11
The trouble in your example is that you extrapolate y1i linearly to the right, which makes the extrapolated values grow to almost 4. This is not what you want. You have cumulative distributions, which are inherently zero on the left, all the way to minus infinity, and one on the right, all the way to plus infinity. So you should extrapolate to the left with a constant value of 0, and to the right with a constant value of 1.
You run into this problem whether you use my initial idea or Walter's actually superior suggestion of using union().
Like this:
clear;
load("bins.mat")
load("cdist.mat")
load("Zbins.mat")
load("Zcdist.mat")
%%
% I would try to control the step size in your xq array
dx = 0.01; % some small step size to decrease discretization error
xmin = min([Zbins';bins']);
xmax = max([Zbins';bins']);
nsteps = ceil((xmax-xmin) / dx) + 1;
xq = linspace(xmin, xmax, nsteps);
% But while we're at it, let's also try what Walter Roberson suggested.
% Note that this overwrites the linspace grid computed above:
xq = union(Zbins, bins, "sorted");
% Your original grid, for reference:
% xq = linspace(min([Zbins';bins']), max([Zbins';bins']))'; % new x vector for interpolation
% (I check the max and min of both sets and take the overall max and min,
% so I thought this covers the whole data range.)
% Extrapolation does a bad job for you here. Look at the values in y1i: they go up to almost 3.9!
% y1i = interp1(Zbins', Zcdist', xq, 'linear','extrap'); % interpolate onto xq
% y2i = interp1(bins', cdist', xq, 'linear','extrap'); % interpolate onto xq
% my_unsigned_area = trapz(xq,abs(y2i-y1i))
% Without 'extrap', interp1 "extrapolates" to NaN (not a number)
y1i = interp1(Zbins', Zcdist', xq, 'linear'); % interpolate onto xq
y2i = interp1(bins', cdist', xq, 'linear'); % interpolate onto xq
% Now set the "outliers" on the left to 0, and on the right to 1.
% Logical indexing is the loop-free way to do this:
idx1_left = xq < Zbins(1); % assuming Zbins and bins are strictly ascending
y1i(idx1_left) = 0;
idx1_right = xq > Zbins(end);
y1i(idx1_right) = 1;
idx2_left = xq < bins(1); % same assumption: bins is strictly ascending
y2i(idx2_left) = 0;
idx2_right = xq > bins(end);
y2i(idx2_right) = 1;
% check if we caught all outliers
assert(~any(isnan(y1i)));
assert(~any(isnan(y2i)));
my_unsigned_area = trapz(xq,abs(y2i-y1i))
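Since the .mat files are not available here, the same steps can be checked on synthetic data with a known answer. This is only a sketch: x1, F1, x2 and F2 are hypothetical stand-ins for the bins and cumulative distributions, chosen as the CDFs of uniform distributions on [0, 1] and [0.5, 1.5], for which the area between the curves is 0.5.

```matlab
% Hypothetical stand-ins for the bins/cdist data (not the original files)
x1 = 0:0.25:1;     F1 = x1;         % CDF of a uniform distribution on [0, 1]
x2 = 0.5:0.1:1.5;  F2 = x2 - 0.5;   % CDF of a uniform distribution on [0.5, 1.5]

% Common grid containing every original sample point of both curves
xq = union(x1, x2);

% Interpolate without 'extrap': points outside each data range become NaN
y1 = interp1(x1, F1, xq, 'linear');
y2 = interp1(x2, F2, xq, 'linear');

% Clamp like a CDF: 0 to the left of the data, 1 to the right
y1(xq < x1(1)) = 0;   y1(xq > x1(end)) = 1;
y2(xq < x2(1)) = 0;   y2(xq > x2(end)) = 1;

% Every NaN should have been replaced by now
assert(~any(isnan(y1)) && ~any(isnan(y2)));

area_between = trapz(xq, abs(y1 - y2));
```

Both curves are piecewise linear with their kinks on the union grid, so the trapezoidal rule should reproduce 0.5 up to rounding error. With measured data, the result also depends on how well linear interpolation describes the curves between samples.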