How to average multiple convex hulls? (or, bootstrapping with convhullln)

6 次查看(过去 30 天)
I'm trying to bootstrap a 6-dimensional cluster outline that describes 90% of my data for a group of data points.
The idea is basically this: 1) subsample 90% of the data and calculate a convex hull with convhulln(); 2) repeat step 1 many times; 3) calculate an average hull; 4) do cool analyses with this outline
Unsurprisingly, the size of the each generated hull is a bit different. I've tried to interpolate some extra points to match the hull with the largest number of 'corners' and then tried to calculate an average hull from there (see code below). This doesn't work: I think this is mostly because my hull-corners aren't actually alligned in any useful way. The resulting median outline doesn't end up matching the original data well at all.
Is there a way to calculate an average hull from a group of N-dimensional outlines/hulls? I'd be grateful for any pointers in a helpful direction; my googling has so far been unsuccessful.
My real data has is in a 6-dimensional space, but here I'm trying to solve the problem in 3D (I find it a bit easier to follow). A solution would have to work in a multidimensional space, though.
samp_frac = 0.9; %take 90% of samples
%% make some data
PC1Band = randn(543,3);
%% generate 1000 medians and outlines for each cluster
ss_hulls_idx = cell(1,500);
ss_hulls_vals = cell(1,500);
for o = 1:500
[thesevals,theseidx] = datasample(PC1Band,floor(length(PC1Band)*samp_frac),1,'Replace',false);
[thishull,~] = convhulln(thesevals);
thesecorners = unique(thishull); %indexes of corner samples in this subset
ss_hulls_vals{o} = thesevals(thesecorners,:);
ss_hulls_idx{o} = theseidx(thesecorners); %indexes in original PC1Band
end
%% adjust length of outlines to match longest outline
Lens_outl = cellfun(@length,ss_hulls_vals); %outline lengths
MaxOutline = max(Lens_outl,[],2); %length of longest outline
%for each shorter outline, add some 'missing' values, then interpolate
%those. This shouldn't change the outline shape (facets are also linear)
interp_outlines = zeros(MaxOutline,3,500);
for o = 1:500
thisOutlineVals = ss_hulls_vals{o}; %values at corner indexes
thisOutlineIdx = ss_hulls_idx{o};
newOutlineVals = zeros(MaxOutline,3); %placeholder vector for upsampled corner values
thisLen = Lens_outl(o); %nr of corners in this outline
MissVal = MaxOutline - thisLen; % nr of missing values in this outline
if MissVal > 0 %skip outlines with correct length
MissVali = randperm(MaxOutline-2,MissVal);
MissVali = MissVali + 1; % exclude first and last value (for interpolation reasons)
MissVali = sort(MissVali);
%replace placeholder values at MissVali with NaN
for m = 1:MissVal
newOutlineVals(MissVali(m),:) = [NaN,NaN,NaN];
end
% fill not-missing values
newOutlineVals(~isnan(newOutlineVals)) = thisOutlineVals; % non-missing values are thisDimVals
% linear interpolate NaNs
interp_Outline = fillmissing(newOutlineVals,'linear');
end
interp_outlines(:,:,o) = interp_Outline;
end
%% calculate median outlines
% math works fine, outputs are stupid :(
ss_median_outline = median(interp_outlines,3);
ss_std_outline = std(interp_outlines,1,3);
  3 个评论
Susan Leemburg
Susan Leemburg 2023-1-12
The idea is to get a hull that doesn't exactly fit all of my original datapoints, but that gives an outline that would represent a likely 90% of the data of this particular set.
I understand that an average hull of those two rotates square will not repesent either square completely, but it should overlap with at least part of both (assuming that they're otherwise more or less in the same location).
But what I'm currently doing gives me hulls/outlines that end up not matching my original dataset at all. Nor does the average hull remotely resemble the individual hulls. So, I'm clearly doing something very wrong.
I thought that simply having all corners of a hull would be sufficient, but I can also try to keep all facets instead. Problem is that I also don't know how to go about calculating a correct average hull with those.
Bjorn Gustavsson
Bjorn Gustavsson 2023-1-12
Would thinking about this something like this help (In 2-D first, I'm scared about the cursed monster of dimensionality):
Set all points inside each convex hull to one, on suitably fine-scaled discretizations,
Add those binary images together, possibly smooth for "taste",
Take the contour at half the max of the sum as an "average hull".
It is not entirely clear to me what you'd want to get from this for the case with very poorly overlapping hulls, but I can at least imagine that if there are many hulls that vary randomly a little bit around some centre-curve this might give something somewhat sensible.

请先登录,再进行评论。

回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Bounding Regions 的更多信息

产品


版本

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by