Generating random samples from a 2D space matching the probability density function estimated from a discrete set of data
I have a set of around 12k points in the {a,e} space, shown in the figure below (yes, there are 12 thousand points, but most of them are concentrated in the bottom left). I need to draw around 600k random points from this space, and I want the resulting set to match the hypothetical 2D probability density function that produced the initial 12k points. ksdensity can estimate the 2D pdf, and I can then draw random samples with randsample, just as this thread suggests. The problem is that the samples will be limited to the discrete points of the mesh on which I evaluated the density. I could work with, for instance, a 5000x5000 mesh, but this is quite CPU-consuming and I think it leads to considerable overfitting. I wonder if there is an analytical alternative that makes this easier and allows working in the continuum. This thread suggests something, but tbh I don't think it is well justified. Thanks in advance!

Answers (1)
  Vinayak
      
 2024-5-17
        Hi Lluc-Ramon,
If you need to generate a large number of random points that match a 2D probability density function (pdf) estimated from an initial set of points, I suggest using Kernel Density Estimation (KDE) in MATLAB. KDE gives a smooth, continuous approximation of the density. Note that the density is still evaluated on a grid and grid cells are drawn by inverse-CDF sampling, much like “ksdensity” with “randsample”, so getting samples in the continuum requires jittering the drawn points within their cells or drawing from the kernels directly.
% Generate synthetic data using ChatGPT based on your image
N = 12000;
a_values = [logspace(0, 4, round(0.85 * N)), logspace(4, 5, round(0.15 * N))]';
% Generate corresponding `e_values` with some correlation
e_values = 0.9 * rand(N, 1) .* (log10(a_values) - 0.5);
e_values(e_values < 0) = 0;  % Clamp negative values to zero
e_values = e_values / 4;     % Scale `e_values` to roughly [0, 1]
% Plot the original data
figure;
scatter(a_values, e_values);
axis xy;
xlabel('a');
ylabel('e');
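% Perform KDE. Note: kde2d is not a built-in MATLAB function; this call
% appears to match the third-party kde2d from the MATLAB File Exchange.
[bandwidth, density, X, Y] = kde2d([a_values, e_values]);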
% Plot the estimated density (X and Y are the grid coordinates of `density`)
figure;
contourf(X, Y, density, 20);
axis xy;
colorbar;
xlabel('a');
ylabel('e');
title('Estimated 2D PDF');
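% Inverse-CDF sampling over the KDE grid cells.
% Note: in older MATLAB releases, local functions must sit at the end of the script file.
function samples = sample_kde(bandwidth, X, Y, density, num_samples)
p = max(density(:), 0);        % Guard against small negative density values
cdf = cumsum(p) / sum(p);      % Normalized discrete CDF over the grid cells
cdf(end) = 1;                  % Guard against round-off so `find` always succeeds
random_values = rand(num_samples, 1);
sample_indices = arrayfun(@(x) find(cdf >= x, 1), random_values);
[row, col] = ind2sub(size(density), sample_indices);
% Jitter uniformly within each cell so samples are not tied to the grid nodes
dx = X(1, 2) - X(1, 1);
dy = Y(2, 1) - Y(1, 1);
a_samples = X(1, col)' + (rand(num_samples, 1) - 0.5) * dx;
e_samples = Y(row, 1) + (rand(num_samples, 1) - 0.5) * dy;
samples = [a_samples, e_samples];
end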
num_samples = 600000;  % Number of samples to generate
samples = sample_kde(bandwidth, X, Y, density, num_samples);
% Extract and plot sampled points
a_samples = samples(:, 1);
e_samples = samples(:, 2);
figure;
scatter(a_samples, e_samples, 1, 'filled');
xlabel('a');
ylabel('e');
title('Random Samples from Estimated 2D PDF');
The sampled points should then follow the estimated density, to within the resolution of the grid and the quality of the bandwidth choice.
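If the goal is to work fully in the continuum, one alternative is to sample from a Gaussian KDE directly, without ever evaluating it on a grid: pick one of the original points at random and add Gaussian noise with the kernel bandwidth (sometimes called a smoothed bootstrap). The following is only a minimal sketch under stated assumptions: the placeholder `data` matrix stands in for `[a_values, e_values]`, the bandwidth `h` is a rule-of-thumb choice rather than anything taken from the code above, and the `.*` line relies on implicit expansion (R2016b or later).

```matlab
% Smoothed-bootstrap sketch: draw continuous samples from a Gaussian KDE
% without evaluating the density on a grid.
rng(0);                                        % fixed seed, for reproducibility only
data = [randn(2000, 1), 0.5 * rand(2000, 1)];  % PLACEHOLDER data; substitute [a_values, e_values]

[n, d] = size(data);
% ASSUMPTION: rule-of-thumb bandwidth per dimension (1-by-d row vector);
% a bandwidth estimated by another method could be used instead.
h = std(data) * n^(-1 / (d + 4));

num_samples = 600000;
idx = randi(n, num_samples, 1);                % pick kernel centres uniformly, with replacement
samples = data(idx, :) + randn(num_samples, d) .* h;  % add Gaussian kernel noise
```

Because Gaussian kernels are unbounded, some samples can land outside the physical range (for example slightly negative `e`), so they may need to be rejected and redrawn. For a variable as skewed as `a`, it may also be worth estimating and sampling in `log10(a)` and transforming back, though that is a modelling choice rather than part of the method.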


