Normalization of probability distribution function

I'm trying to obtain a probability distribution curve with an area equal to 1. I have a dataset of about 3 million values ranging from 0 to 3.5, though this range changes depending on other input parameters that are irrelevant to my question. I want to assign probabilities from 0 to 1 based on experimental data so that I can later run a Monte Carlo simulation. For some reason I can't figure out why the area under my distribution curve is much larger than 1, and I can't find a way to normalize it. Here is the code pertaining to the issue and a snippet of the figure I obtain. Thank you so much in advance to anyone who helps.
EDIT: The value for the variable "area" is actually 1, I just can't seem to visualize how the probabilities amount to 1 in the y-axis.
% Compute the kernel density estimation for electron energy distribution
% from main ion
[f,xi] = ksdensity(Electron_info(:,2));
% Compute the area under the curve
area = trapz(xi,f);
% Normalize the kernel density estimate by the area
f_norm = f/area;
% plot the KDE curve
figure(1)
plot(xi, f_norm,'r')
hold on
% Set plot properties
legend('Ion excitation')
xlabel('Electron Energy (eV)')
ylabel('Probability Density')
title('Electron Energy Distribution')
xlim([0 3.5])
set(gcf, 'Color','w')

Accepted Answer

the cyclist 2023-4-25
It might help if you uploaded your data, so that we can run your code.
To me, eyeballing your curve does look like it has area 1. It's a little bit of guesswork to try to figure out why you don't perceive that. Is it because you have a peak near 0.8? Remember, that peak is sharp and narrow -- running over a range of x from about 2.6 to 3 (and not all the y values there are as high as 0.8). That peak contributes perhaps about 0.25 to the area.
The broad, flattish region contributes about 0.3*(2.5-0.5) = 0.6.
The left peak contributes about 0.3*0.5 = 0.15.
I see no problem.
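The region-by-region estimate above can also be checked numerically. The following is a minimal sketch, not the asker's actual result: the original data was not posted, so `x` is a synthetic two-component stand-in, and the cut at 2.5 is an arbitrary illustrative boundary. It assumes the Statistics and Machine Learning Toolbox (for `ksdensity`).

```matlab
% Synthetic stand-in for Electron_info(:,2); the real data was not posted.
rng(0);
x = [0.4*randn(5000,1) + 1.5; 0.1*randn(2000,1) + 2.8];

[f, xi] = ksdensity(x);          % density estimate on 100 equally spaced points

% Running integral of the density; the last entry is the total area.
cumA = cumtrapz(xi, f);
totalArea = cumA(end);

% Split the area at a hypothetical boundary between broad region and peak.
k = find(xi >= 2.5, 1);
restArea = cumA(k);              % area to the left of the boundary
peakArea = totalArea - restArea; % area under the narrow peak

fprintf('total = %.3f, left of 2.5 = %.3f, peak = %.3f, max(f) = %.3f\n', ...
    totalArea, restArea, peakArea, max(f));
```

The height of the curve (`max(f)`) and the area under it are separate quantities: a narrow peak can be tall and still contribute only a modest share of the total area.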
  2 Comments
Jorge Fernandez 2023-4-25
First off, thank you for the answer; and second, you are correct. What I was trying to do was obtain a function where the y-values correspond to the probability of finding a given value of x. That is, if I look at x = 5 and see where it intersects the function, the corresponding y would be the probability.
the cyclist 2023-4-25
Edited: the cyclist 2023-4-25
For continuous distributions, the probability of getting any exact, individual point (e.g. x=5) is zero. This can be a tricky point to grasp at first. It might help to realize that there are an infinite number of x values, so if each had a nonzero probability, then the total probability would be infinite.
Instead, you use the probability density function (which is what you have), and estimate the probability of a range of points by using the area under the probability density.
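As a rough illustration of reading a probability off a density, the sketch below integrates the kernel density estimate over an interval. It is only an example under stated assumptions: `x` is synthetic stand-in data, and the interval bounds `a` and `b` are hypothetical values chosen for the demonstration.

```matlab
% Synthetic stand-in for the experimental data (not the asker's values).
rng(0);
x = [0.4*randn(5000,1) + 1.5; 0.1*randn(2000,1) + 2.8];

a = 1.0;  b = 2.0;                    % hypothetical interval of interest
pts = linspace(a, b, 500);
fab = ksdensity(x, pts);              % density evaluated only on [a, b]
P = trapz(pts, fab);                  % estimate of P(a <= X <= b)

% Fraction of samples actually falling in [a, b], for comparison.
empirical = mean(x >= a & x <= b);

% Shrinking the interval toward a single point drives the probability to 0.
tinyPts = linspace(a, a + 1e-6, 50);
Ptiny = trapz(tinyPts, ksdensity(x, tinyPts));

fprintf('P = %.3f, empirical = %.3f, P over width 1e-6 = %.2e\n', ...
    P, empirical, Ptiny);
```

The y-value of the density at `a` stays roughly the same as the interval shrinks; it is the width, and therefore the area, that goes to zero.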
If you have a discrete function, then you could plot the probability itself, such as
x = [1 2 3];
p = [0.2 0.5 0.3];
bar(x,p)


More Answers (1)

Torsten 2023-4-25
Edited: Torsten 2023-4-25
I think the kernel density is already normalized ...
From the documentation:
[f,xi] = ksdensity(x) returns a probability density estimate, f, for the sample data in the vector or two-column matrix x. The estimate is based on a normal kernel function, and is evaluated at equally-spaced points, xi, that cover the range of the data in x. ksdensity estimates the density at 100 points for univariate data, or 900 points for bivariate data.
If you want to see the cumulative area under the curve, use
[f,xi] = ksdensity(Electron_info(:,2),'Function','cdf');
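Since the stated goal is a later Monte Carlo step, one possible use of the estimated CDF is inverse-transform sampling: draw uniform numbers on [0, 1] and map them back through the CDF. This is a sketch of that general technique, not something proposed in the thread; `x` is synthetic stand-in data and the sample count is arbitrary.

```matlab
% Synthetic stand-in for Electron_info(:,2); the real data was not posted.
rng(0);
x = [0.4*randn(5000,1) + 1.5; 0.1*randn(2000,1) + 2.8];

[F, xi] = ksdensity(x, 'Function', 'cdf');   % estimated CDF on a grid

% interp1 needs distinct sample points, so drop repeated CDF values.
[Fu, idx] = unique(F);
xu = xi(idx);

% Uniform draws, clamped to the CDF range covered by the grid so that
% interp1 never has to extrapolate (this slightly piles up the far tails).
u = rand(1e4, 1);
u = min(max(u, Fu(1)), Fu(end));

samples = interp1(Fu, xu, u);                % inverse-transform sampling

histogram(samples, 'Normalization', 'pdf')   % should resemble the KDE curve
```

With the histogram normalized as a density, its shape can be compared directly against the curve returned by `ksdensity(x)`.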
  2 Comments
Jorge Fernandez 2023-4-25
First off, thanks for the answer. You are indeed correct; however, what I'm trying to obtain (if possible) is a function where the y-values correspond to the probability of finding a given value of x. That is, if I look at x = 5 and see where it intersects the function, the corresponding y would be the probability.
Torsten 2023-4-25
The probability density function gives information about the probability for an interval of x-values. The probability of getting any single x-value for a continuous distribution is always 0.
