Normalization of probability distribution function

Question

Jorge Fernandez 2023-4-25

https://ww2.mathworks.cn/matlabcentral/answers/1952464-normalization-of-probability-distribution-function

Commented: Torsten 2023-4-25

I'm trying to obtain a probability distribution curve with an area equal to 1. I have a dataset of about 3 million values ranging from 0 to 3.5, although this range changes depending on other input parameters that are irrelevant to my question. I'm basically trying to assign probabilities from 0 to 1 based on experimental data so that I can later apply a Monte Carlo method. For some reason, I can't figure out why the area under my distribution curve is much larger than 1, and I can't seem to find a way to normalize it. Here is the code pertaining to the issue and a snippet of the figure I obtain. Thank you so much in advance to anyone who helps.

EDIT: The value for the variable "area" is actually 1, I just can't seem to visualize how the probabilities amount to 1 in the y-axis.

% Compute the kernel density estimation for electron energy distribution
% from main ion
[f,xi] = ksdensity(Electron_info(:,2));

% Compute the area under the curve
area = trapz(xi,f);

% Normalize the kernel density estimate by the area
f_norm = f/area;

% plot the KDE curve
figure(1)
plot(xi, f_norm,'r')
hold on

% Set plot properties
legend('Ion excitation')
xlabel('Electron Energy (eV)')
ylabel('Probability Density')
title('Electron Energy Distribution')
xlim([0 3.5])
set(gcf, 'Color','w')

Answer 1

the cyclist 2023-4-25

https://ww2.mathworks.cn/matlabcentral/answers/1952464-normalization-of-probability-distribution-function#answer_1222399

It might help if you uploaded your data, so that we can run your code.

To me, eyeballing your curve does look like it has area 1. It's a little bit of guesswork to try to figure out why you don't perceive that. Is it because you have a peak near 0.8? Remember, that peak is sharp and narrow -- running over a range of x from about 2.6 to 3 (and not all the y values there are as high as 0.8). That peak contributes perhaps about 0.25 to the area.

The broad, flattish region contributes about 0.3*(2.5-0.5) = 0.6.

The left peak contributes about 0.3*0.5 = 0.15.

I see no problem.
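
The rough bookkeeping above can be checked numerically. The sketch below is only an illustration: the three-bump density is invented (its weights, centers, and widths are assumptions, not the poster's data), and the region edges 0.6 and 2.5 are picked by eye. It shows that a curve can have a peak near 0.8 while its pieces still add up to an area of about 1.

```matlab
% Hypothetical density: small left peak, broad flat region, sharp right peak.
% All weights, centers, and widths are made-up illustration values.
npdf = @(x, mu, s) exp(-0.5*((x - mu)/s).^2) / (s*sqrt(2*pi));
xi = linspace(0, 3.5, 2001);
f  = 0.15*npdf(xi, 0.3, 0.08) ...
   + 0.60*npdf(xi, 1.5, 0.45) ...
   + 0.25*npdf(xi, 2.8, 0.125);
f  = f / trapz(xi, f);              % renormalize on the truncated range [0, 3.5]

% Running area from the left edge, then the area of each region
F = cumtrapz(xi, f);
edges = [0 0.6 2.5 3.5];            % region boundaries chosen by eye
pieceArea = diff(interp1(xi, F, edges));

peakHeight = max(f);                % height of the tallest peak (a density, not a probability)
totalArea  = sum(pieceArea);        % sum of the three region areas

fprintf('peak height = %.2f\n', peakHeight);
fprintf('region areas = %.2f %.2f %.2f, total = %.3f\n', pieceArea, totalArea);
```

The y-values are densities, so only the areas need to sum to 1; a narrow peak can be tall and still hold a modest share of the total.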

Jorge Fernandez 2023-4-25

First off, thank you for the answer; second, you are correct. What I was trying to do was obtain a function where the values of y correspond to the probability of finding a value of x, i.e., if I look at x = 5 and see where this value intersects the function, the corresponding y would be the probability.

the cyclist 2023-4-25

Edited: the cyclist 2023-4-25

For continuous functions, the probability of getting any exact, individual point (e.g. x=5) is zero. This can be a tricky point to grasp at first. It might help to realize that there are an infinite number of x values, so if each of them had a finite, nonzero probability, then the total probability would be infinite.

Instead, you use the probability density function (which is what you have), and estimate the probability of a range of points by using the area under the probability density over that range.
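
As a minimal sketch of that idea, the snippet below estimates the probability of a range from a kernel density estimate. The sample data are hypothetical (random numbers standing in for Electron_info(:,2)), the range limits a and b are arbitrary, and ksdensity is assumed to be available through the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical sample data standing in for Electron_info(:,2)
rng(0);                                  % fixed seed so the run is repeatable
data = 1.5 + 0.5*randn(1e4, 1);

[f, xi] = ksdensity(data);               % density estimate on the default grid

% Probability of landing in [a, b] = area under the density over that range
a = 1.0;
b = 2.0;
F = cumtrapz(xi, f);                     % running area from the left edge of xi
pRange = diff(interp1(xi, F, [a b]));    % area between a and b

fprintf('Estimated P(%.1f <= x <= %.1f) = %.3f\n', a, b, pRange);
```

The result is a number between 0 and 1, unlike the height of the curve at a single x, which is a density and can exceed 1.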

If you have a discrete function, then you could plot the probability itself, such as

x = [1 2 3];
p = [0.2 0.5 0.3];
bar(x,p)

Answer 2

Torsten 2023-4-25

https://ww2.mathworks.cn/matlabcentral/answers/1952464-normalization-of-probability-distribution-function#answer_1222409

Edited: Torsten 2023-4-25

I think the kernel density is already normalized ...

From the documentation:

[f,xi] = ksdensity(x) returns a probability density estimate, f, for the sample data in the vector or two-column matrix x. The estimate is based on a normal kernel function, and is evaluated at equally-spaced points, xi, that cover the range of the data in x. ksdensity estimates the density at 100 points for univariate data, or 900 points for bivariate data.

If you want to see the cumulative area under the curve, use

[f,xi] = ksdensity(Electron_info(:,2),'Function','cdf');
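
Since the goal mentioned in the question is a later Monte Carlo step, here is one possible way the cumulative curve could be used to draw random values (inverse-transform sampling). This is a sketch under assumptions, not necessarily what the poster needs: the data are hypothetical, and the lookup is done with interp1 on the tabulated CDF.

```matlab
% Hypothetical bimodal sample data standing in for Electron_info(:,2)
rng(1);
data = [0.3 + 0.1*randn(2000, 1); 2.8 + 0.1*randn(8000, 1)];

% Cumulative distribution estimate on the default grid
[F, xi] = ksdensity(data, 'Function', 'cdf');

% interp1 needs strictly increasing sample points, so drop repeated F values
[Fu, keep] = unique(F);
xu = xi(keep);

% Inverse-transform sampling: draw u uniformly within the tabulated range
% of F, then look up the x value where the CDF equals u
nDraw = 5000;
u = Fu(1) + (Fu(end) - Fu(1)) * rand(nDraw, 1);
draws = interp1(Fu, xu, u);

fracUpper = mean(draws > 1.5);     % share of draws in the upper mode
fprintf('Fraction of draws above 1.5: %.3f\n', fracUpper);
```

Here the y-axis of the CDF does run from 0 to 1, and each y value is the probability of observing a value less than or equal to the corresponding x.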

Jorge Fernandez 2023-4-25

First off, thanks for the answer. You are indeed correct; however, what I'm trying to obtain (if possible) is a function where the values of y correspond to the probability of finding a value of x, i.e., if I look at x = 5 and see where this value intersects the function, the corresponding y would be the probability.

Torsten 2023-4-25

The probability density function gives information about the probability for an interval of x-values. The probability of getting a single exact x-value from a continuous distribution is always 0.
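
One way to see this numerically is to shrink an interval around a point and watch its probability go to zero. The sketch below uses a standard normal density purely as a convenient stand-in for a continuous distribution; the point x0 and the widths w are arbitrary choices.

```matlab
% Standard normal density, used only as a convenient continuous example
pdfFun = @(x) exp(-0.5*x.^2) / sqrt(2*pi);

x0 = 0.5;                          % the single value of interest
w  = [1 0.1 0.01 0.001];           % interval widths shrinking toward zero
p  = zeros(size(w));
for k = 1:numel(w)
    % probability of falling within +/- w/2 of x0
    p(k) = integral(pdfFun, x0 - w(k)/2, x0 + w(k)/2);
end

density = pdfFun(x0);              % height of the curve at x0, not a probability

% Rows: interval width, probability, probability divided by width
disp([w; p; p ./ w])
```

As the width shrinks, the probability shrinks with it, while the ratio of probability to width settles at the density value. That ratio is what the y-axis of a density plot shows.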
