Function 'pdf' doesn't return pdf values
I have a problem with the function pdf. I have this code:
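% data, low, high and obs are defined earlier (not shown)
estim_KDE = fitdist(data, 'kernel');          % kernel density estimate fitted to data
x = low:(abs(low-high)/(obs-1)):high;         % obs equally spaced points from low to high
y = pdf(estim_KDE,x);                         % evaluate the fitted density at x
plot(x,y,'r'), xlabel('xxx'), ylabel('yyy'),...
title('title'), legend('xyz');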
but pdf returns values that make no sense to me. They are neither between 0 and 1 nor numbers between 0 and 1 multiplied by the length of x (I expected one of these two options). For example, it gives me numbers like 20.something or 5.something, with length(x) = 1000 or more. This happens for every distribution I have tried (always fitted with fitdist). I only discovered the problem because I plotted a histogram of the frequencies against the kernel density estimate.
Can someone help me, please?
Answers (2)
John D'Errico
2015-2-6
Edited: John D'Errico
2015-2-6
I think you are under a common misperception about the PDF of a random variable. My guess is that the letter P in PDF is what confuses people; and yes, it does stand for Probability Density Function.
The thing is, it does not actually return a probability. Consider a PDF with a very narrow spread, for example a Gaussian with mean 0 and standard deviation 0.001:
normpdf(0,0,.001)
ans =
398.94
See that the PDF at 0 is 398.94, vastly larger than 1.
What matters is that the PDF integrates to 1: the integral of the function over its whole domain is 1.
It is the CDF that returns something you can interpret as a probability, namely P(X ≤ x). Alternatively, you can integrate the PDF over an interval to compute the probability of that interval, but that is exactly what a difference of two CDF values gives you.
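A minimal base-MATLAB sketch of the points above, assuming the same Gaussian (mean 0, standard deviation 0.001). The anonymous functions f and F are written out with exp and erf so the example runs without the Statistics and Machine Learning Toolbox; they should agree with normpdf and normcdf. The density at the mean is far above 1, yet its numerical integral is 1, and a probability comes from a CDF difference or, equivalently, from integrating the PDF over the interval.

```matlab
% Narrow Gaussian: mean 0, standard deviation 0.001 (as in normpdf(0,0,.001)).
mu = 0;
sigma = 0.001;

% Normal PDF and CDF in base MATLAB (equivalent to normpdf / normcdf).
f = @(x) exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
F = @(x) 0.5 * (1 + erf((x - mu) ./ (sigma*sqrt(2))));

peak = f(0);                               % density at the mean, far above 1

% Integrate the PDF numerically over +/- 10 standard deviations.
xx = linspace(mu - 10*sigma, mu + 10*sigma, 20001);
area = trapz(xx, f(xx));                   % essentially 1

% Probability of falling within one standard deviation of the mean,
% computed from the CDF and by integrating the PDF over the interval.
pCdf = F(mu + sigma) - F(mu - sigma);
xin = linspace(mu - sigma, mu + sigma, 2001);
pInt = trapz(xin, f(xin));                 % agrees with pCdf

fprintf('peak = %.2f, area = %.6f, P(|X| < sigma) = %.4f (CDF), %.4f (integral)\n', ...
    peak, area, pCdf, pInt);
```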
4 Comments
simo borto
2015-2-6
John D'Errico
2015-2-6
I'm not sure what you mean by the density of the probabilities. The PDF is apparently the function you are looking to plot.
fplot(@(x) normpdf(x,0,.1),[-.5,.5])
grid on

But as you can see, the values being plotted are not probabilities; they are clearly greater than 1 in places. This reflects the fact that the probability of getting exactly any specific value, such as x = 0.1, is zero: that event has measure zero. The area under the curve, however, is 1, as it must be. So if you want a plot that shows the "probability" of seeing exactly a specific value from a continuous distribution like this, that plot does not exist, nor can it exist. Don't forget the words measure zero.
You can talk about the probability you will see some value within a range of numbers. This is what the CDF tells you. So maybe you are looking to plot a CDF, possibly of some empirically derived distribution. I'm still not sure what it is you want to see.
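A rough sketch of an empirically derived CDF, assuming a hypothetical sample of 5000 normal draws with standard deviation 0.1 in place of the poster's data. The empirical CDF at t is the fraction of samples less than or equal to t, so the probability of a range (a, b] is a difference of two CDF values. The toolbox function ecdf computes the same staircase; this version avoids the toolbox.

```matlab
% Hypothetical sample standing in for the poster's data vector.
rng(0);                                 % fixed seed for reproducibility
data = 0.1 * randn(5000, 1);

% Empirical CDF: fraction of samples less than or equal to t.
Fhat = @(t) mean(data <= t);

% Probability of the range (a, b] as a difference of CDF values,
% which equals the fraction of samples falling in that range.
a = -0.1;
b = 0.1;
pRange = Fhat(b) - Fhat(a);

% Plot the empirical CDF as a staircase over the sorted sample.
xs = sort(data);
stairs(xs, (1:numel(xs))' / numel(xs)), grid on
xlabel('x'), ylabel('F(x)'), title('Empirical CDF')
fprintf('Estimated P(%g < X <= %g) = %.3f\n', a, b, pRange);
```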
simo borto
2015-2-7
Edited: simo borto
2015-2-8
John D'Errico
2015-2-10
A plot of the PDF IS a graph of the relative frequency, to the extent that this makes any sense. Why do you care about the y-axis scaling? If that is what bothers you, then just turn off the y-axis labels.
The fact is, you CAN create a histogram of the frequency in each "bin". You would do this either by integrating the PDF over that sub-interval or by subtracting successive values of the CDF, which gives the relative fraction that would fall in that bin.
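A sketch of both routes to bin probabilities, assuming the normal distribution with standard deviation 0.1 from the plot above, bins of width 0.05, and a hypothetical sample of 10^5 draws. The CDF differences and the per-bin integrals agree, and both match the relative frequencies a histogram of the sample would show. Dividing by the bin width puts the bars on the same scale as the PDF, so they can be overlaid on the density curve.

```matlab
% Assumed distribution: normal with mean 0 and standard deviation 0.1.
sigma = 0.1;
f = @(x) exp(-x.^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));   % PDF
F = @(x) 0.5 * (1 + erf(x ./ (sigma*sqrt(2))));             % CDF

edges = -0.5:0.05:0.5;                  % bin edges, width 0.05
widths = diff(edges);

% Route 1: subtract successive CDF values.
binProb = diff(F(edges));

% Route 2: integrate the PDF over each bin.
binProbInt = arrayfun(@(lo, hi) integral(f, lo, hi), edges(1:end-1), edges(2:end));

% Relative frequency of a hypothetical sample in the same bins.
rng(1);
data = sigma * randn(1e5, 1);
relFreq = zeros(size(binProb));
for k = 1:numel(binProb)
    relFreq(k) = mean(data >= edges(k) & data < edges(k+1));
end

% Probability per unit width: directly comparable with the PDF curve.
density = binProb ./ widths;
centers = edges(1:end-1) + widths/2;

bar(centers, density, 1), hold on
plot(centers, f(centers), 'r'), hold off
xlabel('x'), ylabel('density')
```

In recent MATLAB releases, histogram(data, 'Normalization', 'pdf') applies the same count / (N × bin width) scaling to sample data directly, which is one way to put a histogram on the same axis as pdf(estim_KDE,x).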
If you used a tiny enough bin interval, then the curve would look very nice and smooth. But the probability of a point falling in any single such tiny bin would be vanishingly small. So the y-axis scaling would be all tiny numbers. This reflects the fact that any single number has probability ZERO of arising.
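To illustrate the tiny-bin argument, a sketch using the same assumed normal distribution (standard deviation 0.1) that shrinks a bin centred on x = 0.1: the bin probability heads toward zero, while the probability divided by the bin width settles at the PDF value f(0.1). That ratio is what the density measures.

```matlab
% Assumed distribution: normal with mean 0 and standard deviation 0.1.
sigma = 0.1;
f = @(x) exp(-x.^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));   % PDF
F = @(x) 0.5 * (1 + erf(x ./ (sigma*sqrt(2))));             % CDF

x0 = 0.1;                             % the point x = 0.1 mentioned earlier
w = [0.1 0.01 0.001 1e-4];            % narrower and narrower bins centred on x0
pBin = F(x0 + w/2) - F(x0 - w/2);     % probability of each bin: shrinks toward 0
ratio = pBin ./ w;                    % probability per unit width: approaches f(x0)

disp([w' pBin' ratio'])               % columns: width, bin probability, ratio
fprintf('f(x0) = %.4f\n', f(x0));
```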
So, just plot the PDF, and don't worry about the y-axis, or turn it off completely.
Rob Keeton
2019-9-3
0 votes
Multiply by the bandwidth of the kernel density estimate:
y = pdf(estim_KDE,x)*estim_KDE.BandWidth;
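One way to read this suggestion, sketched under assumptions: on a grid whose spacing equals the bandwidth h, pdf(x)*h approximates the probability of a bin of width h around each x, so those values lie between 0 and 1 and sum to about 1. The example uses a hand-written Gaussian kernel estimate, a hypothetical sample and an assumed h = 0.02 instead of fitdist, so it runs without the toolbox; with the toolbox, y would come from pdf(estim_KDE,x) and h from estim_KDE.BandWidth. On a grid with a different spacing dx, multiplying by dx instead would give the corresponding bin probabilities.

```matlab
% Hypothetical sample and bandwidth (assumptions, not the poster's values).
rng(2);
data = 0.1 * randn(500, 1);
h = 0.02;

% Gaussian kernel density estimate written out by hand, as a stand-in for
% pdf(estim_KDE, x) with fitdist's default normal kernel.
kde = @(x) mean(exp(-bsxfun(@minus, x(:)', data).^2 ./ (2*h^2)), 1) ./ (h*sqrt(2*pi));

x = -0.6:h:0.6;       % grid whose spacing equals the bandwidth
y = kde(x);           % density values: the peak is well above 1
mass = y * h;         % approx. probability of a width-h bin around each x
total = sum(mass);    % close to 1, like a set of bin probabilities

fprintf('max(y) = %.2f, max(mass) = %.3f, sum(mass) = %.4f\n', max(y), max(mass), total);
```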