Why is cumsum producing CDF>1 values from a user defined distribution?
4 次查看(过去 30 天)
显示 更早的评论
I am trying to calculate my own cumulative distribution using cumsum (because I want to be able to use it on a truncated distribution and eventually normalize by N+1 instead of N in later use with empirical data), but am getting values greater than 1. My PDF is an Generalized Pareto distribution with empirically fit k, sigma and theta parameters. If I then generate a regularly spaced PDF from these parameters and calculate the CDF using cumsum, it exceeds 1 around x=21.095. By comparison, a CDF computed with Matlab's built-in gpcdf function does not exceed 1.
dx=0.001;
x = 0.01:dx:50;
%Generalized Pareto distribution parameters
k=0.435028746060392;
sigma=0.484904212893508;
theta=0.009999999999939;
%Make PDF and CDFs
paretoPDF=gppdf(x,k,sigma,theta); %generate generalized Pareto PDF
CDF1=gpcdf(x,k,sigma,theta); %CDF1: built-in Matlab function from generalized Pareto parameters
CDF2=cumsum(paretoPDF).*dx; %CDF2: manually calculate CDF by summing PDF area
find(CDF1>1,1,'first')
find(CDF2>1,1,'first')
If I use the cumsum function as intended on a truncated distribution, it has a similar problem compared to using the built-in cdf function on a truncated probability distribution object:
% Anonymous functions to make truncated distribution
pdfgp=@(x,k,sigma,theta)gppdf(x,k,sigma,theta);
cdfgp=@(x,k,sigma,theta)gpcdf(x,k,sigma,theta);
pdtrunc=@(pdffunc,cdffunc,x,xtrunc,varargin)pdffunc(x,varargin{:})./cdffunc(xtrunc,varargin{:});
xmax=42; %truncation point
%Make CDF from truncated probability distribution object (built-in Matlab functions)
pdt=truncate(makedist('GeneralizedPareto','k',k,'sigma',sigma,'theta',theta),0,xmax);
paretoPDFt1=pdf(pdt,x);
Ft1=cdf(paretoPDFt1,x);
%Manually calculate CDF by summing truncated PDF area
paretoPDFt2=pdtrunc(pdfgp,cdfgp,x,xmax,k,sigma,theta);
Ft2=cumsum(paretoPDFt2).*dx);
Again, Ft2 goes >1, while Ft1 does not.
Am I making some stupid mistake here, or is there some less obvious error I need to address? What's going on? Thanks!
回答(1 个)
Jeff Miller
2018-1-19
I think the problem is that the pdf is decreasing. Note that cumsum(paretoPDF).*dx approximates each segment of the PDF with the width of the segment times the height of the PDF at the lower end of the segment, which is greater than the average height of the segment.
Note that the problem gets much worse if you increase dx (say .1) and the problem gets much smaller if you decrease dx (say .00001).
You can get a much better approximation with your current dx if you evaluate the PDF at the midpoints of the intervals with something like this:
paretoPDF=gppdf(x-dx/2,k,sigma,theta); %generate generalized Pareto PDF
0 个评论
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!