Why is cumsum producing CDF>1 values from a user defined distribution?

Question

dandan 2018-1-18

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/377857-why-is-cumsum-producing-cdf-1-values-from-a-user-defined-distribution

编辑： dandan 2018-1-19

I am trying to calculate my own cumulative distribution using cumsum (because I want to be able to use it on a truncated distribution and eventually normalize by N+1 instead of N in later use with empirical data), but am getting values greater than 1. My PDF is an Generalized Pareto distribution with empirically fit k, sigma and theta parameters. If I then generate a regularly spaced PDF from these parameters and calculate the CDF using cumsum, it exceeds 1 around x=21.095. By comparison, a CDF computed with Matlab's built-in gpcdf function does not exceed 1.

dx=0.001;
x = 0.01:dx:50;
%Generalized Pareto distribution parameters
k=0.435028746060392;
sigma=0.484904212893508;
theta=0.009999999999939;
%Make PDF and CDFs
paretoPDF=gppdf(x,k,sigma,theta); %generate generalized Pareto PDF
CDF1=gpcdf(x,k,sigma,theta); %CDF1: built-in Matlab function from generalized Pareto parameters
CDF2=cumsum(paretoPDF).*dx; %CDF2: manually calculate CDF by summing PDF area
find(CDF1>1,1,'first')
find(CDF2>1,1,'first')

If I use the cumsum function as intended on a truncated distribution, it has a similar problem compared to using the built-in cdf function on a truncated probability distribution object:

% Anonymous functions to make truncated distribution
pdfgp=@(x,k,sigma,theta)gppdf(x,k,sigma,theta);
cdfgp=@(x,k,sigma,theta)gpcdf(x,k,sigma,theta);
pdtrunc=@(pdffunc,cdffunc,x,xtrunc,varargin)pdffunc(x,varargin{:})./cdffunc(xtrunc,varargin{:});
xmax=42; %truncation point
%Make CDF from truncated probability distribution object (built-in Matlab functions)
pdt=truncate(makedist('GeneralizedPareto','k',k,'sigma',sigma,'theta',theta),0,xmax);
paretoPDFt1=pdf(pdt,x);
Ft1=cdf(paretoPDFt1,x);   
%Manually calculate CDF by summing truncated PDF area
paretoPDFt2=pdtrunc(pdfgp,cdfgp,x,xmax,k,sigma,theta);
Ft2=cumsum(paretoPDFt2).*dx);

Again, Ft2 goes >1, while Ft1 does not.

Am I making some stupid mistake here, or is there some less obvious error I need to address? What's going on? Thanks!

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Walter Roberson 2018-1-19

trapz() would be more accurate than cumsum()

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Jeff Miller 2018-1-19

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/377857-why-is-cumsum-producing-cdf-1-values-from-a-user-defined-distribution#answer_300792

在 MATLAB Online 中打开

I think the problem is that the pdf is decreasing. Note that cumsum(paretoPDF).*dx approximates each segment of the PDF with the width of the segment times the height of the PDF at the lower end of the segment, which is greater than the average height of the segment.

Note that the problem gets much worse if you increase dx (say .1) and the problem gets much smaller if you decrease dx (say .00001).

You can get a much better approximation with your current dx if you evaluate the PDF at the midpoints of the intervals with something like this:

paretoPDF=gppdf(x-dx/2,k,sigma,theta); %generate generalized Pareto PDF

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Why is cumsum producing CDF>1 values from a user defined distribution?

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

回答（1 个）

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

Community Treasure Hunt

Why is cumsum producing CDF>1 values from a user defined distribution?

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

回答（1 个）

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

另请参阅

类别

标签

Community Treasure Hunt

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论