How to exclude data when fitting an exponential distribution

I am trying to fit an exponential distribution to a set of lifetimes. I want to exclude all lifetimes <= 1 because those are unreliable data points. However, the fitter then takes into account that there are zero data points in that region, rather than ignoring the region. This becomes clear when I simulate an almost perfect exponential distribution:
rng('default')
x = round(exprnd(4,1e6,1)); % exponential distribution with mean 4
pd = fitdist(x,'exponential');
disp(pd.mu) % the fitted mean
x1 = x(x>1); % remove all values <= 1
pd1 = fitdist(x1,'exponential');
disp(pd1.mu) % the new fitted mean
Output:
3.9834
5.5111
The two fits are clearly different even though they are fitting the same data. How can I make the fitter ignore that range of values?
  3 comments
J. Alex Lee 2020-10-22
I don't think this is surprising at all. You aren't fitting a distribution to a histogram while ignoring some of the probability densities; you are fitting to the actual data. So if you alter the data, of course you will alter the fits. It's like being surprised at the difference between
x = randn(1000,1);
mean(x)
x1 = x(x>1);
mean(x1)
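For the exponential distribution the analogy is literal. The maximum-likelihood estimate of the mean is the sample mean, so fitting x(x>1) gives the same result as averaging it. The sketch below assumes base MATLAB only. It draws exponential samples by inverse-transform sampling instead of exprnd, and the seed 0 is an arbitrary choice. For continuous data the memoryless property predicts a mean of about 1 + 4 = 5 for the kept values.

```matlab
rng(0)                                 % arbitrary seed for reproducibility
mu = 4;                                % true mean of the exponential
x  = -mu * log(rand(1e6, 1));          % inverse-transform sampling: Exp(mean mu)
x1 = x(x > 1);                         % keep only values above 1

% The exponential MLE of the mean is simply the sample mean
mu_all  = mean(x);                     % close to 4
mu_kept = mean(x1);                    % close to 1 + 4 = 5, not 4
fprintf('all data: %.4f   x > 1 only: %.4f\n', mu_all, mu_kept);
```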
Sjoerd Nooteboom 2020-10-22
@J. Alex Lee thanks, I understand. I suppose a possible solution would be to create the histogram data first and fit an exponential function to that in a separate line. I was just wondering if there was a simpler way...
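Here is a rough sketch of the histogram idea. It assumes the lifetimes are non-negative integers, as with the rounded data in the question. For integer bins above the cutoff, the expected counts are proportional to exp(-k/mu). A straight-line fit to the log counts therefore has a slope of about -1/mu. The bin range 2:20 is an arbitrary choice: it skips the excluded values and avoids sparse tail bins. This is a simple regression estimator, not a replacement for maximum likelihood.

```matlab
rng(0)                                   % arbitrary seed
mu = 4;                                  % true mean
x  = round(-mu * log(rand(1e6, 1)));     % rounded exponential lifetimes, as in the question

k = (2:20)';                             % integer bins kept (all values <= 1 excluded)
counts = arrayfun(@(v) sum(x == v), k);  % histogram count for each kept bin

% Expected counts are proportional to exp(-k/mu), so log(counts) is linear in k
p = polyfit(k, log(counts), 1);          % p(1) is the fitted slope
mu_hat = -1 / p(1);                      % should be close to 4
fprintf('histogram-fit mean: %.4f\n', mu_hat);
```

The bins are weighted equally in this fit, even though the sparse tail bins are noisier. With fewer data points, a weighted fit or a maximum-likelihood approach would be more robust.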


Accepted Answer

Jeff Miller 2020-10-22
You can make use of the memoryless property of the exponential distribution here. The mean remaining time does not depend on how much time has already passed, so just reset the clock to 0 after excluding times less than 1. Rounding messes that up, though; I'm not sure why.
rng('default')
% x = round(exprnd(4,1e6,1)); % exponential distribution with mean 4
x = exprnd(4,1e6,1); % exponential distribution with mean 4
pd = fitdist(x,'exponential');
disp(pd.mu) % the fitted mean
x1 = x(x>1); % remove all values <= 1
x1 = x1 - 1; % ADJUST THE SCORES TO "RESTART THE CLOCK" AT TIME 1
pd1 = fitdist(x1,'exponential');
disp(pd1.mu) % the new fitted mean
% output:
% 3.994
% 3.9924
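The reasoning behind the clock reset can be written out as a left-truncated maximum-likelihood fit. The short derivation below assumes continuous (unrounded) lifetimes X with density f(x) = (1/μ)e^{-x/μ}, a cutoff t_0 (here t_0 = 1), and kept values x_1, ..., x_n, all greater than t_0.

```latex
P(X - t_0 > s \mid X > t_0) = \frac{e^{-(t_0+s)/\mu}}{e^{-t_0/\mu}} = e^{-s/\mu}

\ell(\mu) = \sum_{i=1}^{n} \log\frac{f(x_i)}{P(X > t_0)}
          = \sum_{i=1}^{n}\left[-\log\mu - \frac{x_i - t_0}{\mu}\right]

\frac{d\ell}{d\mu} = -\frac{n}{\mu} + \frac{1}{\mu^2}\sum_{i=1}^{n}(x_i - t_0) = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}(x_i - t_0) = \bar{x} - t_0
```

With t_0 = 1, this is exactly what fitting x1 - 1 computes, because the exponential maximum-likelihood estimate of the mean is the sample mean. The same fact explains why that estimate is always exactly 1 less than the fit to x1 without the reset, whether or not the data are rounded.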
  1 comment
Sjoerd Nooteboom 2020-10-23
That's a nice solution, thanks! I think the rounding issue occurs because there are more values between 0.5-1.5 than between 0.0-0.5. Interestingly, the resulting mean from this method is always exactly 1 less than if one removes the 'clock resetting' line, no matter whether you round the data or not, and also for my experimental data. Not sure why this happens.
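This sketch shows why rounding interferes, assuming the data are generated and rounded as in the question. After round, the test x > 1 keeps only the integers 2, 3, ..., and those come from unrounded values of at least 1.5. The clock therefore effectively restarts at 1.5, not at 1. An exact option for rounded data (a technique not used in the thread) is to treat x1 - 2 as geometric. Each whole-unit step then survives with probability q = exp(-1/mu), so the maximum-likelihood estimate is q = m/(1+m), where m is the mean of x1 - 2.

```matlab
rng(0)                                   % arbitrary seed
mu = 4;                                  % true mean
x  = round(-mu * log(rand(1e6, 1)));     % rounded exponential data, as in the question
x1 = x(x > 1);                           % keeps integers >= 2, i.e. unrounded values >= 1.5

naive   = mean(x1 - 1);                  % clock reset at 1: biased upward (about 4.5)
shifted = mean(x1 - 1.5);                % clock reset at 1.5: close to 4

% Exact discrete fit: x1 - 2 is geometric with q = exp(-1/mu)
m      = mean(x1 - 2);                   % sample mean of the geometric counts
q_hat  = m / (1 + m);                    % MLE of the per-step survival probability
mu_geo = -1 / log(q_hat);                % back-transform to the exponential mean

fprintf('reset at 1: %.4f  reset at 1.5: %.4f  geometric: %.4f\n', naive, shifted, mu_geo);
```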

