Function ecdf breaks down for large datasets

Hi,
I have a very large vector x (around 130 million elements). When I try to find the empirical cumulative distribution function of the values from that vector using MATLAB's command "ecdf(x)" the function breaks down. Its plot shows the ECDF for only the smaller values of x and doesn't even exist for bigger values of x. When I try to run the ecdf command on only a part of the vector (say 10 million elements), the results seem OK. Does anyone know what could be wrong with the ecdf function so that it breaks down in this manner for very large datasets?
Thank you very much for your help.
Martin

Answers (1)

Mathieu Boutin 2011-9-8
Hi Martin. You could try my new homemade function and see if it works fine:
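%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [v_f,v_x] = homemade_ecdf(v_data)
% Empirical CDF of v_data: v_f(k) is the fraction of samples <= v_x(k).
v_data = v_data(~isnan(v_data));   % drop NaNs; unique would list each one separately
v_data = v_data(:).';              % force a row vector so the concatenations below work
nb_data = numel(v_data);
v_sorted_data = sort(v_data);
v_unique_data = unique(v_sorted_data);   % distinct values, ascending
nb_unique_data = numel(v_unique_data);
v_data_ecdf = zeros(1,nb_unique_data);
for index = 1:nb_unique_data
    current_data = v_unique_data(index);
    % fraction of samples less than or equal to the current distinct value
    v_data_ecdf(index) = sum(v_sorted_data <= current_data)/nb_data;
end
% repeat the first value with F = 0 so the curve starts at zero
v_x = [v_unique_data(1) v_unique_data];
v_f = [0 v_data_ecdf];
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%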
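The loop in homemade_ecdf compares the whole sorted vector against every distinct value, so its cost grows with numel(v_data) times the number of distinct values, which may be too slow at 130 million elements. Below is a minimal sketch of a variant that does the same computation with one sort and no loop. The name fast_ecdf is hypothetical, and the sketch assumes the data fits in memory and contains at least one non-NaN value. It has not been tested on a vector of the size described in the question.

```matlab
function [v_f, v_x] = fast_ecdf(v_data)
% Sketch of a loop-free empirical CDF (hypothetical helper, not part of MATLAB).
% Returns column vectors: v_f(k) is the fraction of samples <= v_x(k).
v_data = v_data(:);                     % work on a column vector
v_data = v_data(~isnan(v_data));        % ignore NaNs
nb_data = numel(v_data);
v_sorted_data = sort(v_data);
% Mark the last occurrence of each distinct value in the sorted data.
v_is_last = [diff(v_sorted_data) > 0; true];
v_unique_data = v_sorted_data(v_is_last);
% The position of that last occurrence is the number of samples <= the value.
v_data_ecdf = find(v_is_last) / nb_data;
% Repeat the first value with F = 0 so the curve starts at zero.
v_x = [v_unique_data(1); v_unique_data];
v_f = [0; v_data_ecdf];
end
```

For example, [f, xv] = fast_ecdf([3 1 2 2 5]) gives xv = [1; 1; 2; 3; 5] and f = [0; 0.2; 0.6; 0.8; 1], which can then be drawn with stairs(xv, f). The counts come from find, which returns double-precision indices, so the fractions stay in double precision even if the data is stored as single.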
