Do the boxplot stats without boxplot
Dear all, I have a very large amount of data that I want to show as a box plot.
Specifically, I have 7 sets of 125.000.000 numbers (luckily I am running this on a system with a huge amount of RAM). Because boxplot with 7 inputs takes so long, I was wondering whether there is a way to calculate, for every set, all the statistics that boxplot calculates, and then feed them into a boxplot-type function that only does the plotting.
Is there something like that in MATLAB?
Thank you for your help.
Best regards, Alex
Answers (4)
Oleg Komarov
2011-8-13
Time boxplot:
tic
boxplot(...)
toc
against:
tic
prctile(...)
max(...)
min(...)
toc
and also against:
tic
sort()
toc
for one of your datasets. If one of the alternatives is significantly faster, then yes, you could speed up the box plotting, but I doubt it.
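The timing comparison above can be sketched as a runnable script. This is only an illustration: the vector `x` and its size `n` are hypothetical stand-ins for one of the real datasets, and `boxplot` and `prctile` are assumed to be available from the Statistics Toolbox.

```matlab
% Hypothetical stand-in for one dataset; increase n for a realistic test.
n = 1e6;
x = rand(n,1);

% 1. Full boxplot call, drawn into an invisible figure
f = figure('Visible','off');
tic
boxplot(x);
tBox = toc;
close(f);

% 2. Only the summary statistics
tic
q  = prctile(x,[25 50 75]);
hi = max(x);
lo = min(x);
tStats = toc;

% 3. A plain sort
tic
xs = sort(x);
tSort = toc;

fprintf('boxplot: %.3f s, prctile/min/max: %.3f s, sort: %.3f s\n', ...
        tBox, tStats, tSort);
```

The three timings depend heavily on the machine and on available memory, so they are only meaningful relative to each other on the same system.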
EDIT
To avoid calling boxplot on the big dataset, I create a fake boxplot and adjust it with the statistics calculated from the real data:
% Suppose truedata is your dataset A
truedata = rand(1e6 + 123423,7);
sz = size(truedata);
% Create fakedata and boxplot it
fakedata = rand(10,sz(2));
h = boxplot(fakedata,'labels',10:10:10*sz(2));
% Now sort your truedata and calculate min,max,25,50,75 percentile
truedata = sort(truedata);
s.mins = truedata(1,:);
s.maxs = truedata(end,:);
xi = bsxfun(@plus,sz(1).*[0.25; 0.5; 0.75], sz(1) * (0:sz(2)-1));
% unique() keeps the sample points distinct when an index happens to be an integer
x = unique([floor(xi(:)); ceil(xi(:))]);
s.ptiles = reshape(interp1(x,truedata(x),xi(:)),3,sz(2));
% Readapt the fake boxplot:
% 1. Adjust upper whisker
set(h(1,:),{'Ydata'}, num2cell([s.ptiles(3,:); s.maxs].',2));
set(h(3,:),{'Ydata'}, num2cell(repmat(s.maxs.',1,2),2));
% 2. Adjust lower whisker
set(h(2,:),{'Ydata'}, num2cell([s.mins; s.ptiles(1,:);].',2));
set(h(4,:),{'Ydata'}, num2cell(repmat(s.mins.',1,2),2));
% 3. Adjust body and median
set(h(5,:),{'Ydata'}, num2cell(s.ptiles([1 3 3 1 1],:).',2));
set(h(6,:),{'Ydata'}, num2cell(repmat(s.ptiles(2,:).',1,2),2));
% 4. Delete outlier marking
delete(h(end,:))
7 comments
Alex
2011-8-13
Alex
2011-8-15
Oleg Komarov
2011-8-15
I don't observe the same behaviour. What code did you call?
Alex
2011-8-16
Oleg Komarov
2011-8-16
It's possible that boxplot's internal sort "recognizes" that the array is already sorted and does not call the other internal routines, or it may be that the subfunctions don't end up modifying A, so no copy is created and memory consumption stays at a "reasonable" level.
Anyway, the fastest method should be the one I indicated in my edit. Note that you can create the fake boxplot with all the features you want and adjust it later while keeping all of them. Right now I am eliminating the red outliers, but it's very simple to keep them and update them too.
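As a rough sketch of what keeping the outliers could involve, the whisker limits and outliers can be computed from an already sorted column. Everything here is illustrative: `col` is a hypothetical column, the quartiles use simple linear interpolation between order statistics (not exactly the convention used by prctile), and the factor 1.5 is boxplot's default whisker length.

```matlab
% Hypothetical sorted column with a few extreme values added
col = sort([randn(1e5,1); -8; 9; 10]);
n   = numel(col);

% Quartiles by linear interpolation between neighbouring order statistics
pos = 1 + (n-1)*[0.25 0.5 0.75];
q   = interp1(1:n, col, pos);

% Whisker limits at 1.5 times the interquartile range
iqrange = q(3) - q(1);
loFence = q(1) - 1.5*iqrange;
hiFence = q(3) + 1.5*iqrange;

% Because col is sorted, the outliers sit at the two ends
isOut     = col < loFence | col > hiFence;
outliers  = col(isOut);
inside    = col(~isOut);
whiskerLo = inside(1);     % lower adjacent value
whiskerHi = inside(end);   % upper adjacent value
```

With these values, the whiskers of the fake boxplot would end at `whiskerLo` and `whiskerHi` instead of the minimum and maximum. Assuming the outlier handle for a given box is an ordinary line object, its `XData` and `YData` could then be set from `outliers` rather than deleting it.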
Alex
2011-8-16
Oleg Komarov
2011-8-16
Use the profiler to check the differences.
Have you tried my other solution?
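A minimal sketch of such a profiler run, assuming a hypothetical stand-in matrix `A` that is much smaller than the real data:

```matlab
% Hypothetical stand-in; the real data would be far larger.
A = rand(1e5,7);

f = figure('Visible','off');
profile on
boxplot(A);
profile off
close(f);

% Collect the results and list the functions with the largest total time
p = profile('info');
[~,idx] = sort([p.FunctionTable.TotalTime],'descend');
for k = idx(1:min(10,numel(idx)))
    fprintf('%-50s %8.3f s\n', ...
            p.FunctionTable(k).FunctionName, p.FunctionTable(k).TotalTime);
end
```

Calling `profile viewer` instead of reading `profile('info')` opens the same results in the graphical report, which makes it easier to see which internal calls dominate.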
Alex
2011-8-13
0 votes
1 comment
Oleg Komarov
2011-8-13
You said 1*125.000.000 sets, what do you mean?
Sorting 1e8 elements takes 11 seconds on my laptop; after that, most of the calculations are nearly instantaneous. I really don't see how boxplot is taking so much time.
Please post the code you're using.
Alex
2011-8-14
3 comments
Oleg Komarov
2011-8-14
Please don't create additional answers but use the comment facility (I know it doesn't allow code formatting but it keeps the logical track of question-answer).
Have you tried:
A = sort(A);
If you can sort it in a reasonable amount of time, then we could find a workaround. You can also try sorting column by column (to reduce the memory requirements):
for n = 1:size(A,2)
A(:,n) = sort(A(:,n));
end
It takes so much time because boxplot uses uniquerows and unique. What happens is that it fills up all of your memory (300 GB, who do you work for?) and starts swapping to the hard disk.
Alex
2011-8-15
Oleg Komarov
2011-8-15
Try the EDIT in my answer.
Alex
2011-8-14
0 votes
1 comment
Oleg Komarov
2011-8-14
Please see the edit to my answer and stop creating additional answers; use comments instead.