Standard deviation takes forever

I have a double-precision numeric 3D matrix M (converted by fread from uint8) of size 30000 x 500 x 500. I would like to get the standard deviation along dimension 2. tic, std(M,0,2); toc has taken more than 12 hours and is still running, while mean(M,2) took only 80 seconds.
Some more details: std(M(:,:,1),0,2) takes 0.3 seconds and std(M(:,:,1:100),0,2) takes 34 seconds, but std(M(:,:,1:500),0,2) says out of memory.
Similarly, mean(M(:,:,1),2) takes 0.1 seconds, but mean(M(:,:,1:500),2) does not work and gives me an 'out of memory' message. Yet mean(M,2) takes about 80 seconds. This is all very confusing! Thanks
dpb 2023-9-12
Your original posting says "I have a double precision numeric 3D matrix M of size 30000 x 500 x 500..."
That's what I calculated above: at 8 bytes/double, that takes up 59 GB of storage.
I don't follow what "an accumulation of (500 x 100 x 5) files each 31 KB in size" means.
I think you're going to have to show us specifically what your array is and how it was constructed.
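The storage estimate can be reproduced with a few lines of MATLAB. This minimal sketch only does the arithmetic for the array size stated in the question (the variable names are illustrative); it gives 6e10 bytes, i.e. roughly 60 GB (about 56 GiB), for double, and shows that the same data held as uint8 needs one eighth of that, in line with the ~8 GB file mentioned in the reply.

```matlab
% Memory footprint of a 30000 x 500 x 500 array, by element type
sz = [30000 500 500];
n = prod(sz);                  % 7.5e9 elements
bytesDouble = n * 8;           % double: 8 bytes per element
bytesUint8  = n * 1;           % uint8:  1 byte per element
fprintf('double: %.1f GB (%.1f GiB)\n', bytesDouble/1e9, bytesDouble/2^30);  % double: 60.0 GB (55.9 GiB)
fprintf('uint8 : %.1f GB\n', bytesUint8/1e9);                                % uint8 : 7.5 GB
```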
gujax 2023-9-12 (edited 2023-9-13)
Ah, got it!
I append a 31 KB chunk of streaming time-series data 100 x 500 x 500 times into one file, instead of writing 5 million separate files.
So that's about 8 GB of data.
But when I read it, I didn't realize that fread converts it to double by default.
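With the default precision 'uint8', fread reads bytes but returns them as double (8 bytes per element). Prefixing the type with an asterisk, '*uint8' (shorthand for 'uint8=>uint8'), keeps the result as uint8. The sketch below writes a small temporary file as a hypothetical stand-in for the real data file, whose name and layout are not given in the thread, and reads it back both ways.

```matlab
% Write a small uint8 test file (hypothetical stand-in for the real data file)
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:255), 'uint8');
fclose(fid);

fid = fopen(fname, 'r');
a = fread(fid, Inf, 'uint8');    % default: read uint8, return double (8 bytes/element)
frewind(fid);
b = fread(fid, Inf, '*uint8');   % '*uint8' keeps the output as uint8 (1 byte/element)
fclose(fid);
delete(fname);

disp(class(a))   % double
disp(class(b))   % uint8
```

How the resulting vector should then be reshaped into 30000 x 500 x 500 depends on the order in which the chunks were appended, which the thread does not specify.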


Accepted Answer

gujax 2023-9-13
Calculating the standard deviation takes more memory than calculating the mean. When computing std on large double-precision data sets, it will likely slow down the computer if memory is limited; that may not be true for computing the mean.
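A plausible reason for the gap: mean(M,2) only has to sum along dimension 2, whereas std has to compute the mean and then the deviations from it, which (assuming the shipped implementation works this way) creates temporary arrays as large as M itself. With M at roughly 60 GB, those temporaries would push the machine into disk swapping. Indexing such as M(:,:,1:500) also makes a copy of the selected pages, which probably explains the out-of-memory errors. A minimal workaround is to loop over the third dimension so that only one 30000 x 500 page (about 120 MB as double) is converted and processed at a time. The small random M below is a hypothetical stand-in for the real data.

```matlab
% Hypothetical small stand-in for M (the real one is 30000 x 500 x 500)
M = uint8(randi([0 255], 300, 50, 20));

S = zeros(size(M,1), 1, size(M,3));   % one std per row, per page
for k = 1:size(M,3)
    % Convert only one page to double, so any temporaries std creates
    % are page-sized rather than the size of the whole array
    S(:,:,k) = std(double(M(:,:,k)), 0, 2);
end
```

Because std along dimension 2 is computed independently for each row and page, the looped result matches std(double(M),0,2) while keeping peak memory close to one page.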

More Answers (1)

Steven Lord 2023-9-12
Can you confirm you're using the std function included in MATLAB? What does this command show?
which -all std
/MATLAB/toolbox/matlab/datafun/std.m
/MATLAB/toolbox/matlab/datatypes/tabular/@tabular/std.m % tabular method
/MATLAB/toolbox/matlab/datatypes/datetime/@datetime/std.m % datetime method
/MATLAB/toolbox/matlab/datatypes/duration/@duration/std.m % duration method
/MATLAB/toolbox/matlab/timeseries/@timeseries/std.m % timeseries method
/MATLAB/toolbox/matlab/bigdata/@tall/std.m % tall method
/MATLAB/toolbox/parallel/parallel/@distributed/std.m % distributed method
gujax 2023-9-13 (edited 2023-9-13)
I think I can consider this issue resolved: calculating the standard deviation takes more memory than calculating the mean. When computing std on large double-precision data sets, it will likely slow down the computer if memory is limited; that may not be true for computing the mean.
dpb 2023-9-13
The issue you're having must be disk swapping owing to limited physical memory... I'm still not sure just how big your array is. How about
whos M
to tell us precisely what you're processing, and
memory
for the memory available on your machine?
It depends on how TMW (The MathWorks) builds the executable and which processor instructions they assume. Unfortunately, they likely code to a "lowest common denominator" of the hardware out there, because they know not all customers will have the latest CPUs with enhanced vector-processing instructions that use the built-in vector pipelines of current processors.
I've never tried it myself, but if you have a high-memory graphics card, you could possibly try the GPU functionality...
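The two commands requested above can also be used programmatically. whos returns a struct whose bytes field gives an array's size in memory, and on Windows memory reports how much MATLAB can still allocate (memory is not available on other platforms, hence the ispc check). A small zeros array is a hypothetical stand-in for the real M in this sketch.

```matlab
M = zeros(100, 50, 20);          % hypothetical small stand-in for the real M
info = whos('M');                % struct with fields name, size, bytes, class, ...
fprintf('%s: %s, %s, %.3f GB\n', info.name, mat2str(info.size), info.class, info.bytes/1e9);

if ispc
    % memory is only available on Windows
    [userview, systemview] = memory;
    fprintf('Largest array MATLAB can create: %.1f GB\n', userview.MaxPossibleArrayBytes/1e9);
end
```

Comparing info.bytes against MaxPossibleArrayBytes shows directly whether an extra full-size temporary of M can fit in memory.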


Release

R2022b
