Confusion about performance improvement by memory preallocation

Question

Felix Schönig 2020-12-10

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/689974-confusion-about-performance-improvement-by-memory-preallocation

Edited: Bruno Luong 2020-12-11

I have a complex task: extracting the relevant measurement data from a stack of mat files, each of which contains 237 variables. There are 30 mat files, of which files 9 to 18 are relevant (the files cannot be deleted; let's just accept this here). From each relevant file, only a few data points per variable are to be extracted; their indices are given by a separate file containing the measurement times.

The meas_times.mat looks like this, where each column corresponds to a file (9:18) and the rows contain start and stop times in alternating order.

     0     0     0     0     0     0     0   315     0
     0     0     0     0     0     0     0   352     0
     0     0     0    56     0    26     0   373     0
     0     0     0   115     0    45     0   394     0
     0     0     0   141   121    65   515   476     1
     0     0     0   200   180   104   574   511     8
     0     0     0   471   201   132   609   837    44
     0     0     0   530   260   191   611   860    95
     0     0     0   610   721   443   664   881   109
     0     0     0   669   780   502   720   910   154
   461     0     0   671   796   521   737   961   174
   520     0     0   730   821   580   796   990   187
   591     0   171  1216  1072   711  1126  1071   580
   650     0   230  1275  1105   770  1185  1130   610
   981    21   276  1311  1197   773  1241  1171   667
  1040    80   335  1336  1256   832  1300  1200   695
  1061   121   721  1629  1376  1390  1324  1311   783
  1120   142   780  1662  1435  1449  1383  1340   842
  1721   151   816  1677  1451  1482  1411  1721  1151
  1780   188   875  1736  1510  1541  1470  1780  1210

Example:

From file 9 (1st column), I only want the data at indices 508-567 and 601-660 of each variable; from file 10, indices 21-80, and so on.
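To make the selection rule concrete, here is a minimal sketch for a single column of the table. It assumes the column holds alternating start/stop indices padded with zeros, as described above. The names `col`, `starts`, `stops`, `ranges`, `idx`, `x` and `selected`, as well as the dummy signal, are illustrative only and not part of the original script.

```matlab
% Second column of the table above (file 10), zero-padded at the top
col = [zeros(14,1); 21; 80; 121; 142; 151; 188];

v      = nonzeros(col);      % keep only the real start/stop entries
starts = v(1:2:end);         % odd rows: start indices
stops  = v(2:2:end);         % even rows: stop indices

% Build one index vector that covers every start:stop range
ranges = arrayfun(@(a, b) (a:b).', starts, stops, 'UniformOutput', false);
idx    = vertcat(ranges{:});

% Dummy 1-Hz signal standing in for one variable of a mat file (assumption)
x        = (1:200).' * 0.1;
selected = x(idx);           % all wanted samples in a single indexing step
```

Because `idx` depends only on the measurement times, it could be computed once per file and reused for every variable of that file.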

My first solution with consecutive for-loops was:

tic
%% initialize
clear 
clc
load('meas_times.mat')
times = B;
%% define path
path = 'C:\Users\felix\Documents\HeatFlow\Messkampagne Itterbeck\Messdaten\'; 
list = dir(path); 
%% create empty struct t to collect relevant data in
t = struct(load(sprintf('%s%s',path,list(9).name)));
fn = fieldnames(t);
for a = 1:numel(fn)
    t.(fn{a})=[];
end
%% filter relevant data
for b = 9:18
    % process files 9 to 18; b is their position in the folder listing
    Dateiname = sprintf('%s%s',path,list(b).name);
    s = struct(load(Dateiname)); % this is the b-th file as struct
    fn = fieldnames(s);
    if fn{1} == "Abtastrate_1_HzZeit_1_Hz_" % removes a duplicate variable that some files have
        s = rmfield(s,'Abtastrate_1_HzZeit_1_Hz_');
        fn = fn(2:end);
    end
    v = nonzeros(times(:,b-8)); % extract the measurement times that correspond to the b-th file
    for z = 1:115 % 1-Hz variables from row 1 to row 115
        temp = []; % empty dummy array
        for i = 1:2:length(v)
            temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
        end
        t.(fn{z}) = vertcat(t.(fn{z}),temp); % write extracted values into the new struct to the z-th variable
    end
    
    for z = 116:154 % 2-Hz variables up to row 154
        
        % preprocessing: mean of 2Hz
        for i = 1:2:numel(s.(fn{z}))-1
            s.(fn{z})(i:i+1) = mean(s.(fn{z})(i:i+1));
        end
        s.(fn{z})(2:2:end) = [];
      
        temp = []; % empty dummy array
        for i = 1:2:length(v)
            temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
        end
        t.(fn{z}) = vertcat(t.(fn{z}),temp); % write extracted values into the new struct to the z-th variable
        
    end
end
toc

However, this ran very slowly because the last variables in each file contain a few million values (damn 1 kHz loggers). MATLAB gave me the hint that the temp array was changing size with every loop iteration and that this would hurt performance. So I completely rewrote the script, introduced a function and preallocated memory. However, the whole operation now takes three times as long. Here is the "improved" script. Have I made any obvious mistake?

tic
%% initialize
clear 
clc
load('meas_times.mat')
times = B;
%% define path
path = 'C:\Users\felix\Documents\HeatFlow\Messkampagne Itterbeck\Messdaten\'; 
list = dir(path); 
%% create empty struct t to collect relevant data in
t = struct(load(sprintf('%s%s',path,list(9).name)));
fn = fieldnames(t);
for a = 1:numel(fn)
    t.(fn{a})=[];
end
%% filter relevant data
for b = 9:18
    % process files 9 to 18; b is their position in the folder listing
    filename = sprintf('%s%s',path,list(b).name);
    s = struct(load(filename)); % this is the b-th file as struct
    fn = fieldnames(s);
    if fn{1} == "Abtastrate_1_HzZeit_1_Hz_" % removes a duplicate variable that some files have
        s = rmfield(s,'Abtastrate_1_HzZeit_1_Hz_');
        fn = fn(2:end);
    end
    v = nonzeros(times(:,b-8)); % extract the measurement times that correspond to the b-th file
    w = [[0,0]';diff(v)+1]; % important for indexing later on
    for z = 1:154 % 154 variables in the struct s
        switch z
            case num2cell(1:115)
                hz = 1;
                t.(fn{z}) = vertcat(t.(fn{z}),selectdata(hz,z,v,w,s,fn));
            case num2cell(116:154)
                hz = 2;
                t.(fn{z}) = vertcat(t.(fn{z}),selectdata(hz,z,v,w,s,fn));
        end
    end
end
toc
function temp = selectdata(hz,z,v,w,s,fn)
    % preprocessing
    for i = 1:numel(s.(fn{z}))/hz
        s.(fn{z})(i:i+(hz-1)) = mean(s.(fn{z})(i:i+(hz-1)));
        s.(fn{z})(i+1:i+(hz-1)) = [];
    end
    % empty dummy array
    temp = zeros(sum(w(1:2:end)),1);
    
    % extract data into temp
    for i = 1:2:length(v)
        temp(sum(w(i:-2:1))+1:sum(w(i+2:-2:1)),1) = s.(fn{z})(v(i):v(i+1));
    end
    
end

Feel free to ask questions. I do not want help with my project; I want to understand why the code got SLOWER when I followed MATLAB's hint and changed the code so that it preallocates memory.

The 1st code took 9.7 seconds, the 2nd code approximately 31 seconds.

9 comments (the 7 older comments are not shown)

Felix Schönig 2020-12-11

Edited: Felix Schönig 2020-12-11


Your input and Jan's are what I needed. I have understood your use of arrayfun and the anonymous function handle (didn't know that before), it's beautiful! :-) I also reworked the outer loop to use the fields feature of structs. Is this what you had in mind?

path = 'somepath'; 
list = dir(path); 
for b = 9:18 % bad hard code indexing, I know :>
    filename = sprintf('%s%s',path,list(b).name);
    tempS = load(filename);
    fn = fieldnames(tempS); % 
    if fn{1} == "Abtastrate_1_HzZeit_1_Hz_" % removes a duplex variable that some files have
        tempS = rmfield(tempS,'Abtastrate_1_HzZeit_1_Hz_');
        fn = fn(2:end);
    end
    u(b-8) = tempS;
end

This gives me a 1x10 struct which I can now work on using indexing as you recommended. For your information:

This field concatenation alone still takes 4.97 seconds; it's really a big bunch of data. I will try to improve speed with the indexing method in a function and report later.

Edit: I would mark your help with "Accept this answer", but it is only a comment. Because of this I will accept Jan's answer. Still, a big thank you to you! You really helped me.

Bruno Luong 2020-12-11

Edited: Bruno Luong 2020-12-11


Yes, that's what I had in mind. The allocation of u could probably be improved, but it's not a big deal and is a detail you can work on later.
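One way the allocation of u could be handled, sketched under the assumption that every file yields the same field names: create the full 1x10 struct array once from an empty template and then overwrite its elements. The field names 'a' and 'b', the names `nFiles` and `template`, and the dummy data are placeholders, not taken from the real mat files.

```matlab
nFiles   = 10;                              % files 9:18 in the thread
template = struct('a', [], 'b', []);        % placeholder field names (assumption)
u        = repmat(template, 1, nFiles);     % 1x10 struct array allocated once

for k = 1:nFiles
    tempS = struct('a', (1:k).', 'b', k);   % stands in for load(filename)
    u(k)  = tempS;                          % overwrite an existing element, u does not grow
end
```

Only the small struct array itself is preallocated this way; the large data arrays inside the fields are created by load either way, so the gain is probably modest.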

What you could then do outside this loop to merge u into a single struct goes something like this (look up MATLAB's comma-separated list syntax to understand the code):

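% test data, replace with your u array of structures
u(1) = struct('a', [0;1], 'b', [2;3]);
u(2) = struct('a', [4],   'b', [5;6;7]);
f = fieldnames(u);
% concatenate each field across all elements of u via a comma-separated list
data = cellfun(@(f) vertcat(u.(f)), f, 'UniformOutput', false);
sarg = [f,data].';     % 2-by-N cell: field names in row 1, data in row 2
sall = struct(sarg{:}) % single struct holding the merged data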
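As a rough illustration of the comma-separated list syntax: `u.a` on a struct array and `c{:}` on a cell array both expand to a list of separate arguments. The sketch below rebuilds the same two-element test data; the names `viaList`, `viaExplicit`, `c` and `viaCell` are made up for the demonstration.

```matlab
% Same test data as above, created directly as a 1x2 struct array
u = struct('a', {[0;1], 4}, 'b', {[2;3], [5;6;7]});

viaList     = vertcat(u.a);             % u.a expands to: u(1).a, u(2).a
viaExplicit = vertcat(u(1).a, u(2).a);  % the same call written out by hand

c       = {[0;1], 4};
viaCell = vertcat(c{:});                % c{:} expands to: c{1}, c{2}
```

All three results are the same 3x1 column, which is why a single `vertcat(u.(f))` can replace a loop over the elements of u.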


Answer 1

Jan 2020-12-11

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/689974-confusion-about-performance-improvement-by-memory-preallocation#answer_572135

Try to replace this part of the first solution:

    for z = 1:115 % 1-Hz variables from row 1 to row 115
        temp = []; % empty dummy array
        for i = 1:2:length(v)
            temp = vertcat(temp,s.(fn{z})(v(i):v(i+1)));
        end
        t.(fn{z}) = vertcat(t.(fn{z}),temp); % write extracted values into the new struct to the z-th variable
    end    

by:

    for z = 1:115 % 1-Hz variables from row 1 to row 115
        lenv = numel(v);
        tmpC = cell(lenv / 2, 1); % preallocated cell vector, one cell per start/stop pair
        sfnz = s.(fn{z});       % Cheap shared data copy instead of repeated indexing
        for k = 1:lenv / 2
            idx     = 2 * k - 1;
            tmpC{k} = sfnz(v(idx):v(idx + 1));
        end
        t.(fn{z}) = vertcat(t.(fn{z}), tmpC{:}); % write extracted values into the new struct to the z-th variable
    end    

The iterative growing or shrinking of arrays is extremely expensive. So avoid things like this:

s.(fn{z})(i+1:i+(hz-1)) = [];
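For the 2-Hz preprocessing specifically, the averaging and the deletion could be done in one step without a loop. This is only a sketch, assuming the signal is a column vector whose length is a multiple of `hz`; the names `x` and `xMean` and the dummy values are illustrative, not taken from the original script.

```matlab
hz = 2;                          % samples per second in the raw signal
x  = [1; 3; 10; 20; 5; 7];       % dummy 2-Hz column vector, length divisible by hz

% Put each group of hz consecutive samples into one column, then average
% down the columns; the result has one value per second and nothing is deleted
xMean = mean(reshape(x, hz, []), 1).';
```

Passing the dimension argument `1` to `mean` keeps this correct for `hz = 1` as well, where the reshaped data is a single row and the call simply returns the signal unchanged.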

A tiny example:

    x = [];
    for k = 1:1e6
        x(k) = k;
    end    

This does not request just 8 MB (8 bytes per double) but sum(1:1e6)*8 bytes in total, because in each iteration a new array is created and the old contents are copied. Summed over all iterations, this means more than 4 TB of memory being allocated and copied! Of course this is slow. With preallocation and without a growing array, MATLAB requests only the expected 8 MB: x = zeros(1, 1e6). The same effect applies to iterative shrinking.
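The difference can be measured directly. The following is a minimal timing sketch, not a rigorous benchmark: the loop length `n` and the variable names are arbitrary choices, and the measured times depend on the MATLAB release and the machine, so no numbers are quoted here.

```matlab
n = 1e5;   % number of elements (arbitrary choice for the test)

% Growing array: x is enlarged on every iteration
tic
xGrow = [];
for k = 1:n
    xGrow(k) = k; %#ok<SAGROW>
end
tGrow = toc;

% Preallocated array: the memory is requested once before the loop
tic
xPre = zeros(1, n);
for k = 1:n
    xPre(k) = k;
end
tPre = toc;

fprintf('growing: %.4f s, preallocated: %.4f s\n', tGrow, tPre);
```

Both loops produce the same vector, so any difference in the two timings comes from how the memory is obtained.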

In MATLAB 2018b, vertcat has some potential for improvement, so it might be a good idea to test the speed with https://www.mathworks.com/matlabcentral/fileexchange/28916-cell2vec .

1 comment

Felix Schönig 2020-12-11

Hello Jan,

great example, I am beginning to understand! I just tested your code in a copy and compared it, but now I am confused again: it is approximately half a second slower than my "bad first solution" (5.3 s vs. 5.7.. s). Any idea why?

However, you made a clear point about the growing/shrinking of arrays. I will rework my code to avoid this. I have a lot to digest right now, still busy with Bruno's hint.

Thank you!
