Speed up algorithm and memory issues

Hi everyone,
I am trying to do the following computation, but my tables are very large (over 6,000,000 rows). As you can imagine, it takes ages, and at some point I get an out-of-memory error. More precisely, for each director in my BEorg_sum_us table I want to find the number of distinct previous roles he had, according to the boardexindemploymentus table.
uqDiDs = unique( BEorg_sum_us.DirectorID );
BEorg_sum_us.NumRoles = NaN( height( BEorg_sum_us ), 1);
tic
for i = 1:100 %numel(uqDiDs)
    % Rows of this director in the report table (a full scan on every iteration)
    inds = BEorg_sum_us.DirectorID == uqDiDs(i);
    tmp = BEorg_sum_us( inds, :);
    % Employment rows of this director (another full scan of the 6,000,000-row table)
    tmpEmpl = boardexindemploymentus( ismember(boardexindemploymentus.DirectorID, uqDiDs(i) ), : );
    numRoles = nan( height(tmp), 1);
    if ~isempty(tmpEmpl)
        for j = 1:height( tmp )
            % Distinct topics of the roles that started before this report date
            roles = tmpEmpl( tmpEmpl.StartYear < tmp.AnnualReportDate(j), 'Topics' );
            numRoles(j) = height( unique( roles ) );
        end
        BEorg_sum_us.NumRoles(inds) = numRoles;
    end
end
toc
I estimate that this approach would need about 6 hours.
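One way to avoid scanning both tables once per director is to do the whole computation with a single sort. The key observation is that a topic counts for a report row exactly when that topic's earliest start for the same director is before the report date. The function below (the name countPriorRoles is hypothetical, not part of MATLAB) is a sketch under these assumptions: StartYear and AnnualReportDate are numeric values on the same scale (for example, both years), the two DirectorID columns have the same type, and there are no missing values. Like the loop above, it returns 0 when a director has roles but none before the report date, and NaN when the director has no employment rows at all.

```matlab
function numRoles = countPriorRoles(repDir, repDate, empDir, empStart, empTopic)
%COUNTPRIORROLES Hypothetical helper: for each report row, count the distinct
% topics the same director held with a start strictly before the report date.
% Assumes repDate and empStart are numeric on the same scale (e.g. years),
% repDir and empDir have the same type, and there are no missing values.

nRep = numel(repDir);

% Give every director ID a common integer code across both tables.
[~, ~, dirCode] = unique([repDir(:); empDir(:)]);
repCode = dirCode(1:nRep);
empCode = dirCode(nRep+1:end);

% A topic counts iff its EARLIEST start is before the report date, so collapse
% the employment rows to one (director, topic) pair with its minimum start.
[g, pairDir] = findgroups(empCode, empTopic(:));
firstStart = accumarray(g, empStart(:), [], @min);

% Merge report rows (kind 0) and pairs (kind 1), sorted by director, time, kind.
% At equal times report rows sort first, so a topic starting in that same year
% is not counted (the strict "<" of the original loop).
dirAll  = [repCode; pairDir];
timeAll = [repDate(:); firstStart];
kind    = [zeros(nRep, 1); ones(numel(pairDir), 1)];
[~, order] = sortrows([dirAll, timeAll, kind]);

% Running count of pairs, reset at the start of each director block.
isPair = kind(order) == 1;
d      = dirAll(order);
c      = cumsum(isPair);
newDir = [true; diff(d) ~= 0];
starts = find(newDir);
base   = c(starts) - isPair(starts);    % pairs seen before each director block
countSorted = c - base(cumsum(newDir));

% Undo the sort and keep only the report rows.
counts = zeros(numel(order), 1);
counts(order) = countSorted;
numRoles = counts(1:nRep);

% Match the original loop: NaN for directors with no employment rows.
numRoles(~ismember(repCode, empCode)) = NaN;
end
```

On the real tables the call could look like `BEorg_sum_us.NumRoles = countPriorRoles(BEorg_sum_us.DirectorID, BEorg_sum_us.AnnualReportDate, boardexindemploymentus.DirectorID, boardexindemploymentus.StartYear, boardexindemploymentus.Topics);`. The cost is dominated by one unique, one findgroups and one sortrows over all rows, instead of two full-table scans per director.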
I have tried to move everything inside the for loop into a function and then run it with parfor, but I get an out-of-memory error.
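uqDiDs = unique( BEorg_sum_us.DirectorID );
BEorg_sum_us.NumRoles = NaN( height( BEorg_sum_us ), 1);
NumRoles = cell( numel( uqDiDs ), 1);
tic
parfor i = 1:100 %numel(uqDiDs)
    % functionalRoles (not shown) wraps the body of the first loop. Both full
    % tables are used inside the loop, so parfor broadcasts them to every worker.
    NumRoles{i} = functionalRoles(BEorg_sum_us, boardexindemploymentus, uqDiDs(i) );
end
% Write the per-director results back into the table (serial loop)
for i = 1:100
    inds = BEorg_sum_us.DirectorID == uqDiDs(i);
    BEorg_sum_us.NumRoles(inds) = NumRoles{i};
end
toc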
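In a parfor loop, any variable that is used whole inside the loop body (here BEorg_sum_us and boardexindemploymentus) is a broadcast variable, so every worker receives a full copy of it. A sketch of the alternative is to pre-split the data into one cell per director, so each iteration receives only its own rows as sliced inputs. The function name countPriorRolesParfor is hypothetical; it assumes the column names used in the question and that StartYear and AnnualReportDate are numeric on the same scale.

```matlab
function numRoles = countPriorRolesParfor(repTbl, empTbl)
%COUNTPRIORROLESPARFOR Hypothetical helper: the per-director loop from the
% question, but parfor only sees sliced cell arrays, not the full tables.
% Assumes repTbl has DirectorID and AnnualReportDate, empTbl has DirectorID,
% StartYear and Topics, and the two date columns are numeric on the same scale.

[repGrp, uqDiDs] = findgroups(repTbl.DirectorID);
nDir = numel(uqDiDs);

% Employment rows whose director appears in repTbl, mapped to that director's group.
[inRep, empGrp] = ismember(empTbl.DirectorID, uqDiDs);
topicCode = findgroups(empTbl.Topics(inRep));   % integer code per distinct topic
startYear = empTbl.StartYear(inRep);

% Sort by director and cut into one cell per director (0x1 if it has no rows).
[eg, eOrd] = sort(empGrp(inRep));
eCounts    = accumarray(eg, 1, [nDir 1]);
empStartC  = mat2cell(startYear(eOrd), eCounts, 1);
empTopicC  = mat2cell(topicCode(eOrd), eCounts, 1);

[rg, rOrd] = sort(repGrp);
rCounts    = accumarray(rg, 1, [nDir 1]);
repDateC   = mat2cell(repTbl.AnnualReportDate(rOrd), rCounts, 1);

resultC = cell(nDir, 1);
parfor i = 1:nDir
    s = empStartC{i};                  % sliced: only director i's start years
    t = empTopicC{i};                  % sliced: only director i's topic codes
    d = repDateC{i};                   % sliced: only director i's report dates
    n = NaN(numel(d), 1);              % stays NaN if the director has no roles
    if ~isempty(s)
        for j = 1:numel(d)
            n(j) = numel(unique(t(s < d(j))));
        end
    end
    resultC{i} = n;
end

% Results are grouped by director; put them back in the original row order.
numRoles = NaN(height(repTbl), 1);
numRoles(rOrd) = vertcat(resultC{:});
end
```

Usage would be `BEorg_sum_us.NumRoles = countPriorRolesParfor(BEorg_sum_us, boardexindemploymentus);`. Without Parallel Computing Toolbox, the parfor loop simply runs serially.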
As a final approach, I tried using a tall array for boardexindemploymentus, which has over 6,000,000 rows, but it takes about 4-5 minutes per iteration. In the examples above I run only the first 100 values of uqDiDs, but there are around 140,000.
Any help to reduce computation time and optimise memory usage is much appreciated! Thank you in advance.

Answers (1)

Swastik Sarkar 2024-10-16
Hi @Elric,
The parfor version of your code runs out of memory. This can happen when the pool has more workers than the machine's memory can support for this task: each worker in a process-based pool is a separate MATLAB process, and by default the pool starts one worker per physical core. It is advisable to start with 2 workers and increase the number gradually until you encounter an out-of-memory error, then use the largest count that still fits. More information can be found in the documentation:
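A minimal sketch of starting with 2 workers, assuming Parallel Computing Toolbox is installed and the default cluster profile is the local one. ticBytes and tocBytes report how much data is sent to each worker, which shows directly how large the copied tables are.

```matlab
% Start (or restart) a process-based pool with only 2 workers.
p = gcp('nocreate');          % current pool, or [] if none is running
if ~isempty(p)
    delete(p);                % shut it down so the new size takes effect
end
pool = parpool(2);            % uses the default cluster profile

ticBytes(pool);
% ... run the parfor loop from the question here ...
tocBytes(pool)                % bytes sent to / received from each worker
```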
Additionally, consider using a thread-based pool instead of a process-based pool, as it is more lightweight. For guidance on choosing the appropriate parallel programming paradigm, refer to the following documentation:
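A sketch of switching to a thread-based pool, assuming MATLAB R2020a or later with Parallel Computing Toolbox. Thread workers run inside the client's process, so large data does not have to be copied into separate worker processes; however, not every function is supported on thread workers, so the functions used inside the loop should be checked first.

```matlab
% Replace any running pool with a thread-based pool.
p = gcp('nocreate');
if ~isempty(p)
    delete(p);
end
pool = parpool('threads');
```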
Hope this helps.
