Hi everyone,
I am trying to do the following computation, but my tables are very large (over 6,000,000 rows). As you can imagine, it takes ages, and at some point I get an out-of-memory error. More precisely, for each row of my BEorg_sum_us table (a director and an AnnualReportDate), I want to find the number of distinct previous roles (Topics) that the director held according to the boardexindemploymentus table, i.e. roles whose StartYear is before that AnnualReportDate:
uqDiDs = unique( BEorg_sum_us.DirectorID );
BEorg_sum_us.NumRoles = NaN( height( BEorg_sum_us ), 1);
for i = 1:numel( uqDiDs )
    % rows of this director in both tables (each is a scan of the full table)
    inds = BEorg_sum_us.DirectorID == uqDiDs(i);
    tmp = BEorg_sum_us( inds, :);
    tmpEmpl = boardexindemploymentus( ismember(boardexindemploymentus.DirectorID, uqDiDs(i) ), : );
    numRoles = nan( height(tmp), 1);
    for j = 1:height(tmp)
        % distinct topics that started before this annual report
        roles = tmpEmpl( tmpEmpl.StartYear < tmp.AnnualReportDate(j), 'Topics' );
        numRoles(j) = height( unique( roles ) );
    end
    BEorg_sum_us.NumRoles(inds) = numRoles;
end
I estimate that this approach needs about 6 hours.
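A possible way to avoid scanning both full tables once per director is to reformulate the count: a topic is a previous role for a report exactly when its earliest StartYear for that director is before the AnnualReportDate. The sketch below (an assumption-laden illustration, not a drop-in fix) first reduces boardexindemploymentus to one row per (DirectorID, Topics) with `findgroups`/`splitapply`, then does a single sorted sweep over role events and report events. It assumes DirectorID, StartYear and AnnualReportDate are numeric and directly comparable (as the original `<` comparison suggests) and that there are no missing values; the tables `reports` and `empl` are hypothetical stand-ins for BEorg_sum_us and boardexindemploymentus.

```matlab
% Hypothetical demo data standing in for BEorg_sum_us / boardexindemploymentus.
% Assumes DirectorID, StartYear and AnnualReportDate are numeric (years here).
reports = table([1;1;2;3], [2005;2010;2008;2000], ...
    'VariableNames', {'DirectorID', 'AnnualReportDate'});
empl = table([1;1;1;2;2], {'A';'A';'B';'C';'D'}, [2001;2008;2007;2008;2003], ...
    'VariableNames', {'DirectorID', 'Topics', 'StartYear'});

% 1) One row per (DirectorID, Topics) with its earliest StartYear: a topic is a
%    previous role for a report iff that earliest StartYear < report date.
[g, roleDir] = findgroups(empl.DirectorID, empl.Topics);
firstStart = splitapply(@min, empl.StartYear, g);

% 2) Merge role events and report (query) events into one list.
nQ = height(reports);
evDir  = [roleDir; reports.DirectorID];
evTime = [firstStart; reports.AnnualReportDate];
isRole = [true(numel(roleDir), 1); false(nQ, 1)];
qIdx   = [zeros(numel(roleDir), 1); (1:nQ)'];

% 3) Sort by director, then time; on ties queries (0) come before roles (1),
%    so only roles with StartYear strictly < AnnualReportDate are counted.
[~, order] = sortrows([evDir, evTime, double(isRole)]);
evDir = evDir(order);  isRole = isRole(order);  qIdx = qIdx(order);

% 4) Running count of roles, restarted at the beginning of each director's block.
c = cumsum(isRole);
newDir = [true; diff(evDir) ~= 0];
blockStart = find(newDir);
base = c(blockStart) - isRole(blockStart);   % roles counted before this block
perDir = c - base(cumsum(newDir));

% 5) Read the per-director running count at every query position.
numRoles = zeros(nQ, 1);
numRoles(qIdx(~isRole)) = perDir(~isRole);
```

For the demo data this gives `[1; 2; 1; 0]` (director 2's topic C starts in 2008, the same year as the report, so it is not counted). On the real data one would set `reports = BEorg_sum_us; empl = boardexindemploymentus;` and finish with `BEorg_sum_us.NumRoles = numRoles;`. The sweep is dominated by one sort, and peak memory stays at a few vectors of length (number of director-topic pairs + number of reports), with no per-director copies of the tables.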
I have tried moving everything inside the for loop into a function and then using parfor, but I get an out-of-memory error:
uqDiDs = unique( BEorg_sum_us.DirectorID );
BEorg_sum_us.NumRoles = NaN( height( BEorg_sum_us ), 1);
NumRoles = cell( height( uqDiDs ), 1);
parfor i = 1:height( uqDiDs )
    NumRoles{i} = functionalRoles(BEorg_sum_us, boardexindemploymentus, uqDiDs(i) );
end
for i = 1:height( uqDiDs )
    inds = BEorg_sum_us.DirectorID == uqDiDs(i);
    BEorg_sum_us.NumRoles(inds) = NumRoles{i};
end
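A likely reason for the parfor memory error is that BEorg_sum_us and boardexindemploymentus are passed whole into every iteration, so parfor treats them as broadcast variables and sends a full copy to each worker. The sketch below keeps the original per-director logic but splits the needed columns into per-director cell arrays once, using a sort rather than one full-table comparison per director. Cell arrays indexed by the loop variable are sliced, so each worker should only receive the cells it processes. The demo tables `reports` and `empl` are hypothetical, and DirectorID is assumed numeric.

```matlab
% Hypothetical demo data (same columns as in the question).
reports = table([1;1;2;3], [2005;2010;2008;2000], ...
    'VariableNames', {'DirectorID', 'AnnualReportDate'});
empl = table([1;1;1;2;2], {'A';'A';'B';'C';'D'}, [2001;2008;2007;2008;2003], ...
    'VariableNames', {'DirectorID', 'Topics', 'StartYear'});

uqDiDs = unique(reports.DirectorID);
nDir = numel(uqDiDs);

% Group row indices by director once (sort-based, no per-director scans).
[~, rDir] = ismember(reports.DirectorID, uqDiDs);
[rKey, rOrd] = sort(rDir);
repIdx = mat2cell(rOrd, accumarray(rKey, 1, [nDir 1]), 1);

[~, eDir] = ismember(empl.DirectorID, uqDiDs);
eRows = find(eDir > 0);                      % drop directors absent from reports
[eKey, eOrd] = sort(eDir(eRows));
emplIdx = mat2cell(eRows(eOrd), accumarray(eKey, 1, [nDir 1]), 1);

% Per-director slices of only the columns the loop needs.
dateCells  = cellfun(@(r) reports.AnnualReportDate(r), repIdx, 'UniformOutput', false);
startCells = cellfun(@(r) empl.StartYear(r), emplIdx, 'UniformOutput', false);
topicCells = cellfun(@(r) empl.Topics(r), emplIdx, 'UniformOutput', false);

% Sliced inputs/outputs: iteration k only touches cell k.
out = cell(nDir, 1);
parfor k = 1:nDir
    d = dateCells{k};  s = startCells{k};  t = topicCells{k};
    nk = zeros(numel(d), 1);
    for j = 1:numel(d)
        nk(j) = numel(unique(t(s < d(j))));  % distinct topics started before report j
    end
    out{k} = nk;
end

% Scatter the per-director results back to the original row order.
numRoles = zeros(height(reports), 1);
numRoles(vertcat(repIdx{:})) = vertcat(out{:});
```

Without Parallel Computing Toolbox (or without an open pool) the parfor loop runs as an ordinary for loop, so the same code still works serially; the main saving there comes from replacing the per-director `==`/`ismember` scans over 6,000,000 rows with one sort.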
As a final approach, I tried using a tall array for boardexindemploymentus, which has over 6,000,000 rows, but then a single iteration takes about 4-5 minutes. In the above example I ran it for only the first 100 uqDiDs, but I have around 140,000.
Any help to reduce computation time and optimise memory usage is much appreciated! Thank you in advance.