Is it possible for parfor workers to keep data in between iterations?

Question

jake555 2019-8-1


Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/474490-is-it-possible-for-parfor-workers-to-keep-data-in-between-iterations

Commented: jake555 2019-9-19

Hi,

I'm new to parallel processing and hoping I can get some suggestions from the larger MATLAB community. I have a set of N column vectors of size (4x1) that can be written as a (4xN) matrix X. At each time step, I need to run update calculations on each of the N (4x1) vectors. The updated values then become part of the input at the next time step. I have already vectorized everything so there are no for loops in calculating the update.

I'm trying to speed up the process more by using parfor. I have seen improvement from 1 to 2 cores but 3 and 4 cores are both comparable to 2 cores. I'd like to see if I can continue improving with additional cores (particularly if I were to run this on a larger cluster). I have read enough elsewhere to understand this may not be possible, but I'd like to try.

I'm currently sending the workers everything at each time step, but it seems like I should be able to keep the updated information on the workers so they can use it at the next time step. To hopefully make this a little clearer, I am doing the following in pseudo-code:

X = InitialCondition();
for k=2:numtimesteps
    
    % turns X into a cell array, where each cell can go to a worker
    XC = SliceFunction(X,numworkers); 
    
    XCnew = cell(1,numworkers);
    parfor i=1:numworkers
        XCnew{i} = UpdateFunction(@CalculateAB,XC{i},otherinputs); % otherinputs is much smaller than X
    end
    
    % final X, which becomes the input at the next timestep
    X = [XCnew{:}];
end
function XCnew = UpdateFunction(CalculateAB,XC,otherinputs)
% this function calculates A,B, then solves x=A\B
% note A,B are each 3D arrays and I need to solve A*x=B for each 2D slice
[A,B] = CalculateAB(XC,otherinputs);
% this is a modification of the File Exchange multinv
% it turns the 3D A,B matrices into sparse 2D matrices and solves using the \ operator
XCnew = multimldivide(A,B); 
end

So to summarize, I guess the question is this: can I keep information on the workers so that I don't have to send as much info back and forth? I'm hoping this could reduce the overhead involved with using parfor, so that I can continue to see speed improvements as I increase the number of cores. Or are there other tricks to reduce the overhead? I'm constantly going in and out of the parfor with each iteration of k.

Thanks!

Answer 1

Gaurav Garg 2019-8-29


Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/474490-is-it-possible-for-parfor-workers-to-keep-data-in-between-iterations#answer_389586

Hi,

To avoid sending data back and forth to the workers at every time step, you can keep a temporary, cell-like store that holds the data for each worker, so that each worker reads and updates its own entry from one time step to the next.
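
One way to get this kind of per-worker store is spmd rather than parfor. This is a swapped-in technique, not something from the original pseudo-code. Pool workers are separate processes, so the data has to live on the worker itself; a variable assigned inside an spmd block stays on that worker and is visible to later spmd blocks while the pool remains open. Below is a minimal sketch. The sizes and the anonymous function updateColumns are hypothetical stand-ins for InitialCondition and UpdateFunction, and it assumes each block of columns can be updated without data from the other blocks.

```matlab
% Sketch only: the update rule is a hypothetical stand-in for UpdateFunction.
N = 1000;                                    % number of column vectors (assumed)
numTimeSteps = 50;                           % number of time steps (assumed)
X0 = rand(4, N);                             % stand-in for InitialCondition()
updateColumns = @(Xblock) 0.5 * Xblock + 1;  % hypothetical update

pool = gcp();   % uses the current pool, or opens the default one

spmd
    % One-time split: each worker keeps one contiguous block of columns.
    % labindex/numlabs are called spmdIndex/spmdSize from R2022b onward.
    edges  = round(linspace(0, N, numlabs + 1));
    Xlocal = X0(:, edges(labindex) + 1 : edges(labindex + 1));
end

spmd
    % Xlocal is still on each worker from the previous spmd block.
    for k = 2:numTimeSteps
        Xlocal = updateColumns(Xlocal);
    end
end

% On the client Xlocal is a Composite; copy each worker's block back once.
parts = cell(1, pool.NumWorkers);
for w = 1:pool.NumWorkers
    parts{w} = Xlocal{w};
end
X = [parts{:}];
```

With this layout the full matrix is sent once at the start and collected once at the end, instead of being sliced and reassembled at every value of k. If the update at step k needs values from other blocks, the workers would have to exchange them inside the spmd block, which this sketch does not cover.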

This can reduce the overhead involved, although a speedup is not guaranteed. Speed improvements depend on several factors: the number of cores is the first, but context switching can also work against you.

Each worker needs to save its state to memory when it is switched out, and to load that state again when it resumes, so that it can continue from the point where it left off. This is known as context switching. When conditions are not optimal, the time spent saving and loading this state can become excessive. I suspect this might be what you are encountering.
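
Before changing the code, it may help to check how much of each time step goes into moving data. The sketch below uses ticBytes and tocBytes (available for process-based pools) together with tic and toc around a single parfor pass. The matrix size and the update expression are hypothetical stand-ins, not the real UpdateFunction.

```matlab
% Sketch only: measures data transfer and wall time for one parfor pass.
pool = gcp();                         % current pool, or open the default one
numWorkers = pool.NumWorkers;

X = rand(4, 1e5);                     % stand-in for the state matrix (assumed size)
edges = round(linspace(0, size(X, 2), numWorkers + 1));
XC = cell(1, numWorkers);
for i = 1:numWorkers
    XC{i} = X(:, edges(i) + 1 : edges(i + 1));   % one block of columns per worker
end
XCnew = cell(1, numWorkers);

ticBytes(pool);                       % start counting bytes moved to/from workers
tStart = tic;
parfor i = 1:numWorkers
    XCnew{i} = 0.5 * XC{i} + 1;       % hypothetical stand-in for the real update
end
elapsed = toc(tStart);
bytes = tocBytes(pool);               % one row per worker: [sent to, received from]

fprintf('Wall time: %.4f s\n', elapsed);
fprintf('Bytes sent to workers: %d, received from workers: %d\n', ...
        sum(bytes(:, 1)), sum(bytes(:, 2)));
```

If the byte counts are large compared with the work done per step, transfer overhead is a likely reason that adding a third or fourth core does not help.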

You can also use persistent variables if they suit your case.
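
As a rough illustration, a persistent variable inside a function keeps its value on each worker between calls, and also between parfor loops, for as long as the pool stays open and the function is not cleared. The function name cachedSetup and the data it builds are hypothetical; the sketch assumes the function is saved in its own file, cachedSetup.m, on the path of the workers.

```matlab
function data = cachedSetup()
% cachedSetup  Hypothetical helper: builds its data once per worker process.
% Later calls on the same worker return the stored copy without rebuilding.
persistent cache
if isempty(cache)
    % Stand-in for an expensive, time-step-independent setup computation.
    cache = cumsum(ones(1, 1000));
end
data = cache;
end
```

It would then be called from inside the loop body, for example:

```matlab
y = zeros(1, 8);
parfor i = 1:8
    T = cachedSetup();   % built on first use on each worker, reused afterwards
    y(i) = T(i);
end
```

One caveat: parfor does not guarantee that iteration i runs on the same worker at every time step, so this suits data that every worker can hold a copy of. It is less suited to the evolving slices of X, where each slice would need to stay with one particular worker.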