- Partition A so that each worker stores only a piece
- Perform longCalculation in batches
- Reduce the result using for-drange and then gplus.
Parallelizing computation with memory restrictions
There's a program that I would like to run in parallel, as I have about a dozen cores available to me. However, I only have 128GB of RAM, which puts some constraints on how I want to parallelize the code.
A is a list of 50 matrices. Each matrix (like every other matrix involved) takes up about 1GB of memory, which is where the memory constraint comes in. Schematically, I want to execute the code
for i=1:1000
    B = longCalculation(i) % This is the step that takes a lot of time
    for j=1:50
        shorterCalculation(A{j}, B)
    end
end
Since longCalculation takes the longest to run, I would like to parallelize that, i.e., convert the first for loop into a parfor loop. However, each worker needs access to all of A, and I can't just make a copy for each worker due to memory constraints. Parallelizing the second for loop, and only giving each worker access to a small part of A, won't speed up the code that much. Any suggestions on changing or modifying this code so that it can be run in parallel? Thanks!
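To make the memory constraint concrete, here is a rough back-of-the-envelope estimate. It is only a sketch: the worker count of 12 is an assumption standing in for "about a dozen", and the per-matrix size is the approximate 1GB quoted above. The variable names are illustrative.

```matlab
% Rough memory estimate; all sizes are approximations taken from the question.
nMatrices   = 50;    % matrices in A
gbPerMatrix = 1;     % approximate size of each matrix, in GB
nWorkers    = 12;    % "about a dozen" cores, assumed to be exactly 12 here
ramGB       = 128;   % available RAM, in GB

replicatedGB  = nMatrices * gbPerMatrix * nWorkers;  % a full copy of A on every worker
partitionedGB = nMatrices * gbPerMatrix;             % A split across the workers

fprintf('Replicated A:  %d GB (fits: %d)\n', replicatedGB,  double(replicatedGB  <= ramGB));
fprintf('Partitioned A: %d GB (fits: %d)\n', partitionedGB, double(partitionedGB <= ramGB));
```

Under these assumptions, a full copy of A per worker needs far more than the available RAM, while a single copy split across the workers fits. The estimate ignores the memory needed for each worker's B.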
Answers (1)
Edric Ellis
2019-6-20
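OK, this depends somewhat on what you need to do with the results, but here's one way to avoid replicating A on each worker, using a combination of spmd and for-drange. The basic idea is to partition A so that each worker stores only a piece, perform longCalculation in batches, and reduce the result using for-drange and then gplus.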
%% Step 1: build A, but ensure each worker only gets a portion.
% Use for-drange to achieve that. This presumes that you can build
% pieces of 'A' directly on the workers.
nA = 50;
nLoop = 1000;
spmd
    A = cell(1, nA);
    for idx = drange(1,nA)
        A{idx} = ones(1000) * idx;
    end
end
% At this point, each worker has an independent 'A' where only some of the
% cells are filled in.
%% Step 2: perform the calculations in parallel.
spmd
    % Allocate the full output cell array.
    output = cell(1, nLoop);
    % Loop over the full range in batches of 'numlabs'.
    for idx = 1:numlabs:nLoop
        % Each worker performs one longCalculation, skipping indices
        % that fall past the end of the range in the final batch.
        myIdx = idx + (labindex - 1);
        if myIdx <= nLoop
            myB = longCalculation(myIdx);
        else
            myB = [];
        end
        % Next, we need to work with each 'myB', and perform
        % shorterCalculation. So, loop over 'numlabs', and use
        % labBroadcast to give each worker the value of B.
        for bIdx = 1:numlabs
            % Make sure we don't exceed the loop range
            outIdx = (idx + bIdx - 1);
            if outIdx > nLoop
                break;
            end
            % Get the value of B to each worker.
            B = labBroadcast(bIdx, myB);
            % Reduce the result on each worker using shorterCalculation
            partialResult = 0;
            for aIdx = drange(1, nA)
                partialResult = partialResult + shorterCalculation(A{aIdx}, B);
            end
            % Combine the overall result into 'output' (on worker 1 only).
            output{outIdx} = gplus(partialResult, 1);
        end
    end
end
x = output{1};
x = [x{:}]
%% Dummy "longCalculation".
function x = longCalculation(x)
    pause(0.1);
    x = -x;
end
%% Dummy "shorterCalculation".
function x = shorterCalculation(Ai, B)
    x = Ai(1) * B;
end
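One way to sanity-check the batching and reduction logic without a parallel pool is to emulate it serially. The following is a minimal sketch under stated assumptions, not part of the original answer: `nWorkers`, the round-robin `owner` mapping, and the anonymous stand-ins `longCalc` and `shortCalc` are all hypothetical, and the way for-drange actually splits the indices of A may differ. The final sum does not depend on how A is split.

```matlab
% Serial emulation of the batched scheme (no Parallel Computing Toolbox needed).
% nWorkers and the split of A's indices are illustrative assumptions.
nA = 50;
nLoop = 1000;
nWorkers = 12;                       % assumed pool size

longCalc  = @(i) -i;                 % stand-in for the dummy longCalculation
shortCalc = @(Ai, B) Ai(1) * B;      % stand-in for the dummy shorterCalculation

% Small stand-ins for the 1000-by-1000 matrices; only the first element is used.
A = cell(1, nA);
for j = 1:nA
    A{j} = ones(2) * j;
end

% Assumed round-robin ownership of A's cells (drange's real split may differ).
owner = mod((1:nA) - 1, nWorkers) + 1;

output = zeros(1, nLoop);
for idx = 1:nWorkers:nLoop
    % Each emulated worker computes one B for this batch.
    myB = cell(1, nWorkers);
    for w = 1:nWorkers
        if idx + w - 1 <= nLoop
            myB{w} = longCalc(idx + w - 1);
        end
    end
    for bIdx = 1:nWorkers
        outIdx = idx + bIdx - 1;
        if outIdx > nLoop
            break;
        end
        B = myB{bIdx};               % emulates labBroadcast(bIdx, myB)
        total = 0;                   % emulates gplus across the workers
        for w = 1:nWorkers
            partial = 0;
            for aIdx = find(owner == w)
                partial = partial + shortCalc(A{aIdx}, B);
            end
            total = total + partial;
        end
        output(outIdx) = total;
    end
end

% With these stand-ins, entry i should equal -i * sum(1:nA).
expected = -(1:nLoop) * sum(1:nA);
assert(isequal(output, expected));
```

If the spmd version is run with the same dummy functions, its final `x` can be compared against `expected` in the same way.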