Parfor solving optimization problems (Cplex) slower than for
Hello,
I am trying to solve a large number of optimization problems in parallel using MATLAB Parallel Computing Toolbox R2018b on my client (Win10) and MATLAB Distributed Computing Server R2018b on my 3-node cluster (Win7) with 52 workers. The problems are fairly small, but there are hundreds of them, so in theory parfor should help here.
I read these problems from .lp files into a cell array and then solve them within a parfor loop, as below:
% subp_array is a 1xn cell array with Cplex problems
nWorkers = 16; % number of pool workers (16 in the timings below)
nThreads = 1; % I don't see any time benefit of giving it more than 1 thread
parpool('MJSProfile1',nWorkers);
elapsedTime = cell(1, length(subp_array)); % preallocate per-iteration times
totalTime = tic;
parfor subp_index = 1:length(subp_array)
    iterTime = tic;
    prob = subp_array{subp_index}; % assigning subp_array{subp_index} to prob and working on it apparently speeds up calculations
    prob.Param.parallel.Cur = -1; % set parallel option to opportunistic
    prob.Param.threads.Cur = nThreads; % set number of threads per problem
    prob.Param.mip.tolerances.mipgap.Cur = 0.01; % relative MIP gap tolerance of 1%
    prob.solve();
    % get time of particular iteration
    elapsedTime{subp_index} = toc(iterTime);
end
% get time of entire loop
elapsedTotalTime = toc(totalTime);
The problem is that this parfor loop, with 10 problems on 16 workers, runs for 32 s compared to 1.5 s (sic!) for a regular for loop. When I examine the timing results, the elapsed times of the individual iterations are very short, but the overall loop time is still large...
These are values of elapsedTime array:
{[0.0275]} {[0.0317]} {[0.0274]} {[0.0314]} {[0.0695]} {[0.4816]} {[0.0808]} {[0.0343]} {[0.0399]} {[0.0845]}
which is in total less than 1 second!
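A quick check of these numbers: summing the per-iteration times and subtracting them from the loop's wall-clock time shows how much of the 32 s is spent outside the loop bodies. The values are copied from the post; `bodyTime` and `overheadTime` are illustrative names, not part of the original code.

```matlab
% Per-iteration times reported above (seconds), one per subproblem
elapsedTime = {0.0275, 0.0317, 0.0274, 0.0314, 0.0695, ...
               0.4816, 0.0808, 0.0343, 0.0399, 0.0845};
elapsedTotalTime = 32;                        % wall-clock time of the whole parfor loop (s)

bodyTime = sum([elapsedTime{:}]);             % total time spent inside the loop bodies
overheadTime = elapsedTotalTime - bodyTime;   % time not accounted for by the bodies

fprintf('Loop bodies: %.4f s, unaccounted: %.4f s\n', bodyTime, overheadTime);
```

Since the iterations ran concurrently on different workers, the sum is an upper bound on the body time along the critical path, so roughly 31 s or more goes to something other than `prob.solve()`.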
Is there anything in the syntax that may cause these delays? I am using sliced variables and assigning to prob first so that the cell array is not indexed multiple times; I have no idea what else can be done... If I run parfor with M = 0 (sequential), it returns the result immediately (the difference is especially visible for a few hundred problems). What could be making my parallel computation so slow?
Thanks in advance
Kasia
Answers (2)
Edric Ellis
2020-3-17
Does the performance improve much / not much / not at all if you run the parfor loop a second time without closing the pool?
If the performance does improve a lot, then it's likely that the slow-down was caused by the parfor infrastructure having to work out that the code wasn't available, and attaching it to the pool. A message is printed when this occurs, or you can check the result of calling listAutoAttachedFiles:
listAutoAttachedFiles(gcp())
You can either live with that first-time slowdown, or attach the files up front using addAttachedFiles.
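A minimal sketch of checking and pre-attaching files, assuming the pool from the question is already open. The folder path is a hypothetical placeholder for wherever the CPLEX MATLAB API files live on the client, and it is an assumption that the CPLEX native libraries are already available on each worker node (attaching only transfers files MATLAB can ship to the workers).

```matlab
% Assumes the pool from the question is already open
pool = gcp();

% Show what parfor attached automatically on its first run
listAutoAttachedFiles(pool)

% Hypothetical location of the CPLEX MATLAB API on the client; adjust as needed
cplexApiDir = 'C:\path\to\cplex\matlab';

% Attach the folder up front so that the first parfor loop does not have to
% discover and transfer the code itself
if isfolder(cplexApiDir)
    addAttachedFiles(pool, {cplexApiDir});
end
```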
If the performance remains the same, perhaps the problem is the amount of data being transferred. Use ticBytes and tocBytes to investigate this. You could also experiment with stubbing out most of the loop body. For example, if you run a loop like this:
parfor subp_index = 1:length(subp_array)
prob = subp_array{subp_index}; % assigning subp_array{subp_index} to prob and working on it apparently speeds up calculations
end
how does that perform? That loop incurs the same amount of data transfer.
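The stub-out experiment can be combined with the byte counters, so a single run reports both the time and the data volume of the transfer alone. `timeStubbedParfor` is a hypothetical helper name; it assumes an open parallel pool and deliberately leaves out the solve step.

```matlab
function stubTime = timeStubbedParfor(subp_array)
% timeStubbedParfor  Time a parfor loop that only transfers the subproblems.
%   The solve step is omitted on purpose, so the measured time is mostly
%   data transfer plus parfor overhead. Assumes an open parallel pool.
pool = gcp();
tb = ticBytes(pool);                 % start counting bytes sent to/from workers
t = tic();
parfor subp_index = 1:numel(subp_array)
    prob = subp_array{subp_index};   %#ok<NASGU> same slicing as the real loop
end
tocBytes(pool, tb)                   % display bytes transferred per worker
stubTime = toc(t);                   % wall-clock time of the stubbed loop
end
```

Usage with the question's data would be `stubTime = timeStubbedParfor(subp_array)`; if this alone takes close to 32 s, the solver is not the bottleneck.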
11 comments
Katarzyna Furmanska
2020-3-18
Katarzyna Furmanska
2020-3-18
Edric Ellis
2020-3-18
Hm, that's strange. It's not really a huge amount of data to transfer, even considering your workers are remote. When parfor sends data to workers, it has to do something equivalent to save and then load. Just to rule out any strangeness there, you could try profiling something like this, which should end up sending basically the same amount of data, but using trivial data types:
numRows = 100;
colsIn = 5e4;
colsOut = 3e2;
sliceIn = zeros(numRows, colsIn, 'uint8');
sliceOut = zeros(numRows, colsOut, 'uint8');
tb = ticBytes(gcp()); t = tic();
parfor idx = 1:numRows
sliceOut(idx,:) = sum(sliceIn(idx,:));
end
tocBytes(gcp(), tb); toc(t);
Edric Ellis
2020-3-18
The other thing to try is using the MATLAB profiler on your code running on the client, just to make sure there's no other unexpected time being taken on the client. (Note that you'll definitely see some chunks of client time spent in the parfor implementation where basically the client is just sitting there waiting for results from the workers - that's completely normal).
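A sketch of doing that programmatically, so the most expensive client-side functions can be listed without opening the viewer. `profileClient` is a hypothetical helper name; `profile('info')` and its `FunctionTable` fields (`FunctionName`, `TotalTime`) are standard MATLAB profiler output.

```matlab
function topNames = profileClient(workFcn, nTop)
% profileClient  Profile workFcn on the client and list the costliest functions.
%   workFcn - function handle that runs the code containing the parfor loop
%   nTop    - how many functions to report, ordered by total time
profile on                                      % start profiler, clearing old data
workFcn();
profile off
info = profile('info');                         % profiler results as a struct
[~, order] = sort([info.FunctionTable.TotalTime], 'descend');
top = info.FunctionTable(order(1:min(nTop, numel(order))));
topNames = {top.FunctionName}';                 % names, most expensive first
end
```

For example, `topNames = profileClient(@() run('mySubproblemScript.m'), 10)`, where the script name is a placeholder for the script containing the parfor loop.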
Katarzyna Furmanska
2020-3-18
Katarzyna Furmanska
2020-3-24
Edric Ellis
2020-3-24
That profiling result is useful because it confirms that all the time on the client is spent waiting for the workers to return their results. (The q.poll statement is where the client gets blocked waiting for the worker results to show up).
The next thing to try is using "mpiprofile" to see where the time is being spent on the workers. Unfortunately, "mpiprofile" is designed more for spmd workflows than parfor, but it does work. Here's how you'd use it:
spmd, mpiprofile('on'); end
% ... call your script containing parfor here ...
spmd, mpiprofile('viewer'); end
This will show you a profile view for all the workers. You can look through that to see if there are any unexpected hot-spots. This could be compared to the view you get when running the code under the profiler using for instead of parfor.
Katarzyna Furmanska
2020-3-24
Katarzyna Furmanska
2020-3-25
Edric Ellis
2020-3-25
There's definitely a chunk of time in distcompdeserialize - that's an internal PCT function that is used when transferring data from the client process to a MATLAB worker process.
However, looking at the absolute times - there's still a big chunk of time going somewhere. The total time taken by remoteParallelFunction (which is the worker-side wrapper for the body of a parfor loop) is only ~0.6 seconds, but (if I've understood correctly) the overall loop takes much longer. I don't really have any good way to explain that.
I would go back to trying a version of the parfor loop with the data transfer in place but the actual computations stubbed out. My suspicion is that it will still take basically the same amount of time. That would point to data transfer being the bottleneck, even though the number of bytes transferred is relatively small...
Katarzyna Furmanska
2020-3-25
Egor Buiko
2021-4-14
Edited: Egor Buiko
2021-4-14
There is no need to go as far as optimization problems. Even this loop:
X = zeros(1,10^6);
parfor i = 1:10^6, X(i) = i; end
runs two times slower than a regular for loop, even though, according to the documentation, this is the kind of loop a parallel pool is designed for.
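A sketch for reproducing this comparison; absolute times depend on the pool and hardware, so no numbers are shown. With such a trivial body, each iteration does far less work than parfor spends on scheduling and transferring data, which is consistent with the slowdown reported here.

```matlab
n = 1e6;

% Serial reference
X = zeros(1, n);
t = tic();
for i = 1:n
    X(i) = i;
end
forTime = toc(t);

% Same loop with parfor; runs on the current parallel pool (one may be
% started automatically, depending on the parallel preferences)
Y = zeros(1, n);
t = tic();
parfor i = 1:n
    Y(i) = i;
end
parforTime = toc(t);

fprintf('for: %.3f s, parfor: %.3f s\n', forTime, parforTime);
```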