Parfor reports error which does not exist when running as a for-loop
Hi,
To speed up some calculations I am using a parfor-loop. I have to run calculations on many files and I made a simple parfor-loop which runs a function on all these files. When analysis of one file is finished, the results are saved on disk. So, in principle, there is no communication between the different workers.
I have 12 local workers, and for each worker the first run completes without problems. After that, however, I always get an error message like the one below (where exactly it happens varies, but the type of message is always the same):
Error using parallel_function (line 598)
In an assignment A(:) = B, the number of elements in A and B
must be the same.
Error stack:
myfunc.m at 162
func>(parfor body) at 45
Error in func (line 14)
parfor ii=151:303
When I run the code in a for-loop, there is no error-message.
I have tried several things, but did not find a solution. The problem is that I can't debug this error, because it does not happen when I don't use parfor.
The only thing that works is to reduce the number of workers. When I choose 6 workers, the error doesn't show up.
My temporary solution was to start 2 Matlab sessions, give them each a pool of 6 workers and divide the work manually between the 2 Matlab sessions.
This workaround does not work either, however. In the 2nd Matlab session, the old error appears again after a short while. I really don't understand what the problem is...
10 Comments
Matt J
2013-8-25
Edited: Matt J
2013-8-25
"Therefore I strongly believe that the error has something to do with how matlab deals with running parallel computations... It can't have anything to do with this C{ii}."
It's still conceivable that both of the above are true simultaneously, i.e., a difference between parallel and serial modes of computation is causing the C{ii} to be read in corrupted in some cases.
We have to start by examining the C{ii} because we have nowhere else to start, and because ample evidence you provided points to it. The error message you posted says there is a dimension mismatch error. Furthermore, you insisted that this error is occurring in the line
C{ii}(C{ii}>0)=C{ii}(C{ii}>0)+prevmax;
That has to mean that prevmax is for some reason either empty or non-scalar some of the time. We must seek ways to trap that condition.
Answers (2)
Walter Roberson
2013-8-25
You would get that problem if C{ii-1} was empty, leading to prevmax being empty.
Remember, when you have a parfor loop, the iteration for any particular value (e.g., #9) might be done at any time relative to the iteration for the previous value (#8 in this example), so the assignment to C{8}(C{8}>0) might not have been performed before iteration #9, which calls upon C{8}. Indeed, parfor usually starts from the end. This differs from a regular for-loop, which always runs its iterations in order.
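If the computation really does depend on the previous iteration in the way described here, one common pattern is to keep only the independent work inside parfor and apply the order-dependent offset in a serial pass afterwards. The following is a minimal sketch under assumptions that are not confirmed by the thread: C is taken to be a cell array of label matrices, prevmax is taken to be the largest label used so far, and processFile is a hypothetical stand-in for the per-file work.

```matlab
% Assumptions (not from the original code): C holds label matrices,
% prevmax is the largest label already used, processFile is hypothetical.
n = 4;
C = cell(1, n);

parfor ii = 1:n
    % Independent work only: nothing here reads C{ii-1}.
    C{ii} = processFile(ii);
end

% Serial pass: this part depends on iteration order, so it stays in a for-loop.
prevmax = 0;
for ii = 1:n
    mask = C{ii} > 0;
    C{ii}(mask) = C{ii}(mask) + prevmax;   % prevmax is always a scalar here
    if any(mask(:))
        prevmax = max(C{ii}(:));
    end
end

function L = processFile(ii)
% Hypothetical placeholder that returns a small label matrix.
    L = [0 1; 2 ii];
end
```

With this split, the result no longer depends on the order in which the workers happen to finish their iterations.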
3 Comments
Walter Roberson
2013-8-25
Put in a try/catch that reports the size of prevmax when the problem is triggered.
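A minimal sketch of such a try/catch, wrapped around the line quoted earlier in the thread. The wrapper function addPrevmax and its argument list are hypothetical, since the surrounding code of myfunc is not shown.

```matlab
function C = addPrevmax(C, ii, prevmax)
% addPrevmax  Hypothetical wrapper around the failing assignment.
    try
        C{ii}(C{ii} > 0) = C{ii}(C{ii} > 0) + prevmax;
    catch err
        % Print the sizes involved, then rethrow so the error is not hidden.
        fprintf('ii = %d: size(prevmax) = [%s], size(C{ii}) = [%s]\n', ...
            ii, num2str(size(prevmax)), num2str(size(C{ii})));
        rethrow(err);
    end
end
```

For example, `C = addPrevmax({[0 1 2]}, 1, 5)` leaves `C{1}` equal to `[0 6 7]`, while a three-element prevmax such as `[1 2 3]` triggers the catch branch and prints the sizes before the error propagates.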
Matt J
2013-8-25
In parallel mode, you'll probably need to do
disp(prevmax)
to report prevmax.
Matt J
2013-8-25
You might also consider using PMODE to troubleshoot. This will allow you to step through different commands and see their results in the parallel command window.
2 Comments