Very small differences in results introduced by using parfor instead of for loop?

I'm calculating some image quality values between many images, and in order to speed this up I've implemented parfor in place of for. There was really no challenge in doing this, because I'm not doing anything fancy with my variables -- literally all I had to do was replace "for" with "parfor". Because the switch to parallel processing was so trivial, I would have expected the results of the serial and parallel versions of the same code to be identical. However, a very small difference between the results (on the order of 1e-10 or smaller) shows up when I swap in "parfor". To be sure, this is a very small difference, but the fact that it shows up at all concerns me. Does anyone know whether "parfor" somehow introduces very small floating point differences compared to the same code with a for loop?
Let me highlight a simple example of what I mean below, calculating the root mean squared error between two sets of images.
% Assume that we already have made and stored two arrays of images, referenceIms and distortedIms,
% such that these are fixed and known arrays. Each array contains a series of 3D images, and so is
% a 4D array. E.g. referenceIms(:,:,:,1) is a volume image.
% Loop over all images in the series. Note that size(referenceIms,4) = size(distortedIms,4).
% The only change required to make this into parallel code is to swap for with parfor.
RMSE = zeros(1, size(referenceIms,4));
for iImage = 1:size(referenceIms,4)
    % Calculate the RMSE
    errsSquared = ( referenceIms(:,:,:,iImage) - distortedIms(:,:,:,iImage) ).^2;
    RMSE(1,iImage) = sqrt( mean( errsSquared(:) ) );
end
Now let's say I run this code as it is (in its "serial form"), and save the results as the variable RMSEserial. Then I change the code such that "for" is "parfor" and make no other changes, and save those results as the variable RMSEparallel. Note that the input image arrays are fixed, there would be no random number generation or anything like that between these hypothetical runs.
If I then check the equality of these results, using isequal(RMSEserial,RMSEparallel) for example, I see that the results are NOT equivalent. However, they're very nearly equivalent -- if I check something like max( abs(RMSEserial - RMSEparallel) ), I'd see that the maximum difference between the two vectors is something like 1e-10 or smaller. And, just as a sanity check, if I run the serial or parallel versions of the code two times in a row and save the results as above, checking their equivalence, I'd find that the results are identical (that is, parfor is always consistent with parfor results, and for is always consistent with for results, as you'd expect).
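The comparison described above can be sketched as follows (assuming RMSEserial and RMSEparallel were saved from the for and parfor runs, respectively):

```matlab
% RMSEserial and RMSEparallel are assumed to hold the outputs of the
% for-loop and parfor-loop runs of the same code on identical inputs.
isequal(RMSEserial, RMSEparallel)       % returns false: not bit-identical
max( abs(RMSEserial - RMSEparallel) )   % tiny, e.g. on the order of 1e-10
```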
So why does this happen? Does parfor somehow change the handling of floating point numbers, making these small differences appear between serial and parallel versions of the same code?
Thank you in advance!

Accepted Answer

Walter Roberson 2017-2-17
It is possible.
With sufficiently large matrices, MATLAB hands some of the computations off to BLAS or LAPACK routines (such as those in Intel's MKL), which are high-performance multi-threaded libraries. Without parfor, those libraries (unless configured otherwise) use as many threads as you have physical cores.
But inside parfor, each worker has only one available core in all released MATLAB versions so far, so the details of how the chunks are broken up and gathered together could be slightly different.
You could also see differences like this with plain for if you had a different number of cores.
For smaller matrices, MATLAB will not hand the computations over to a high-performance library, because there is overhead in doing so, and that overhead can more than cancel out the gains for small matrices. So for smaller matrices, you would expect the results of the computation you are doing to be consistent between for and parfor.
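One way to test whether multithreaded math libraries are the cause (this is a diagnostic sketch, not something verified for your data) is to force the serial run onto a single computational thread with maxNumCompThreads, mimicking the single-threaded workers inside parfor:

```matlab
% Limit the serial run to one computational thread, then compare
% against the parfor result. If multithreaded BLAS/LAPACK chunking
% is the cause, the single-threaded serial run should match parfor
% more closely (or exactly).
nOld = maxNumCompThreads(1);   % returns the previous thread count
RMSEoneThread = zeros(1, size(referenceIms,4));
for iImage = 1:size(referenceIms,4)
    errsSquared = ( referenceIms(:,:,:,iImage) - distortedIms(:,:,:,iImage) ).^2;
    RMSEoneThread(1,iImage) = sqrt( mean( errsSquared(:) ) );
end
maxNumCompThreads(nOld);       % restore the original setting
```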
Note also: parfor these days supports "reduction variables". For example it would be legal to use
totstd = 0;
parfor iImage = 1:size(referenceIms,4)
    totstd = totstd + std( reshape( referenceIms(:,:,:,iImage), [], 1) );
end
parfor detects in this case that totstd is being added into by every iteration, and makes arrangements so that the additions do not overwrite each other. ("Race conditions" can be a big problem in multithreaded code: two threads could read the same value, both could update their local copies of it, and then both could write the updated copy back; the result would depend upon whose copy was written last.)
So this will work in parfor -- but the order of the additions is not specified and could vary dynamically depending on which workers happened to run faster. Floating point addition is, however, not associative -- the order of additions matters for round-off. This is an inherent issue in using reduction variables in parallel.
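You can see the non-associativity of floating point addition directly at the command line; the grouping of the additions changes the round-off:

```matlab
% Floating point addition is not associative: the same three numbers
% summed in a different grouping give different double-precision results.
a = 0.1; b = 0.2; c = 0.3;
left  = (a + b) + c;    % 0.6000000000000001
right = a + (b + c);    % 0.6
isequal(left, right)    % returns logical 0 (false)
abs(left - right)       % about 1.1e-16, i.e. one ulp at this magnitude
```

A parallel reduction is effectively choosing such a grouping at random each run, which is why the serial and parallel sums can differ in the last bits.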
If you encounter a situation where you are using a reduction variable and the order of reduction matters, then instead of using the reduction expression, store the individual results inside the parfor, and then do the reduction outside of parfor; for example,
totstd = zeros(1, size(referenceIms,4));
parfor iImage = 1:size(referenceIms,4)
    totstd(iImage) = std( reshape( referenceIms(:,:,:,iImage), [], 1) );
end
totstd = sum(totstd);
The values would be computed in parallel, but the sum would happen in serial, giving a consistent round-off.
