How to share a HashMap in parallel computing


Setup

I am trying to parallelize an algorithm that runs the same code on each row of a matrix (and then postprocesses the results).

Some computations are needed for more than one row, and it is hard to predict which ones will recur.

Therefore, I currently call an object that performs these computations and saves the results in a HashMap, so that when processing row $n$ needs a computation that was already done for row $m$, it does not have to be done again.

The order in which the rows are processed does not affect the outcome of the algorithm.

Problem

I am not able to share the HashMap in parallel code: each worker ends up with its own copy of the HashMap.

PS

I understand the philosophy behind this behavior. Yet in my example, order does not matter and I would like to circumvent the standard behavior.

Minimal working example:

classdef MyPar <handle
    properties
        map;
    end
    methods 
        function obj=MyPar()
            obj.map=containers.Map('KeyType','double','ValueType','any');
        end
        function y=compute(obj,n)
           if ~obj.map.isKey(n)
               obj.map(n)=sin(n);
               fprintf('Did not find key ''%d''\n',n)
           else
               fprintf('Found key ''%d''\n',n)
           end
           y=obj.map(n);
        end
    end
    methods(Static)
        function R=test()
            c=MyPar();
            Nworkers=3;
            A=ones(Nworkers,2);
            spmd(Nworkers)
               R=c.compute(A(labindex,1))+c.compute(A(labindex,2));
            end    
        end
    end
end

Running MyPar.test() gives

>> MyPar.test();
Lab 1: 
  Did not find key '1'
  Found key '1'
Lab 2: 
  Did not find key '1'
  Found key '1'
Lab 3: 
  Did not find key '1'
  Found key '1'

In this trivial example, I would like code in which two of the workers do not need to do any computation of their own, because the only computation ever done is compute(1).

Answers (1)

Edric Ellis 2016-5-3
There is no way to have the map data structures automatically propagate changes, but you could use the communication functions within spmd to explicitly synchronize the known keys and values.
Whether this is actually a practical option depends a lot on the structure of your computations: you need a spot in the spmd block where all the workers agree that it's time to synchronize. If you can do that, then you could use gop (or its concatenating counterpart gcat, as below) to get the job done, perhaps a bit like this:
spmd
    map = containers.Map();
    for iteration = 1:1000
        % Choose key, look up or compute value:
        key = num2str(randi(100));
        if ~isKey(map, key)
            value = sprintf('Value: %s computed on lab: %d', key, labindex); % dummy computation
            map(key) = value;
        else
            value = map(key);
        end
        %
        % synchronize 'map'.
        % Step 1: get all the keys:
        allKeys = unique(gcat(keys(map)));
        %
        % Step 2: get values on each worker
        allValues = cell(1, numel(allKeys));
        gotValue = false(1, numel(allKeys));
        for idx = 1:numel(allKeys)
            if isKey(map, allKeys{idx})
                gotValue(idx) = true;
                allValues{idx} = map(allKeys{idx});
            end
        end
        %
        % Step 3: combine all known values
        globalValues = gcat(allValues, 1);
        gotGlobalValue = gcat(gotValue, 1);
        %
        % Step 4: put values into map
        for idx = 1:numel(allKeys)
            row = find(gotGlobalValue(:, idx), 1, 'first');
            value = globalValues{row, idx};
            map(allKeys{idx}) = value;
        end
    end
end
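
The example above synchronizes after every iteration, which may cost more in communication than it saves in computation. A possible variation, sketched here under assumptions that are not part of the original answer, is to synchronize only every few iterations. The name batchSize and the use of sin(key) as a stand-in for the expensive computation are hypothetical; the keys are doubles, as in the question's MyPar class. The sketch relies on keys(map) and values(map) returning their entries in matching order, so a single gcat on each is enough to exchange the pairs. On newer MATLAB releases gcat and labindex are also available under the names spmdCat and spmdIndex.

```matlab
% Minimal sketch: assumes Parallel Computing Toolbox and an available pool.
batchSize = 50;                      % hypothetical: iterations between syncs
spmd
    map = containers.Map('KeyType', 'double', 'ValueType', 'any');
    for iteration = 1:200
        key = randi(100);
        if ~isKey(map, key)
            map(key) = sin(key);     % stand-in for the expensive computation
        end
        % Every worker reaches this point at the same iteration count,
        % so the collective gcat calls below line up across workers.
        if mod(iteration, batchSize) == 0
            allKeys   = gcat(keys(map), 2);    % keys known on any worker
            allValues = gcat(values(map), 2);  % values, in the same order
            for idx = 1:numel(allKeys)
                if ~isKey(map, allKeys{idx})
                    map(allKeys{idx}) = allValues{idx};
                end
            end
        end
    end
    numKnown = double(map.Count);    % entries known locally after last sync
end
```

Because the last iteration (200) is a multiple of batchSize, every worker finishes with the same set of keys. The trade-off is that a key requested by two workers within the same batch is still computed twice, so batchSize would need tuning against the real cost of the computation.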
