Parallel pool on function that uses persistent variables

Question


Apparently parallel computing does not handle persistent variables correctly, as shown in this minimal example.

It seems that when running in parallel, the persistent variable remains unset ([]) even though I set it beforehand.

delete(gcp('nocreate')); % delete the current pool if any
ppobj = parpool('local'); %create parallel pool, 'threads' show the same behaviour
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 2).
BigData = ones(1000);
Store(1,BigData);
FooOutNormalRun = Foo(1) % 1000000, OK
FooOutNormalRun = 1000000
% Run 
%   FooOutRunWithppj = Foo(1)
% under parallel pool
Future = parfeval(ppobj, @Foo, 1, 1);
[j, FooOutRunWithppj] = fetchNext(Future);
FooOutRunWithppj % returns 0, expected 1000000
FooOutRunWithppj = 0
FooOutNormalRun = Foo(1) % 1000000, OK
FooOutNormalRun = 1000000
delete(gcp('nocreate'))
%%
function Data = Store(action, Data)
persistent PDATA
if action == 1
    PDATA = Data; % Store
elseif action == 2
    Data = PDATA; % Retrieve
end
end
%%
function s = Foo(count) %#ok
BigData = Store(2); % Retrieve
s = sum(BigData,'all');
end

Is the limitation mentioned somewhere in the documentation?

More importantly, is there any workaround? I am trying to reduce data broadcast in parallel computing: my BigData is read-only and I don't want it to be copied (broadcast) to each process. The overhead slows things down and costs memory, and in principle I should be able to avoid it.

1 comment

Bruno Luong 2022-6-6

Edited: Bruno Luong 2022-6-6

I also tried storing the data inside a handle class like this:

classdef StorageManagement < handle
    % Usages
    % instantiate:
    % s = StorageManagement(Data)
    % ...
    % Data = s.GetData()
    properties (SetAccess = immutable, NonCopyable = true)
        myData;
    end
    
    methods
        % Constructor
        function obj = StorageManagement(Data)
            if nargin >= 1
                obj.myData = Data;
            else
                warning('StorageManagement instantiates with empty data');
                obj.myData = [];
            end
        end
        function Data = GetData(obj, varargin)
            Data = obj.myData;
        end
    end
end

I then retrieve the data during the parallel call with

BigData = myhandleobj.GetData(); % myhandleobj is instance of StorageManagement

That works, but the data still seems to be broadcast to the worker processes, so I save nothing compared with passing BigData directly.

If I implement the storage using a persistent variable inside the class, it fails just like a persistent variable inside a standard function, as in my original question.


Answer 1

Walter Roberson 2022-6-6


I am having difficulty finding this documented, but it is well known. Local parpool workers run in separate processes, and parfor and parfeval only copy over the variables that they can clearly determine need to be copied. A persistent variable hidden inside the parsed version of a function is not "obvious" for this purpose. parfor and kin essentially save() and load() variables (or do the equivalent) and do not bother to save and load (non-anonymous) functions.

The usual workaround is to use parallel.pool.Constant, or to use parfevalOnAll to make a call on every worker that saves the value in that worker's own persistent variable.
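As a minimal sketch of the first option, assuming the read-only data can be built on the client as in the question: the data is wrapped in a parallel.pool.Constant, and the Constant (rather than BigData itself) is passed to parfeval. The names pool, C, fut and total are illustrative only.

```matlab
% Sketch: hand read-only data to the workers via parallel.pool.Constant.
pool = gcp();                          % use the current pool, or start one
BigData = ones(1000);
C = parallel.pool.Constant(BigData);   % transferred to each worker once, then reused

% Pass the Constant to the task and read the data through its Value property.
fut = parfeval(pool, @(c) sum(c.Value, 'all'), 1, C);
total = fetchOutputs(fut);             % 1000000 for ones(1000)
```

The data still has to reach each worker once, but later parfeval or parfor calls that use C should not resend it.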

As is the case for global variables, the functionality is still available, but each worker starts with the variables cleared.

It is possible that a thread-based pool (parpool('threads')) would work for your purposes: the threads can share memory, so broadcast variables become much less expensive.
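A rough sketch of the parfevalOnAll route, reusing the Store and Foo functions from the question (repeated here as local functions so the script is self-contained). It assumes a pool whose workers keep the function loaded between tasks. Note that BigData is still sent to every worker once during initialization.

```matlab
% Sketch: set the persistent variable on every worker before using it.
pool = gcp();
BigData = ones(1000);

init = parfevalOnAll(pool, @Store, 0, 1, BigData);  % run Store(1, BigData) on each worker
wait(init);                                         % block until all workers are initialized

fut = parfeval(pool, @Foo, 1, 1);                   % Foo now finds PDATA set on its worker
total = fetchOutputs(fut);

function Data = Store(action, Data)
persistent PDATA
if action == 1
    PDATA = Data;   % store
elseif action == 2
    Data = PDATA;   % retrieve
end
end

function s = Foo(count) %#ok
BigData = Store(2); % retrieve this worker's copy
s = sum(BigData, 'all');
end
```

If the pool is restarted or a worker clears its functions, the initialization has to be repeated.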

https://www.mathworks.com/help/parallel-computing/choose-between-thread-based-and-process-based-environments.html
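For comparison, a minimal sketch of the thread-based variant, assuming a release that supports parpool('threads') and that every function called on the workers is supported in thread-based environments. Here the data is simply passed as an argument; the persistent variable from the question is not involved.

```matlab
% Sketch: pass the data directly to a thread-based worker.
delete(gcp('nocreate'));               % close any existing pool
pool = parpool('threads');             % thread workers share the client's process
BigData = ones(1000);

fut = parfeval(pool, @(x) sum(x, 'all'), 1, BigData);
total = fetchOutputs(fut);             % 1000000 for ones(1000)

delete(pool);
```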

3 comments

Walter Roberson 2022-6-6

parallel.pool.Constant is supposedly an efficient way to get data into all of the workers, more efficient than broadcast variables.

Bruno Luong 2022-6-6

@Walter Roberson thanks for the tip of parallel.pool.Constant, it works beautifully to save the bandwidth.


Answer 2

Edric Ellis 2022-6-6


As Walter points out, workers (either threads or processes) do not share persistent variable workspaces. I too cannot find this explicitly mentioned in our doc. There's a hint here, but the restriction is more general than just parfor.

Whether you use threads or processes, you still need to arrange for each worker to get access somehow to your BigData. Either the contents need to be copied to the workers, or each worker needs to load/create it for itself. Using parallel.pool.Constant can work with either option. Again, Walter points out that "copying" data to thread-based workers is much more efficient than for process-based workers - although it sounds like current limitations mean that they don't work for you in any case.
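The "each worker creates it for itself" option can be sketched by giving parallel.pool.Constant a function handle instead of the data. The handle below just calls ones(1000) as a stand-in; in practice it might load the data from a file that every worker can reach, which is an assumption about your setup.

```matlab
% Sketch: let each worker build its own copy instead of receiving one.
pool = gcp();
C = parallel.pool.Constant(@() ones(1000));   % the handle is evaluated on each worker

fut = parfeval(pool, @(c) sum(c.Value, 'all'), 1, C);
total = fetchOutputs(fut);                    % 1000000 for ones(1000)
```

With this form only the function handle is sent from the client, not the array itself.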

3 comments

Steven Lord 2022-6-6


Edric can correct me if I'm wrong with this example but let's say you had about the most basic persistent variable setup that there is.

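function y = myPersistentStorage(x)
% Untested code
persistent stored % kept separate from the output y, which cannot itself be declared persistent
if nargin == 1
    stored = x;   % update the stored value
end
y = stored;       % return the current stored value
end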

If you were to call this in a parfor loop with the loop variable value as input and if you could use persistent variables in a parfor, what would the value of y be in myPersistentStorage after the loop exited?

parfor k = 1:100
    myPersistentStorage(k); % Update y
end
q = myPersistentStorage; % Retrieve the last value of y

Remember, parfor loop bodies must be order independent. The loop bodies could be executed in any order. So q would not necessarily be 100. It would not necessarily be 1. It could be 2, 99, or "The Answer" of 42.
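By way of contrast, a small sketch of an order-independent way to bring a single value back from a parfor loop: a reduction variable. This is only an illustration of the ordering point, not a replacement for persistent storage.

```matlab
% Sketch: a reduction variable gives the same answer in any iteration order.
q = 0;
parfor k = 1:100
    q = max(q, k);   % max is a supported reduction function in parfor
end
% q is 100 regardless of the order in which the iterations ran
```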

Edric Ellis 2022-6-6

The constraint on parfor loops being "order independent" is not really possible to enforce, other than at a syntactic level for the loop body itself. Use of persistent is just one way that you could subvert that. I agree that in your example, if somehow the persistent value was brought back to the client after the loop, it could have any value between 1 and 100.
