How to use RandStreams appropriately with Parallel Computing?

I am currently working to update an existing set of code for reproducibility.
Currently, the code is structured as follows:
nlabs = 6;
seed = 1; % User-choice
[globalstream, labstreams{1:nlabs}] = RandStream.create('mrg32k3a','NumStreams',nlabs+1,'Seed',seed);
RandStream.setGlobalStream( globalstream );
parallelpool=parpool(nlabs);
spmd
    RandStream.setGlobalStream( labstreams{spmdIndex} );
end
parfor i=1:nlabs
    % Calculations here
end
However, I need the code to be fully reproducible. I understand that to achieve reproducibility with parallel computing I need to use substreams ( https://www.mathworks.com/help/stats/reproducibility-in-parallel-statistical-computations.html ). However, I am not sure how to distinguish the global stream from the worker streams.
I've seen an example in which the user used only a single global stream, storing and retrieving the stream state before and after the parfor loop ( https://www.mathworks.com/matlabcentral/answers/1670009-reproducible-and-independent-random-stream-generation-in-parfor-loop ), but it seems like it would be simpler to set up two independent streams.
I've outlined a two-stream setup below. Does this seem reasonable? I want globalstream and each substream of labstream to be independent.
nlabs = 6;
seed = 1; % User-choice
[globalstream, labstream] = RandStream.create('mrg32k3a','NumStreams',2,'Seed',seed);
RandStream.setGlobalStream( globalstream );
% <Some Calculations>
parallelpool=parpool(nlabs);
parallel.pool.Constant(RandStream.setGlobalStream(labstream)) % Not sure of the syntax here
parfor i=1:nlabs
    set(labstream,'Substream',i)
    % <Some Calculations>
end
RandStream.setGlobalStream( globalstream );
% <Some Calculations>

Answers (1)

Daemon 2026-1-27,13:22
To make parallel MATLAB code fully reproducible, separate client-side (global) random number generation from worker-side generation, and make sure no RandStream object is shared across workers. Substreams are the right mechanism for reproducible parallel execution, but a single shared stream cannot be safely advanced inside a parfor loop: iteration order is undefined, so the numbers each iteration draws would vary from run to run.
Use one stream on the client for all serial computations. This stream is independent of any parallel execution.
seed = 1;
clientStream = RandStream('mrg32k3a','Seed',seed);
RandStream.setGlobalStream(clientStream);
% Client-side computations
A = rand(1,10);
Each worker must have its own stream instance. This is done using parallel.pool.Constant, which constructs a separate RandStream on each worker with the same seed.
nlabs = 6;
parpool(nlabs);
workerStreams = parallel.pool.Constant(@() ...
    RandStream('mrg32k3a','Seed',seed));
This avoids sharing stream handles and guarantees deterministic initialization on every run.
Inside the parfor loop, assign a substream based on the loop index and set it as the worker’s global stream before generating random numbers.
parfor i = 1:nlabs
    s = workerStreams.Value;
    s.Substream = i; % Deterministic mapping
    RandStream.setGlobalStream(s);
    % Parallel computations
    x = rand(1,5);
end
mrg32k3a supports independent substreams, and each iteration selects its substream from the loop index alone, so iteration i draws the same numbers no matter which worker runs it or in what order the iterations are scheduled.
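The order-independence claim can be checked without a parallel pool. The following is a minimal sketch, assuming that a plain for loop run forwards and then backwards is an acceptable stand-in for arbitrary parfor scheduling. The variable names (forward, backward, sameDraws) are illustrative and not part of the answer above.

```matlab
% Sketch: selecting the substream from the loop index makes the draws
% independent of the order in which iterations run.
seed = 1;   % same seed as in the answer above
n = 6;      % number of iterations (nlabs in the answer)

forward = zeros(n, 5);
s = RandStream('mrg32k3a', 'Seed', seed);
for i = 1:n
    s.Substream = i;                 % position the stream at the start of substream i
    forward(i, :) = rand(s, 1, 5);
end

backward = zeros(n, 5);
s = RandStream('mrg32k3a', 'Seed', seed);
for i = n:-1:1                       % reversed order, standing in for arbitrary scheduling
    s.Substream = i;
    backward(i, :) = rand(s, 1, 5);
end

% Row i holds the same numbers in both runs if the mapping is deterministic.
sameDraws = isequal(forward, backward);
```

Setting the Substream property repositions the stream at the beginning of that substream, which is why the earlier iterations leave no trace on the later ones.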
1 Comment
Laura 2026-1-27,18:44
Edited: Laura about 19 hours ago
Thank you so much for your advice.
There is a nuance to your suggestion I want to be sure I understand.
Context: Based on my reading and tinkering, it seems that there are two alternative methods for creating random streams:
Option 01: s1 and s2 are distinct, non-identical streams such that x is not equal to y
seed=1;
[s1,s2]=RandStream.create('mrg32k3a','NumStreams',2,'Seed',seed);
x=rand(s1,1)
y=rand(s2,1)
Option 02: s3 and s4 are distinct, identical streams such that z=w.
seed=1;
s3 = RandStream.create('mrg32k3a','Seed',seed);
s4 = RandStream.create('mrg32k3a','Seed',seed);
z=rand(s3,1)
w=rand(s4,1)
If I understand your suggestion correctly, you are proposing that I use a strategy analogous to option 02 so that each worker is given a distinct but identical stream. My initial concern was that I would lose independence of calculations by using this method. However, it seems that substreams within a given stream are independent? So by combining an identical stream for each worker with a substream selected by the parfor index, I am able to achieve independence in the calculations. Is this correct?
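The behaviour described here can be illustrated with a small sketch, assuming two streams built as in option 02. The names a, b, sameSub and diffSub are hypothetical. Note that the check only shows that the draws differ; it does not by itself demonstrate statistical independence, which is a property of the generator's design.

```matlab
% Sketch: identical-seed streams agree on the same substream and
% produce different draws on different substreams.
seed = 1;
a = RandStream('mrg32k3a', 'Seed', seed);
b = RandStream('mrg32k3a', 'Seed', seed);

% Same seed, same substream: the two streams produce identical draws.
a.Substream = 1;
b.Substream = 1;
sameSub = isequal(rand(a, 1, 5), rand(b, 1, 5));

% Same seed, different substreams: the draws no longer match.
a.Substream = 2;
b.Substream = 3;
diffSub = ~isequal(rand(a, 1, 5), rand(b, 1, 5));
```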
For reasons that I will clarify in a followup reply, I need to alter your syntax a bit.
Would the following be conceptually equivalent to what you propose?
% User options
seed = 1;
nlabs = 6;
% Create streams
makestream=@()RandStream('mrg32k3a','Seed',seed);
clientStream = makestream();
for i=1:nlabs
    workerstreams{i}=makestream();
end
RandStream.setGlobalStream(clientStream);
% Client-side computations
A = rand(1,5)
parpool(nlabs);
spmd
    RandStream.setGlobalStream(workerstreams{spmdIndex});
end
parfor i = 1:nlabs
    s = RandStream.getGlobalStream;
    s.Substream = i; % Deterministic mapping
    % Parallel computations
    B(i,:) = rand(1,5)
end
The one potential issue I see with this method is that A = B(1,:), i.e. the clientstream is not distinct from the workerstreams. It seems that if I want independence in the calculations, I need the clientstream to be distinct from the workerstreams? Assuming this is correct, would the following be an appropriate update to the strategy that incorporates the concept of your suggestion?
% User options
seedchoice = 1;
nlabs = 6;
% Create streams
makestream=@(seed)RandStream('mrg32k3a','Seed',seed);
clientStream = makestream(seedchoice);
for i=1:nlabs
    workerstreams{i}=makestream(seedchoice+1); % Give the clientstream and workerstreams different seeds
end
RandStream.setGlobalStream(clientStream);
% Client-side computations
A = rand(1,5)
parpool(nlabs);
spmd
    RandStream.setGlobalStream(workerstreams{spmdIndex});
end
parfor i = 1:nlabs
    s = RandStream.getGlobalStream;
    s.Substream = i; % Deterministic mapping
    % Parallel computations
    B(i,:) = rand(1,5)
end
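As an alternative to giving the client and worker streams different seeds, one option is to keep a single seed and use the 'NumStreams' and 'StreamIndices' options of RandStream.create, so that the client uses stream 1 of 2 and every worker uses its own copy of stream 2 of 2. This is a hedged sketch rather than a confirmed recommendation from the thread. The helper name makeWorkerStream is hypothetical, and the workers are simulated here by two local stream objects instead of a pool.

```matlab
% Sketch: one seed, two distinct streams (client = stream 1, workers = stream 2).
seed = 1;
clientStream = RandStream.create('mrg32k3a', ...
    'NumStreams', 2, 'StreamIndices', 1, 'Seed', seed);

% Hypothetical helper: each call builds a fresh copy of stream 2 of 2.
makeWorkerStream = @() RandStream.create('mrg32k3a', ...
    'NumStreams', 2, 'StreamIndices', 2, 'Seed', seed);

% Two local objects standing in for the streams held by two workers.
w1 = makeWorkerStream();
w2 = makeWorkerStream();
w1.Substream = 1;
w2.Substream = 1;

A  = rand(clientStream, 1, 5);   % client-side draw
B1 = rand(w1, 1, 5);             % draw on "worker 1", substream 1
B2 = rand(w2, 1, 5);             % draw on "worker 2", same substream

workersAgree  = isequal(B1, B2);   % identical worker copies give identical draws
clientDiffers = ~isequal(A, B1);   % the client stream is a different stream
```

On a real pool, a handle like makeWorkerStream could presumably be passed to parallel.pool.Constant in the same way as the constructor handle in the answer above, with the substream still chosen from the parfor index.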

Release

R2024b
