How to use RandStreams appropriately with Parallel Computing?
I am currently working to update an existing set of code for reproducibility.
Currently, the code is structured as follows:
nlabs = 6;
seed = 1; % User-choice
[globalstream, labstreams{1:nlabs}] = RandStream.create('mrg32k3a','NumStreams',nlabs+1,'Seed',seed);
RandStream.setGlobalStream( globalstream );
parallelpool=parpool(nlabs);
spmd
RandStream.setGlobalStream( labstreams{spmdIndex} );
end
parfor i=1:nlabs
% Calculations here
end
However, I need the code to be fully reproducible. I understand that to achieve reproducibility with parallel computing I need to use substreams ( https://www.mathworks.com/help/stats/reproducibility-in-parallel-statistical-computations.html ). However, I am not sure how to keep the global stream and the worker stream distinct.
I've seen an example in which the user used only a single global stream by storing and retrieving the stream state before and after the parfor loop ( https://www.mathworks.com/matlabcentral/answers/1670009-reproducible-and-independent-random-stream-generation-in-parfor-loop ), but it seems like it would be simpler to set up two independent streams.
I've outlined a two-stream setup below. Does this seem reasonable? I want globalstream and each substream of labstream to be independent.
nlabs = 6;
seed = 1; % User-choice
[globalstream, labstream] = RandStream.create('mrg32k3a','NumStreams',2,'Seed',seed);
RandStream.setGlobalStream( globalstream );
<Some Calculations>
parallelpool=parpool(nlabs);
parallel.pool.Constant(RandStream.setGlobalStream(labstream)) % Not sure of the syntax here
parfor i=1:nlabs
set(labstream,'Substream',i)
<Some Calculations>
end
RandStream.setGlobalStream( globalstream );
<Some Calculations>
Answers (1)
Daemon
2026-1-27, 13:22
To achieve full reproducibility in parallel MATLAB code, it is essential to separate client-side (global) random number generation from worker-side random number generation, and to ensure that no RandStream object is shared across workers. Substreams are the correct mechanism for reproducible parallel execution, but a single stream cannot simply be advanced from one iteration to the next inside a parfor loop: the order in which iterations run, and the worker each one runs on, are not defined, so the numbers a given iteration draws would differ between runs. Each iteration must instead select its own substream explicitly.
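The property that makes substreams suitable here can be checked serially, without a pool. The following is a minimal sketch (the substream numbers 3 and 1 and the variable names are arbitrary choices for illustration): assigning to the Substream property repositions the stream to the start of that substream, so the same substream index yields the same numbers no matter what was drawn in between.

```matlab
% Serial sketch: a substream is addressed by index, not by draw order.
s = RandStream('mrg32k3a','Seed',1);

s.Substream = 3;        % jump to the start of substream 3
a = rand(s,1,5);

s.Substream = 1;        % draw from a different substream in between
b = rand(s,1,5);

s.Substream = 3;        % jump back to the start of substream 3
c = rand(s,1,5);

sameAgain = isequal(a,c);   % same substream index, same numbers
differs   = ~isequal(a,b);  % different substreams, different numbers
```

This is why indexing the substream by the loop variable makes each parfor iteration independent of scheduling.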
Use one stream on the client for all serial computations. This stream is independent of any parallel execution.
seed = 1;
clientStream = RandStream('mrg32k3a','Seed',seed);
RandStream.setGlobalStream(clientStream);
% Client-side computations
A = rand(1,10);
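Each worker must have its own stream instance. This is done using parallel.pool.Constant, which calls the supplied function once on each worker to construct a separate RandStream there. A worker stream built with the same generator and seed as clientStream would reproduce the client's numbers in its first substream, so the worker stream is created as stream 2 of a two-stream mrg32k3a set for that seed. This keeps it independent of the client stream.
nlabs = 6;
parpool(nlabs);
workerStreams = parallel.pool.Constant(@() ...
RandStream.create('mrg32k3a','NumStreams',2,'StreamIndices',2,'Seed',seed));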
This avoids sharing stream handles and guarantees deterministic initialization on every run.
Inside the parfor loop, assign a substream based on the loop index and set it as the worker’s global stream before generating random numbers.
parfor i = 1:nlabs
s = workerStreams.Value;
s.Substream = i; % Deterministic mapping
RandStream.setGlobalStream(s);
% Parallel computations
x = rand(1,5);
end
With mrg32k3a, substreams are independent and ordered, so results are reproducible regardless of scheduling or execution order.
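One way to confirm this in practice is to run the loop twice and compare the outputs. The sketch below assumes Parallel Computing Toolbox with a pool available; the names sc, nIter, out, and results are illustrative only. For brevity it uses a plain seeded constructor and passes the stream directly to rand instead of setting the global stream. The check concerns the substream mapping, not how the stream is constructed.

```matlab
% Reproducibility check: two passes over the same parfor loop.
% Assumes Parallel Computing Toolbox; names below are illustrative.
seed  = 1;     % user-chosen seed
nIter = 6;     % number of parfor iterations

% One stream object per worker, built on the worker itself
sc = parallel.pool.Constant(@() RandStream('mrg32k3a','Seed',seed));

results = cell(1,2);
for k = 1:2
    out = zeros(nIter,5);
    parfor i = 1:nIter
        s = sc.Value;             % this worker's own stream
        s.Substream = i;          % reposition to the start of substream i
        out(i,:) = rand(s,1,5);   % draw from that substream only
    end
    results{k} = out;
end

isReproducible = isequal(results{1}, results{2});
```

If isReproducible is false, some random draw inside the loop is probably bypassing the per-iteration substream, for example a call to rand that still uses a worker's default global stream.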