
Control and Repeat Random Numbers in Parallel Jobs

This example shows how to control random number generation for independent parallel jobs and tasks by using a unique substream for each job.

As described in Control Random Number Streams on Workers, each worker in a cluster working on the same job has an independent random number generator stream. For jobs and tasks, MATLAB® resets the random number generator on each worker that runs the job to the default algorithm and seed state. It then assigns an independent stream based on the task index.

Random Numbers in Batch Jobs

When you submit an independent batch job using the batch function, you create a job with one task and the task always has a task index of 1. Because MATLAB assigns a random number stream based on the task index, a batch job gets the same stream regardless of which worker executes it. As a result, a batch job that runs a command such as rand(1,4) returns the same result every time you execute it.

Use rand to generate random numbers as a batch job on a worker on the local machine.

c = parcluster("Processes");
job1 = batch(c,@rand,1,{1,4});
job2 = batch(c,@rand,1,{1,4});

To block MATLAB until the jobs finish, use the wait function on each job object. Then retrieve the results of the batch jobs. The random numbers from both batch jobs are the same.

wait(job1);
wait(job2);
fetchOutputs(job1)
ans = 1×1 cell array
    {[0.1349 0.6744 0.9301 0.5332]}

fetchOutputs(job2)
ans = 1×1 cell array
    {[0.1349 0.6744 0.9301 0.5332]}
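
The repeated output follows from each task starting at the same stream state. The following is a minimal local sketch of that idea, not the mechanism the workers use internally: it builds a private stream on the client, draws four numbers, resets the stream, and draws again. The generator name and seed are assumptions chosen for illustration, and the values it produces are not claimed to match the worker output above.

```matlab
% Local analogy only: a private stream, so the client's global stream is untouched.
% The generator and the seed are illustrative assumptions.
s = RandStream("threefry4x64_20",Seed=0);

firstRun = rand(s,1,4);     % draw four numbers from the initial state
reset(s);                   % return the stream to its initial state
secondRun = rand(s,1,4);    % draw again from the same starting point

% Both draws start from the same state, so they match.
sameNumbers = isequal(firstRun,secondRun);
```

After this runs, sameNumbers is true, which mirrors why two batch jobs that both start from task index 1 return identical values.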

Control Random Number Streams with Substreams

To generate unique random number sequences across multiple batch jobs, assign a specific substream to each job. This ensures independence across jobs and tasks. To reproduce results, reuse the same job index.

This helper function shows how to modify the global stream using a job index you specify. It extracts a handle to the global random number stream on the worker and sets the substream index based on the specified job index.

function r = modifyStreamJobFcn(jobIdx,sz)
stream = RandStream.getGlobalStream;
stream.Substream = jobIdx;

% Start job function
r = rand(1,sz);
end
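
To see what setting the Substream property does without submitting a job, you can experiment on the client with a private stream. This is a hedged sketch: the generator, the seed, and the variable names are assumptions, and the numbers it produces are not claimed to match the worker output shown on this page.

```matlab
% Local sketch: switch substreams on a private stream that supports them.
% The generator and the seed are illustrative assumptions.
s = RandStream("threefry4x64_20",Seed=0);

s.Substream = 1;
fromSubstream1 = rand(s,1,4);     % draws from substream 1

s.Substream = 2;
fromSubstream2 = rand(s,1,4);     % draws from substream 2

s.Substream = 1;                  % reposition to the start of substream 1
fromSubstream1Again = rand(s,1,4);

differsAcrossSubstreams = ~isequal(fromSubstream1,fromSubstream2);
repeatsWithinSubstream = isequal(fromSubstream1,fromSubstream1Again);
```

Both flags are true: different substream indices give different sequences, and returning to an index repeats its sequence from the start. This is the property the helper function relies on when it maps a job index to a substream.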

Run the modifyStreamJobFcn helper function as two batch jobs.

for idx = 1:2
    jobIdx = idx;
    batchJobs(idx) = batch(c,@modifyStreamJobFcn,1,{jobIdx,4});
end

wait(batchJobs(1));
wait(batchJobs(2));

Retrieve the results from the workers. The random numbers from the batch jobs are now different.

batchJob1Result = fetchOutputs(batchJobs(1))
batchJob1Result = 1×1 cell array
    {[0.1349 0.6744 0.9301 0.5332]}

batchJob2Result = fetchOutputs(batchJobs(2))
batchJob2Result = 1×1 cell array
    {[0.3270 0.8665 0.6173 0.6411]}

Reproduce Results Using Job Index

To reproduce results from a job, reuse its job index. Recreate the results for the batch job with an index of 2. The random numbers generated in batchJobs(2) and batchJobs(3) are the same.

jobIdx = 2;
batchJobs(3) = batch(c,@modifyStreamJobFcn,1,{jobIdx,4});
wait(batchJobs(3));
batchJob3Result = fetchOutputs(batchJobs(3))
batchJob3Result = 1×1 cell array
    {[0.3270 0.8665 0.6173 0.6411]}

isequal(batchJob2Result,batchJob3Result)
ans = logical
   1

Random Numbers in Jobs and Tasks

When you create independent jobs with multiple tasks by using the createJob and createTask functions, each task gets an independent stream based on its task index. Because this mapping is deterministic, tasks with the same task index get the same stream in every job, regardless of which worker executes them.

Create two jobs, each with 4 tasks. For each task, generate a random number. Wait for the jobs to complete and retrieve the results. The random numbers each task generates are the same for both jobs.

for idx = 1:2
    multiTasksJob(idx) = createJob(c);
    for t = 1:4
        createTask(multiTasksJob(idx),@rand,1,{1});
    end
    submit(multiTasksJob(idx));
end
wait(multiTasksJob(1));
wait(multiTasksJob(2));
fetchOutputs(multiTasksJob(1))'
ans=1×4 cell array
    0.1349    0.6383    0.9730    0.3241

fetchOutputs(multiTasksJob(2))'
ans=1×4 cell array
    0.1349    0.6383    0.9730    0.3241

To reproduce the same set of random numbers for each task each time you run a job, use the default behavior.

Control Random Number Stream with Substreams

If you want to generate unique random number sequences across a set of independent jobs with multiple tasks, you can modify the global random number stream by assigning a particular substream to each job. All random numbers generated across the jobs are independent. To reproduce specific results from a previous job, you can use the job index to reassign the same substream.

Define a helper function that modifies the global stream using a job index you specify. For comparison, also return the stream details.

function out = modifyAndReturnStreamJobFcn(jobIdx,taskIdx,sz)
stream = RandStream.getGlobalStream;
stream.Substream = jobIdx;
% Collect stream details
out.JobNum = jobIdx;
out.TaskNum = taskIdx;
out.rngStream = stream.StreamIndex;
out.rngSubstream = stream.Substream;

% Start job function
r = rand(1,sz);
out.result = r;
end

Create two jobs again, each with 4 tasks. For each task, run the modifyAndReturnStreamJobFcn helper function to generate a random number. Wait for the jobs to complete.

for idx = 1:2
    multiTasksJob(idx) = createJob(c);
    jobIdx = idx;
    for taskIdx = 1:4
        createTask(multiTasksJob(idx), ...
            @modifyAndReturnStreamJobFcn,1,{jobIdx,taskIdx,1});
    end
    submit(multiTasksJob(idx));
end
wait(multiTasksJob(1));
wait(multiTasksJob(2));

Retrieve the results from both jobs. The random numbers that the tasks generate now differ between the two jobs.

multiTasksJob1Result = cell2mat(fetchOutputs(multiTasksJob(1)));
[multiTasksJob1Result.result]
ans = 1×4

    0.1349    0.6383    0.9730    0.3241

multiTasksJob2Result = cell2mat(fetchOutputs(multiTasksJob(2)));
[multiTasksJob2Result.result]
ans = 1×4

    0.3270    0.4821    0.8265    0.2793

Display details about the stream and substream indices for each task in a table. Group tasks with the same task index together.

T = struct2table([multiTasksJob1Result;multiTasksJob2Result]);
T = sortrows(T,"TaskNum");
disp(T)
    JobNum    TaskNum    rngStream    rngSubstream    result 
    ______    _______    _________    ____________    _______

      1          1           2             1          0.13486
      2          1           2             2          0.32701
      1          2           4             1          0.63835
      2          2           4             2          0.48211
      1          3           6             1            0.973
      2          3           6             2          0.82646
      1          4           8             1          0.32412
      2          4           8             2          0.27926
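
The table pairs a stream index that varies with the task and a substream that varies with the job. A rough local sketch of that layout follows. It assumes one stream per task index and one substream per job index, built on the client with RandStream.create; the stream indices, the seed, and the counts are illustrative and are not the indices the workers assign.

```matlab
% Local sketch: stream index varies with the task, substream with the job.
% The counts, the seed, and the stream indices are illustrative assumptions.
numJobs = 2;
numTasks = 4;
firstDraw = zeros(numJobs,numTasks);

for taskIdx = 1:numTasks
    % One independent stream per task index
    s = RandStream.create("threefry4x64_20",Seed=0, ...
        NumStreams=numTasks,StreamIndices=taskIdx);
    for jobIdx = 1:numJobs
        s.Substream = jobIdx;                  % one substream per job index
        firstDraw(jobIdx,taskIdx) = rand(s);   % first number for this pair
    end
end

% Each (job, task) pair draws from its own stream and substream combination.
allDistinct = numel(unique(firstDraw)) == numJobs*numTasks;
```

Each row of firstDraw corresponds to a job and each column to a task, so the matrix has the same shape as the grouped table above.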

Reproduce Results Using Job Index

To reproduce results from a job with multiple tasks, reuse the job index. Recreate the results for the job with an index of 2. The random numbers generated in multiTasksJob(2) and multiTasksJob(3) are the same.

jobIdx = 2;
multiTasksJob(3) = createJob(c);
for taskIdx = 1:4
    createTask(multiTasksJob(3), ...
        @modifyAndReturnStreamJobFcn,1,{jobIdx,taskIdx,1});
end
submit(multiTasksJob(3));
wait(multiTasksJob(3));

multiTasksJob3Result = cell2mat(fetchOutputs(multiTasksJob(3)));
[multiTasksJob2Result.result]
ans = 1×4

    0.3270    0.4821    0.8265    0.2793

[multiTasksJob3Result.result]
ans = 1×4

    0.3270    0.4821    0.8265    0.2793

Define Custom Random Streams

You can use custom random number streams to control random number generation for jobs and tasks. This approach gives you full control over the generator algorithm, seed, and stream configuration.

Additionally, modifying the global stream on the workers does not guarantee the same results between different MATLAB releases, which might use different default algorithms and seeds. If you want results to remain consistent across releases, define a custom random number stream for each job and task.

Create a custom stream using the RandStream.create function and a generator that supports substreams. For a list of generators that support substreams, see Choosing a Random Number Generator.

To ensure that each stream is independent and repeatable, follow these steps at the start of your job function:

  • Use the same generator algorithm, seed, and number of streams.

  • Assign a unique stream index for each job and a unique substream for each task.

For example, this function creates a random stream based on the specified job index, modifies the substream using the task ID, and assigns the modified stream as the global stream on the worker.

function r = createStreamJobFcn(jobIdx,taskIdx,sz)
s = RandStream.create("threefry4x64_20",Seed=0, ...
    NumStreams=2^63,StreamIndices=jobIdx);
s.Substream = taskIdx;
RandStream.setGlobalStream(s);

% Start job function
r = rand(1,sz);
end
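
Before submitting jobs, you can check the stream and substream scheme on the client. The sketch below reuses the RandStream.create arguments from createStreamJobFcn but draws from private streams instead of setting the global stream, so the client state is left alone. The makeStream handle and the index values are hypothetical names chosen for illustration.

```matlab
% Local sketch of the custom-stream scheme: stream per job, substream per task.
% makeStream and the index values below are hypothetical, for illustration only.
makeStream = @(jobIdx) RandStream.create("threefry4x64_20",Seed=0, ...
    NumStreams=2^63,StreamIndices=jobIdx);

taskIdx = 3;

sJob1 = makeStream(1);
sJob1.Substream = taskIdx;
job1Task3 = rand(sJob1,1,4);

sJob2 = makeStream(2);
sJob2.Substream = taskIdx;
job2Task3 = rand(sJob2,1,4);

% Rebuild the stream for job 2 from scratch to mimic a rerun.
sRerun = makeStream(2);
sRerun.Substream = taskIdx;
job2Task3Rerun = rand(sRerun,1,4);

differsAcrossJobs = ~isequal(job1Task3,job2Task3);
repeatsOnRerun = isequal(job2Task3,job2Task3Rerun);
```

Both flags are true. Because the generator, seed, and number of streams are fixed in the code rather than inherited from worker defaults, the same pair of indices leads to the same sequence each time the stream is rebuilt.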

Automate Stream Setup with taskStartup.m

The methods described above are useful for modifying the global stream for a specific set of jobs. If you want to do this for all jobs you run but do not want to modify the job function every time, then add the code to modify the global stream to your task startup function file taskStartup.m.

The taskStartup.m file runs automatically on a worker each time the worker runs a task for a job. You must ensure that the taskStartup.m file is picked up by the executing task. For information about the taskStartup.m file, see taskStartup.

For example, to give each job a unique stream and each task within that job a unique substream, add this code to the taskStartup.m file:

function taskStartup(task)
job = task.Parent;
jobID = job.ID;
taskID = task.ID;
s = RandStream.create("threefry4x64_20",Seed=0, ...
    NumStreams=2^63,StreamIndices=jobID);
s.Substream = taskID;
RandStream.setGlobalStream(s);
end
