Limiting a broadcast variable in a 3D block processing program

I have some 3D or 4D arrays that I want to run block processing on. To do this I create a cell array whose entries are matrices of position values.
For example, index_object{1} = [1 2 3 ; 1 2 3; 1 2 3].
I then use this index object in a parfor loop to pull blocks out of my image (sz_block is the size of the block, in this example [3,3,3]):
parfor n=1:numel(index_object)
    block_im = pad_im(index_object{n}(1,1:sz_block(1)),index_object{n}(2,1:sz_block(2)),index_object{n}(3,1:sz_block(3)),:);
    out(n) = run_some_function(block_im);
end
The problem with this is that my "pad_im" array ends up being a broadcast variable, and it is rather large (512x512x300), so the memory swapping really slows down the calculation.
Does anyone have a recommendation for getting rid of the broadcasting of the pad_im array?
Or does anyone know how other programs (like C/C++ or python) do n-dimensional block processing?
Thanks for the info!

Accepted Answer

Tejas
Tejas 2024-8-20
Hello Tiwwexx,
To stop the broadcasting of the pad_im array, consider making it a distributed array. This partitions the data across the available workers, so each worker holds only its own portion instead of a full copy. An spmd (Single Program Multiple Data) block can then be used to run code in parallel across the workers: each worker runs the same code independently but processes a different part of the data. Please note that this approach requires the Parallel Computing Toolbox.
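To make the partitioning concrete, here is a minimal sketch on a small array. It assumes a parallel pool is available and that the default distribution scheme is used (splitting along the last dimension); the variable names are illustrative only.

```matlab
% Sketch only: inspect how a small distributed array is split across workers.
% Assumes Parallel Computing Toolbox and an open (or auto-started) pool.
A = distributed(rand(4, 4, 8));        % small stand-in for pad_im

spmd
    local_part = getLocalPart(A);      % the slab stored on this worker
    pages      = globalIndices(A, 3);  % global page indices of that slab
    local_sum  = sum(local_part(:));   % computed from local data only
end

% Outside spmd these variables are Composites, one entry per worker.
first_worker_pages = pages{1};
disp(first_worker_pages);
```

Each worker only ever touches local_part, which is why the full array no longer has to be sent to every worker.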
Below is a comparison of the time taken by two methods: the first uses a parfor loop, and the second uses spmd.
Using_parfor_loop.m
dim1 = 512;
dim2 = 512;
dim3 = 300;
pad_im = rand(dim1, dim2, dim3);
sz_block = [3, 3, 3];
% Create an index object for block processing
index_object = cell(1, 10);
for i = 1:10
    index_object{i} = [randi([1, dim1-sz_block(1)+1], 1, 3);
                       randi([1, dim2-sz_block(2)+1], 1, 3);
                       randi([1, dim3-sz_block(3)+1], 1, 3)];
end
out = zeros(1, numel(index_object));
tic;
parfor n = 1:numel(index_object)
    % Extract block from pad_im using the indices from index_object
    block_im = pad_im(index_object{n}(1,1):index_object{n}(1,1)+sz_block(1)-1, ...
                      index_object{n}(2,1):index_object{n}(2,1)+sz_block(2)-1, ...
                      index_object{n}(3,1):index_object{n}(3,1)+sz_block(3)-1);
    out(n) = sum(block_im(:)); % Calculate the sum of all elements in the block
end
elapsedTime = toc;
fprintf('Elapsed time for block processing: %.2f seconds\n', elapsedTime);
Using_spmd
% Check if Parallel Computing Toolbox is available
if ~license('test', 'Distrib_Computing_Toolbox')
    error('Parallel Computing Toolbox is required for distributed arrays.');
end
dim1 = 512;
dim2 = 512;
dim3 = 300;
% Create a random distributed array
pad_im = distributed.rand(dim1, dim2, dim3);
sz_block = [3, 3, 3];
num_blocks = 10;
out = zeros(1, num_blocks);
% Some global sample indexes
index_object = cell(1, num_blocks);
for i = 1:num_blocks
    index_object{i} = [randi([1, dim1 - sz_block(1) + 1]);
                       randi([1, dim2 - sz_block(2) + 1]);
                       randi([1, dim3 - sz_block(3) + 1])];
end
tic;
% Use spmd to distribute the work
spmd
    local_pad_im = getLocalPart(pad_im); % Get the local part of the distributed array
    local_size = size(local_pad_im); % Size of the local part
    local_out = zeros(1, num_blocks); % Local output for each worker
    % Global indices covered by this worker's local part, per dimension
    % (the first element of each vector is the global start index)
    global_start_idx1 = globalIndices(pad_im, 1);
    global_start_idx2 = globalIndices(pad_im, 2);
    global_start_idx3 = globalIndices(pad_im, 3);
    for n = 1:num_blocks
        % Extract global start indices for the block
        global_idx1 = index_object{n}(1);
        global_idx2 = index_object{n}(2);
        global_idx3 = index_object{n}(3);
        % Convert global indices to local indices
        local_idx1 = global_idx1 - global_start_idx1(1) + 1;
        local_idx2 = global_idx2 - global_start_idx2(1) + 1;
        local_idx3 = global_idx3 - global_start_idx3(1) + 1;
        % Process the block only if it lies entirely within the local part.
        % Note: a block that straddles the boundary between two workers'
        % parts fails this check on every worker, so its result stays 0.
        if local_idx1 > 0 && local_idx1 + sz_block(1) - 1 <= local_size(1) && ...
           local_idx2 > 0 && local_idx2 + sz_block(2) - 1 <= local_size(2) && ...
           local_idx3 > 0 && local_idx3 + sz_block(3) - 1 <= local_size(3)
            % Extract block from local_pad_im using the local indices
            block_im = local_pad_im(local_idx1:local_idx1+sz_block(1)-1, ...
                                    local_idx2:local_idx2+sz_block(2)-1, ...
                                    local_idx3:local_idx3+sz_block(3)-1);
            local_out(n) = sum(block_im(:));
        end
    end
    % Add up the per-worker results (spmdPlus needs R2022b or later).
    % After the spmd block, out is a Composite; read the result with out{1}.
    out = spmdPlus(local_out);
end
elapsedTime = toc;
fprintf('Elapsed time for block processing: %.2f seconds\n', elapsedTime);
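As a separate option (not part of the comparison above), the blocks can be extracted on the client first so that parfor receives a sliced cell array instead of the whole volume. This is a minimal sketch under stated assumptions: the names starts and blocks are hypothetical, a small array stands in for pad_im, and sum stands in for run_some_function.

```matlab
% Sketch only: pre-slice the blocks so pad_im is never used inside parfor.
pad_im     = rand(64, 64, 30);   % small stand-in for the 512x512x300 volume
sz_block   = [3, 3, 3];
num_blocks = 10;

% Hypothetical start corners, one row per block: [row, col, page]
starts = [randi(size(pad_im,1) - sz_block(1) + 1, num_blocks, 1), ...
          randi(size(pad_im,2) - sz_block(2) + 1, num_blocks, 1), ...
          randi(size(pad_im,3) - sz_block(3) + 1, num_blocks, 1)];

% Extract every block serially on the client
blocks = cell(1, num_blocks);
for n = 1:num_blocks
    r = starts(n,1) : starts(n,1) + sz_block(1) - 1;
    c = starts(n,2) : starts(n,2) + sz_block(2) - 1;
    p = starts(n,3) : starts(n,3) + sz_block(3) - 1;
    blocks{n} = pad_im(r, c, p);
end

% blocks is indexed only by the loop variable, so parfor treats it as a
% sliced variable: each worker receives just the blocks it processes.
out = zeros(1, num_blocks);
parfor n = 1:num_blocks
    b = blocks{n};
    out(n) = sum(b(:));          % stand-in for run_some_function
end
```

The trade-off is client memory: the cell array holds num_blocks * prod(sz_block) elements, which for heavily overlapping blocks can exceed the size of the original array, so this suits cases where the number of blocks is moderate.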
For more information, refer to the MathWorks documentation on spmd and on distributed arrays.
