limiting broadcast variable in a 3d block processing program

Question

tiwwexx 2022-10-4

https://ww2.mathworks.cn/matlabcentral/answers/1817320-limiting-broadcast-variable-in-a-3d-block-processing-program

Answered: Tejas 2024-8-20

I have some 3D or 4D arrays that I want to run block processing on. To do this I create a cell array whose entries are matrices of position values.

For example, index_object{1} = [1 2 3; 1 2 3; 1 2 3].

I then use this index object in a parfor loop to pull blocks out of my image (sz_block is the size of the block, in this example [3,3,3]):

parfor n=1:numel(index_object)
    block_im = pad_im(index_object{n}(1,1:sz_block(1)),index_object{n}(2,1:sz_block(2)),index_object{n}(3,1:sz_block(3)),:);
    out(n) = run_some_function(block_im);
end

The problem with this is that my "pad_im" array ends up being a broadcast variable, and it is rather large (512x512x300), so the memory swapping really slows down the calculation.

Does anyone have a recommendation for getting rid of the broadcasting of the pad_im array?

Or does anyone know how other languages (like C/C++ or Python) do n-dimensional block processing?

Thanks for the info!


Answer 1

Tejas 2024-8-20

https://ww2.mathworks.cn/matlabcentral/answers/1817320-limiting-broadcast-variable-in-a-3d-block-processing-program#answer_1501249

Hello Tiwwexx,

To stop ‘pad_im’ from being broadcast, consider making it a distributed array. This partitions the data across the available workers, so each worker holds only its own part. An ‘spmd’ (Single Program, Multiple Data) block can then run the same code on every worker, with each worker processing a different part of the data. Please note that this approach requires Parallel Computing Toolbox.
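As a rough illustration of the partitioning described above, here is a minimal sketch, assuming Parallel Computing Toolbox, a parallel pool, and R2022b or later (for spmdIndex and spmdPlus; earlier releases use labindex and gplus). The small array A and the variable names are hypothetical stand-ins, not part of the original code.

```matlab
% Small hypothetical stand-in for pad_im, filled with known values.
A = reshape(1:4*4*8, 4, 4, 8);

% Partition A across the workers of the current parallel pool.
D = distributed(A);

spmd
    % Inside spmd, D behaves as a codistributed array.
    localPart = getLocalPart(D);     % the piece stored on this worker
    idx3 = globalIndices(D, 3);      % global dim-3 indices this worker owns

    if isempty(idx3)
        fprintf('Worker %d holds no slices\n', spmdIndex);
    else
        fprintf('Worker %d holds slices %d to %d, local size %s\n', ...
            spmdIndex, idx3(1), idx3(end), mat2str(size(localPart)));
    end

    % Each worker reduces only its own part; spmdPlus adds the partial sums
    % and returns the total to every worker.
    total = spmdPlus(sum(localPart(:)));
end

% On the client, total is a Composite; read the copy held by worker 1.
grandTotal = total{1};
```

With the default distribution, the array is split along its last dimension, so each worker reports a contiguous range of slices and no worker receives the whole array.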

Below is a comparison of the time taken by two methods: the first uses a ‘parfor’ loop, and the second uses ‘spmd’.

Using_parfor_loop.m

dim1 = 512;  
dim2 = 512;  
dim3 = 300;  
pad_im = rand(dim1, dim2, dim3); 
sz_block = [3, 3, 3]; 
% Create an index object for block processing 
index_object = cell(1, 10); 
for i = 1:10 
    index_object{i} = [randi([1, dim1-sz_block(1)+1], 1, 3); 
        randi([1, dim2-sz_block(2)+1], 1, 3); 
        randi([1, dim3-sz_block(3)+1], 1, 3)]; 
end 
out = zeros(1, numel(index_object)); 
tic; 
parfor n = 1:numel(index_object) 
    % Extract block from pad_im using the indices from index_object 
    block_im = pad_im(index_object{n}(1,1):index_object{n}(1,1)+sz_block(1)-1, ... 
        index_object{n}(2,1):index_object{n}(2,1)+sz_block(2)-1, ... 
        index_object{n}(3,1):index_object{n}(3,1)+sz_block(3)-1); 
    out(n) = sum(block_im(:)); % Calculate the sum of all elements in the block 
end 
elapsedTime = toc; 
fprintf('Elapsed time for block processing: %.2f seconds\n', elapsedTime); 

Using_spmd.m

% Check if Parallel Computing Toolbox is available 
if ~license('test', 'Distrib_Computing_Toolbox') 
    error('Parallel Computing Toolbox is required for distributed arrays.'); 
end 
dim1 = 512;  
dim2 = 512;  
dim3 = 300; 
% Create a random distributed array 
pad_im = distributed.rand(dim1, dim2, dim3); 
sz_block = [3, 3, 3]; 
num_blocks = 10; 
out = zeros(1, num_blocks); 
% Some global sample indexes 
index_object = cell(1, num_blocks); 
for i = 1:num_blocks 
    index_object{i} = [randi([1, dim1 - sz_block(1) + 1]); 
        randi([1, dim2 - sz_block(2) + 1]); 
        randi([1, dim3 - sz_block(3) + 1])]; 
end 
tic; 
% Use spmd to distribute the work 
spmd 
    local_pad_im = getLocalPart(pad_im); % Get the local part of the distributed array 
    local_size = size(local_pad_im); % Size of the local part 
    local_out = zeros(1, num_blocks); % Local output for each worker 
    % Determine the global start indices for this worker's local part 
    global_start_idx1 = globalIndices(pad_im, 1); 
    global_start_idx2 = globalIndices(pad_im, 2); 
    global_start_idx3 = globalIndices(pad_im, 3); 
    for n = 1:num_blocks 
        % Extract global indices for the block 
        global_idx1 = index_object{n}(1); 
        global_idx2 = index_object{n}(2); 
        global_idx3 = index_object{n}(3); 
        % Convert global indices to local indices if they belong to the local part 
        local_idx1 = global_idx1 - global_start_idx1(1) + 1; 
        local_idx2 = global_idx2 - global_start_idx2(1) + 1; 
        local_idx3 = global_idx3 - global_start_idx3(1) + 1; 
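        % Process the block only if it lies entirely inside this worker's local part. 
        % A block that straddles two workers' partitions is skipped by both and stays 0. 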
        if local_idx1 > 0 && local_idx1 + sz_block(1) - 1 <= local_size(1) && ... 
                local_idx2 > 0 && local_idx2 + sz_block(2) - 1 <= local_size(2) && ... 
                local_idx3 > 0 && local_idx3 + sz_block(3) - 1 <= local_size(3) 
            % Extract block from local_pad_im using the local indices 
            block_im = local_pad_im(local_idx1:local_idx1+sz_block(1)-1, ... 
                local_idx2:local_idx2+sz_block(2)-1, ... 
                local_idx3:local_idx3+sz_block(3)-1); 
            local_out(n) = sum(block_im(:)); 
        end 
    end 
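    % Sum the per-worker results; every worker receives the total. 
    % spmdPlus requires R2022b or later; use gplus in earlier releases. 
    out = spmdPlus(local_out); 
end 
out = out{1}; % out is a Composite on the client; take the copy held by worker 1 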
elapsedTime = toc; 
fprintf('Elapsed time for block processing: %.2f seconds\n', elapsedTime); 
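A different way to avoid the broadcast, not part of the comparison above, is to cut the blocks out on the client first and let parfor slice the resulting cell array. The following is only a sketch under assumed sizes; starts, blocks, and the sum used in place of run_some_function are hypothetical stand-ins.

```matlab
% Hypothetical stand-ins for the variables in the question.
pad_im = rand(64, 64, 30);
sz_block = [3, 3, 3];
num_blocks = 10;

% One random start position per block, one column per dimension.
starts = [randi(size(pad_im, 1) - sz_block(1) + 1, num_blocks, 1), ...
          randi(size(pad_im, 2) - sz_block(2) + 1, num_blocks, 1), ...
          randi(size(pad_im, 3) - sz_block(3) + 1, num_blocks, 1)];

% Cut the blocks out serially on the client; this is plain indexing.
blocks = cell(1, num_blocks);
for n = 1:num_blocks
    r = starts(n, 1) : starts(n, 1) + sz_block(1) - 1;
    c = starts(n, 2) : starts(n, 2) + sz_block(2) - 1;
    p = starts(n, 3) : starts(n, 3) + sz_block(3) - 1;
    blocks{n} = pad_im(r, c, p);
end

% blocks is indexed only by the loop variable, so parfor treats it as a
% sliced variable and sends each worker only the blocks it processes.
% pad_im is not referenced in the loop, so it is never broadcast.
out = zeros(1, num_blocks);
parfor n = 1:num_blocks
    b = blocks{n};
    out(n) = sum(b(:));   % stand-in for run_some_function(b)
end
```

The trade-off is memory on the client: when blocks overlap heavily, the cell array can be larger than pad_im itself, so this suits cases where the blocks are small or few relative to the full array.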

For more information on ‘spmd’ and distributed arrays, refer to the following documentation:

‘spmd’: https://www.mathworks.com/help/releases/R2021b/parallel-computing/spmd.html
Distributed arrays: https://www.mathworks.com/help/releases/R2021b/parallel-computing/distributed-arrays.html
