主要内容

Compute Parallel Prefix Scans with SPMD Computations

This example shows how to implement a reusable, scalable, SPMD based prefix scan using a parallel workers. You also apply the implementation to two workflows: a cumulative sum on distributed arrays and a stateful scan that tracks both a running maximum and its index for line of sight (LOS) analysis.

A prefix scan computes cumulative results over a sequence using an associative binary operator and its identity element, for example plus with identity 0, or times with identity 1. MATLAB® provides distributed array support for common cumulative functions such as cumsum, cummin, cummax, and cumprod. For other operators or when you need to carry extra state, you can implement a scan using an spmd block and Composite arrays.

Understand the Algorithm

You can use Composite arrays and spmd to perform scalable inclusive or exclusive prefix scans across workers. This algorithm is based on a description provided in Blelloch[1].

First, distribute the array among the workers in the parallel pool and extract the local segment of the array on each worker . You work with the arrays on the worker as Composites. Then perform these steps:

  1. Local scan and block aggregates: Each worker performs an inclusive scan over its local block with the associative operator. The last local scan value is the block aggregate.

  2. Reduction: Workers exchange block aggregates using the reduction operation function spmdCat.

  3. Per-worker offsets: Each worker computes an exclusive prefix over the block aggregates to form a per-worker offset.

  4. Apply offsets: Each worker applies its offset to every element of the local scan.

For an inclusive scan, the current element is included in the local scan before offsets. For an exclusive scan, you locally shift the scan by one using the identity, otherwise the offset logic is unchanged.

Define Parallel Prefix Scan Function

Write a reusable function that scans column inputs (d-by-N) for either the inclusive or exclusive scan mode and preserves the input distribution. The parallelPrefixScan helper function defined below computes a prefix scan of column elements in x using an binary operator operatorFcn and identity values for the operator identityVal. You can specify the scanMode as either "inclusive" or "exclusive". The operator must be associative and id must be the correct identity for the operator function. x can have more than one rows when the binary operator accepts more than one input arguments. The function accepts local or distributed arrays.

function y = parallelPrefixScan(x,operatorFcn,identityVal,scanMode)
%   y = parallelPrefixScan(x,operatorFcn,identityVal,scanMode)
%   Inputs:
%     x           : d-by-N local or distributed. 
%                   Each column is a d-by-1 input.
%     operatorFcn : function handle for an associative binary operator
%     identityVal : d-by-1 identity for operatorFcn, 
%                   for example, 0 or [-inf; 0]).
%     scanMode    : "inclusive" or "exclusive".
%   Output:
%     y           : d-by-N, with the same distribution
%                   and codistributor as x.
%

inputWasDistributed = isdistributed(x);
if ~inputWasDistributed
% Default distribution along the last non-singleton dimension
    x = distributed(x);  
end

spmd
    localInput = getLocalPart(x);

    % 1) Local scan and aggregate
    localScan = localInput;
    for k = 2:size(localInput,2)
        localScan(:,k) = operatorFcn(localScan(:,k-1),localInput(:,k));
    end
    localAggregate = localScan(:,end);

    if strcmp(scanMode,"exclusive")
        localScan = [identityVal localScan(:,1:end-1)];
    end

    % 2) Concatenate block aggregates in worker index order as columns
    % Each worker receives a d-by-NumWorkers matrix (columns)
    workerAggregates = spmdCat(localAggregate,2);

    % 3) Compute this worker's exclusive scan of previous aggregates
    prefixOffset = identityVal;
    for p = 1:spmdIndex-1
        prefixOffset = operatorFcn(prefixOffset,workerAggregates(:,p));
    end

    % 4) Apply the offset to the local scan
    localScanOffset = localScan;
    for k = 1:size(localScan,2)
        localScanOffset(:,k) = operatorFcn(prefixOffset,localScan(:,k));
    end

    % Reassemble distributed result
    codist = getCodistributor(x);
    yDistributed = codistributed.build(localScanOffset,codist);
end

if ~inputWasDistributed
    y = gather(yDistributed);
else
    y = yDistributed;
end
end

Compute Prefix Sum on Distributed Array

You can perform any prefix scan operation by applying any binary function with the parallelPrefixScan helper function. For example, compute an inclusive cumulative sum on a distributed array using the custom parallelPrefixScan function.

Create a distributed array of random integers.

scanInput = randi([1 1000],1,1E5,"distributed");

Call parallelPrefixScan helper function with:

  • @plus as the operator for addition.

  • 0 as the identity element. Adding zero does not change the sum.

  • "inclusive" as the scan mode.

scanResult = parallelPrefixScan(scanInput,@plus,0,"inclusive");

The output y contains cumulative sums across all elements of x. Verify correctness by comparing y to the results from the built-in cumsum function. The message confirms that the custom scan matches the built-in function.

yTest = cumsum(scanInput);
if max(abs(scanResult-yTest)) < 1e-9
    disp("Passed prefix sum check");
end
Passed prefix sum check

Track Values and Indices Using a Custom Prefix Scan Operator

To solve domain-specific problems, it can be useful to compute cumulative values across an array and also identify the index where each value occurs. Simple scan operators like sum typically handle values only, so to track both values and their positions, you need a custom operator. This example shows how to track the cumulative maximum and its index using a custom operator, and applies the technique to a line-of-sight (LOS) problem on a synthetic terrain profile.

Define Custom Operator

Define a custom operator maxLocOp that compares two pairs of [value,index] and returns the pair with the larger value, breaking ties by keeping the left operand to preserve earliest index.

function c = maxLocOp(a,b)
if (a(1) > b(1)) || (a(1) == b(1))
    c = a;
elseif a(1) < b(1)
    c = b;
end
end

Solve Line-of-Sight Problem

The line-of-sight problem involves a terrain map represented as a one-dimensional profile of altitudes sampled along a ray. Given an observer at a specific height, determine whether specific points along the terrain are visible from the observation point. A point is visible if its elevation angle is greater than all previous angles along the ray. To check this efficiently, compute a running maximum of elevation angles and compare each point against it.

To start, define the terrain by specifying the grid size, the spacing between samples, and the observer height. Then, generate a simple terrain profile using the generateTerrain helper function provided with this example.

numSamples = 1E5;
sampleSpacing = 50.0;  
observerHeight = 0;
[dist,elev] = generateTerrain(numSamples,sampleSpacing,observerHeight);

Compute the slopes of the angles by dividing the elevation difference by the distance.

slopeTan = (elev-observerHeight)./dist;

Define the vector to track positions along the terrain, and concatenate it with the tangent vector to generate the input to the parallelPrefixScan helper function.

sampleIdxs = distributed(1:numSamples);
scanInput = [slopeTan;sampleIdxs];

Apply the parallelPrefixScan helper function with the maxLocOp custom operator and the identity [-inf;0]. The identity pair ensures that the initial maximum slope is negative infinity, and the initial index is zero, so the first comparison always updates correctly.

id = [-inf;0];
scanResult = parallelPrefixScan(scanInput,@maxLocOp,id,"inclusive");

The scan returns a 2-by-numSamples matrix, where the first row tracks the maximum slope encountered so far and the second row tracks the position where that maximum occurred.

runningMaxSlopeTan = scanResult(1,:);
idxRunningMaxSlope = scanResult(2,:);

Use these results to determine the visibility status of targets along the terrain.

Sample three evenly spaced points from the last 90% of the position vector.

numTargets = 3;
targetIdxs = round(linspace(numSamples*0.1,numSamples,numTargets));

For each target, compute and gather:

  • The slope from the observer to the target, targetSlopeTan.

  • The maximum slope encountered at the target, maxSlopeTanAtTarget.

  • The position where that maximum slope occurred, maxSlopeIdxAtTarget.

targetSlopeTan = gather(slopeTan(targetIdxs));
maxSlopeTanAtTarget = gather(runningMaxSlopeTan(targetIdxs));
maxSlopeIdxAtTarget = gather(idxRunningMaxSlope(targetIdxs));

A target is visible if its slope is larger than or equal to the running maximum slope at its position.

isVisible = targetSlopeTan >= maxSlopeTanAtTarget;

If the target is blocked, the obstruction point is at maxSlopeIdxAtTarget, where the maximum slope occurred. This point lies on terrain that rises above the line of sight to the target. Compute obstruction points and slope.

obstructionPoint = maxSlopeIdxAtTarget;
obstructionSlopeTan = maxSlopeTanAtTarget;

Plot the terrain and the visibility of the targets using the plotLOSTargets helper function. The plot shows the terrain profile, the observer, targets, and any obstruction points. Green markers indicate visible targets with solid LOS lines. Red markers indicate blocked targets, with obstruction points marked by diamonds and dashed lines showing the blocking LOS. You can confirm that only two targets are visible and identify the exact terrain points causing the obstruction. The plotLOSTargets function is attached to the example as a supporting file.

plotLOSTargets(dist,elev,observerHeight,targetIdxs,isVisible, ...
    obstructionPoint,obstructionSlopeTan);

References

[1] Blelloch, Guy E. Prefix Sums and Their Applications. Technical Report CMU-CS-90-190. School of Computer Science, Carnegie Mellon University, 1990. https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf

See Also

| |

Topics