Fast footpoint computation over polygonal lines

Problem definition
Consider a 2xm dataset matrix D encoding m 2-dimensional noisy observations. I want to project the dataset D onto an open polygonal line V, encoded as a 2xN matrix of N 2-dimensional vertices. The challenge is to do this as fast as possible, because both m and N can be very large (say on the order of 1e4).
My naive solution: step 1
Clearly, the main problem amounts to projecting each observation contained in D onto the line V. Hence, consider for the moment the problem of projecting a single observation onto V. Since the line V is nothing more than a set of N-1 segments, we can further reduce the projection problem to the elementary one of projecting a single observation onto a single segment. My solution is the following:
function [dist, pfoot] = footPoint_MATTEO(p, V1, V2)
    % Project the point p onto the segment with endpoints V1 and V2.
    pproj = NaN * ones(2, 3);
    dproj = NaN * ones(1, 3);
    % Coordinate of the orthogonal projection onto the line through V1 and V2.
    alpha = ((p - V1)' * (V2 - V1)) / (norm(V2 - V1)^2);
    if alpha < 1 && alpha > 0
        % The orthogonal projection falls strictly inside the segment.
        pproj(:, 1) = (1 - alpha) * V1 + alpha * V2;
        dproj(1) = norm(p - pproj(:, 1));
    end
    % The two endpoints are always candidate foot points.
    pproj(:, 2) = V1;
    dproj(2) = norm(p - pproj(:, 2));
    pproj(:, 3) = V2;
    dproj(3) = norm(p - pproj(:, 3));
    % Keep the closest candidate.
    [~, istar] = min(dproj, [], 'omitnan');
    dist = dproj(istar);
    pfoot = pproj(:, istar);
end
Here p is the single observation, V1 and V2 are the two vertices of the segment, dist is the distance between p and the segment V1-V2, and pfoot is the projection of p onto that segment. The rationale is the following: pfoot is either one of the two vertices or a point between them. If it is a point between them, then it can be written as a convex combination of V1 and V2 with weight alpha in the interval (0,1); the value of alpha follows from standard geometry (the scalar projection of p - V1 onto V2 - V1, normalized by the squared segment length). Finally, to choose pfoot among the three possible cases we take the one that minimizes the distance.
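For illustration, here is a small usage sketch of footPoint_MATTEO with arbitrary sample values (not taken from the original post):
% Project a sample point onto the segment from (0,0) to (1,0).
p  = [0.3; 0.7];
V1 = [0; 0];
V2 = [1; 0];
[dist, pfoot] = footPoint_MATTEO(p, V1, V2);
% Here alpha = 0.3 lies in (0,1), so the interior case wins:
% pfoot = [0.3; 0] and dist = 0.7.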
My naive solution: step 2
Now, to get the general solution, we just have to repeat the previous procedure for each segment of the polygonal line V and each observation in the dataset D. My solution is thus the following:
function [Ddist, Dfoot] = shapeFootPoint_MATTEO(D, V)
    N = size(V, 2);                 % number of vertices (N - 1 segments)
    m = size(D, 2);                 % number of observations
    Dfoot = NaN * ones(2, m);
    Ddist = NaN * ones(1, m);
    for j = 1 : m
        dist = NaN * ones(1, N);
        pfoot = NaN * ones(2, N);
        % Project the j-th observation onto every segment of V.
        for i = 1 : N - 1
            [dist(i), pfoot(:, i)] = footPoint_MATTEO(D(:, j), V(:, i), V(:, i + 1));
        end
        % Keep the projection onto the nearest segment.
        [~, istar] = min(dist);
        Dfoot(:, j) = pfoot(:, istar);
        Ddist(j) = norm(Dfoot(:, j) - D(:, j));
    end
end
Here Ddist and Dfoot are, respectively, the list of distances and the projected dataset. Note that each observation is projected onto every segment, and the nearest projection is then selected as the definitive one.
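As a quick sanity check (a sketch with arbitrary sample data, not from the original post), the two functions should agree when V consists of a single segment:
% With a single segment, shapeFootPoint_MATTEO must reproduce footPoint_MATTEO.
p = [0.3; 0.7];
V = [0 1; 0 0];                     % one segment from (0,0) to (1,0)
[d1, f1] = footPoint_MATTEO(p, V(:, 1), V(:, 2));
[d2, f2] = shapeFootPoint_MATTEO(p, V);
assert(abs(d1 - d2) < 1e-12 && norm(f1 - f2) < 1e-12);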
An attempt to cut down the execution time
Clearly, in shapeFootPoint_MATTEO both for loops can be parallelized. For example, parallelizing the loop over segments gives the following solution:
function [Ddist, Dfoot] = fastShapeFootPoint_MATTEO(D, V)
    N = size(V, 2);
    m = size(D, 2);
    Dfoot = NaN * ones(2, m);
    Ddist = NaN * ones(1, m);
    Vnext = V(:, 2 : end);          % second endpoint of each segment
    for j = 1 : m
        dist = NaN * ones(1, N);
        pfoot = NaN * ones(2, N);
        Dj = D(:, j);
        % Distribute the projections onto the segments across workers.
        parfor i = 1 : N - 1
            [dist(i), pfoot(:, i)] = footPoint_MATTEO(Dj, V(:, i), Vnext(:, i));
        end
        [~, istar] = min(dist);
        Dfoot(:, j) = pfoot(:, istar);
        Ddist(j) = norm(Dfoot(:, j) - D(:, j));
    end
end
Test drive
In order to test the performance of the previous solutions, consider the following dummy script
clc
clear
close all
m = 100;
N = 30;
D = rand(2, m);
V = rand(2, N);
tic
shapeFootPoint_MATTEO(D, V);
toc
tic
fastShapeFootPoint_MATTEO(D, V);
toc
My laptop gives the following results:
Elapsed time is 0.021693 seconds.
Elapsed time is 2.908765 seconds.
Hence, the parallelized version is much slower than the original one.
Questions
  1. Is my naive solution already optimized? If not, how do we reduce the computational cost / improve the code implementation?
  2. Given the same dataset D, let's say that we want to repeat the process over multiple polygonal lines V. Naively, this means adding an extra for loop around the function shapeFootPoint_MATTEO (as sketched below). In that case, how do we speed up the computation? Can we apply the same principles as in question 1?
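To make the setting of question 2 concrete, here is a minimal sketch of that extra loop; Vlist, allDist and allFoot are hypothetical names introduced only for illustration, and the sample data are arbitrary:
% Naive sketch of the multi-polyline case: D is the 2xm dataset, Vlist a
% hypothetical cell array collecting the polygonal lines.
D = rand(2, 100);
Vlist = {rand(2, 30), rand(2, 50), rand(2, 20)};
K = numel(Vlist);
allDist = cell(1, K);
allFoot = cell(1, K);
for k = 1 : K
    [allDist{k}, allFoot{k}] = shapeFootPoint_MATTEO(D, Vlist{k});
end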

Answers (1)

Vinayak on 16 Apr 2024
Hi Matteo,
As you might know, parallel computing is not always the best choice, especially for small datasets, because of its high overhead. You can instead optimize with "vectorization" and "preallocation". I have refactored your code into the function "vecShapeFootPoint", using vectorization to remove the "footPoint" calculation and thereby eliminate the inner loop:
function [Ddist, Dfoot] = vecShapeFootPoint(D, V)
    N = size(V, 2) - 1; % Number of segments
    M = size(D, 2);     % Number of points in D
    Dfoot = NaN(2, M);
    Ddist = NaN(1, M);
    % Precompute values
    Vnext = V(:, 2:end);
    Vprev = V(:, 1:end-1);
    Segments = Vnext - Vprev;
    SegmentLengthsSquared = sum(Segments.^2, 1);
    for j = 1:M
        Dj = D(:, j); % Current point in D
        % Compute alpha for all segments simultaneously
        alpha = sum((Dj - Vprev) .* Segments, 1) ./ SegmentLengthsSquared;
        % Correct alpha values to lie within [0, 1]
        alpha = max(0, min(1, alpha));
        % Calculate potential footpoints for all segments
        pfoot = Vprev + Segments .* alpha;
        % Calculate distances from Dj to all potential footpoints
        distances = sqrt(sum((Dj - pfoot).^2, 1));
        % Find the segment with the minimum distance for this point
        [minDist, idx] = min(distances);
        Ddist(j) = minDist;
        Dfoot(:, j) = pfoot(:, idx);
    end
end
On my machine, this runs much faster than the original function:
Elapsed time is 0.012771 seconds.
Elapsed time is 0.000642 seconds.
Also, if you have multiple polygonal lines, as you mention in the second part of your question, it might make sense to use a parallel for loop (parfor) for the outermost loop, so that different "V" values are processed on different threads.
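For instance, a minimal sketch of that outer parfor, assuming the polygonal lines are stored in a hypothetical cell array Vlist (names chosen only for illustration), could look like this:
% Sketch: process each polygonal line on a separate worker.
% Vlist is a hypothetical cell array of 2xN_k vertex matrices; D is the 2xm dataset.
K = numel(Vlist);
allDist = cell(1, K);
allFoot = cell(1, K);
parfor k = 1 : K
    [allDist{k}, allFoot{k}] = vecShapeFootPoint(D, Vlist{k});
end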
I hope this helps.
