How to optimise a big loop with polyshape, intersection and area?

Hello everyone,
I have a matrix of latitudes 'lat' (size 2882 x 3601) and longitudes 'lon' (size 2882 x 3601). This is a MODIS grid, so the latitudes are parallel, but the longitudes aren't. The resulting grid cells aren't regular-shaped rectangles, and are approx. 250 x 250 m.
I also have a polyshape 'polyout' containing 1 huge (milions of vertices) polygon (with holes) showing a specific land cover type (forests).
A small example of how it looks like is in the picture:
.
green dots - coordinates (taken from the 'lat' and 'lon' matrices)
beige polygon - the 'polyout' forest land cover
What I need to do is: for each grid cell (where the green point is the upper left corner) calculate the percentage of the area covered by the forest. The result should be assigned to the 'poly_perc' matrix (size 2881 x 3600).
I tried to do this with the huge loop and a polyshape, intersection and area functions inside. Here is my code:
poly_perc=NaN(size(lat_layer-1,1),size(lat_layer-1,2));
for i=1:(size(lat,1)-1)
for j=1:(size(lat,2)-1)
c1=[lon(i,j),lon(i+1,j),lon(i+1,j+1),lon(i,j+1)];
c2=[lat(i,j),lat(i+1,j)lat(i+1,j+1),lat(i,j+1)];
pgon=polyshape(c1,c2);
int=intersect(pgon,polyout);
ar1=area(pgon);
ar2=area(int);
poly_perc(i,j)=ar2/ar1*100;
end
end
This is working, but it is working way to slow - it would take years to finish...
How can I optimise this to perform faster? Or maybe there is entirely different way to do it fast?
This is the first time I write a post on the forum, I hope I explained everything well. So far, I found thousands of good answers on this forum, hopefully this time will be the same!
Thank you!
Kinga

5 个评论

At first glance there are many methods to speed it up considerably.
For instance, you can eliminate a lot of the grid cells from your loop: the ones fully covered and the ones not coverd at all.
This can be done by a first boolean search: look for which vertices of the grid cells are inside the polyshape (using the matlab isinside function for example). Since you know the corner points of the grid cell you can deduce that if all points are inside then the grid cell is full covered and if none are inside then there is no coverage.
What's left are the grid cells which are partialy covered. There are multiple ways to seed this up, however we will need some exmaple data to provide a demonstration code.
Try to avoid the double loop by setting up the grid first. The single loop can then be called by parfor. And you can use this grid for the isinside function.
Since you know the corner points of the grid cell you can deduce that if all points are inside then the grid cell is full covered and if none are inside then there is no coverage.
That would be true if polyout were convex. It is more complicated to test whether a quadrilateral lies completely inside a non-convex polygon.
Instead, I suggest doing the same computation of poly_perc (though maybe with my more vectorized version below) but with the MODIS tiles aggregated initially into larger, lower resolution super-tiles. If a super-tile is found to have a 0 or a 100 percent coverage, then you know that its original sub-tiles will as well. You can then do a second pass at the original resolution, but looping only over the sub-tiles whose coverage is still uncertain.
I attached a piece of input data right now (I didn't know I could do this at first...). @Karim Could you suggest any solution right now?
Thanks!
The resulting grid cells aren't regular-shaped rectangles,
From the attached data, your cells are not regularly-shaped rectangles, but they are regularly-shaped parallelograms. Is that normal? If so, you could just perform a shear transformations on both the lat/lon lattice points and on polyout, such that the lattice would become a plain rectangular grid, which is much easier to deal with. The area intersections would be the same, because shears are area-preserving.
No, the data is not regularly-shaped parallelograms either. The grid cells are 250x250 m everywhere, while the Earth is a globe. So the „quasi-paralellograms” are the bigger the more to the north we are. Unfortunately.

请先登录,再进行评论。

 采纳的回答

The implementation below works by iterating over 25x25 sub-blocks of the lon/lat data. On my computer, each block takes about 2.5 seconds, which means the whole loop should run to completion in about 45 minutes. If you have the Parallel Computing Toolbox and replace the for loop with a parfor loop, it should take even less. I don't think there's going to be a way to reduce the run time to a few seconds, if that's what you were hoping.
load lat_lon
load polyout
k=25; %block length
[m,n]=size(lat);
mb=blockranges(m,k);
nb=blockranges(n,k);
[M,N]=deal(width(mb), width(nb));
fcn=@(z) [z(1:end-1,1)', z(end,:), flip(z(1:end-1,end))', flip( z(1,2:end-1) ) ]';
poly_perc =cell(M,N);
for blockNum=1:N*M
[i,j]=ind2sub([M,N],blockNum);
tic;
Lat=lat(mb(1,i):mb(2,i), nb(1,j):nb(2,j));
Lon=lon(mb(1,i):mb(2,i), nb(1,j):nb(2,j));
p=polyshape(fcn(Lon),fcn(Lat),'Simplify',0,'Keep',1);
Pforest=intersect( p , polyout );
T=toc;
if ~area(Pforest)
poly_perc{i,j}=zeros(size(Lat)-1);
disp("Time="+num2str(T,'%.2f')+" (No forest area in block)")
continue;
end
tic
Pgons=tiles2polyshape(Lon,Lat);
ar1=area(Pgons);
ar2=area(intersect(Pgons,Pforest));
poly_perc{i,j}=ar2./ar1*100;
T=T+toc;
disp("Time="+num2str(T,'%.2f'))
end
poly_perc=cell2mat(poly_perc);
function out=blockranges(z,k)
out=repelem( 1:k:z ,2);
out=reshape(out(2:end-1),2,[]);
end
function Pgons = tiles2polyshape(Lon,Lat)
[m,n]=size(Lat);
i=1:m-1;
j=1:n-1;
fcn=@(varargin) reshape(cat(3,varargin{:}) ,[],4)';
c1=fcn( Lon(i,j), Lon(i+1,j), Lon(i+1,j+1), Lon(i,j+1) );
c2=fcn( Lat(i,j), Lat(i+1,j), Lat(i+1,j+1), Lat(i,j+1) );
fcn=@(c)reshape(c,4,1,[]);
C=num2cell([fcn(c1), fcn(c2)] , [1,2]);
Pgons=cellfun(@(varargin)polyshape(varargin{:},'Simplify',0,'Keep',1) , reshape(C,m-1,n-1) );
end

6 个评论

Thank you very much for such an amazing work!
I don't have Parallel Computing Toolbox, but if it finishes before New Year, I will be more than happy! In my initial code every iteration took approx. 20 sec, with over 10 mln iterations, so I was pretty serious that it would take years to finish.
The code is now running. My computer isn't as fast as your, but I'll let you know when it's done.
Yes, finally, after almost 4 days (btw, how powerful is your machine? You wrote that every iteration takes 2.5 sec, while at my computer it needed 20 sec. I thought my machine to be pretty good...), the script is done, and it worked perfectly!
The size of the output 'poly_perc' is 2875 x 3600 (division 2881:25, remainder 6), so it misses last 6 rows. Fortunately, in those rows there is only a tiny tip of Poland (if you look carefully, you'll see it in the picture below). I will manage to calculate the 'poly_perc' there on my own.
Here is also the zoomed-in picture:
I also wonder, how was this possible: in my initial code Matlab needed 20 sec for every intersection, while in your code, 20 sec was needed for 25 x 25 intersections. So it worked 625 times quicker, and still kept the information for every grid cell. Wow.
Thank you @Matt J, this was an excellent piece of code!
This should correct the missing rows:
function out=blockranges(z,k)
out=repelem( unique([1:k:z,z]) ,2); %<---fixed this line
out=reshape(out(2:end-1),2,[]);
end
I also wonder, how was this possible: in my initial code Matlab needed 20 sec for every intersection, while in your code, 20 sec was needed for 25 x 25 intersections.
Because in your code, you compute intersect(pgon, polyout) whereas in mine, we compute intersect(pgon, Pforest). Pforest is a much smaller polyshape, the section of polyout covering only the current 25x25 block.
(btw, how powerful is your machine? You wrote that every iteration takes 2.5 sec, while at my computer it needed 20 sec. I thought my machine to be pretty good...)
My 2.5 sec. timing was based on tests with the data files you attached to your post, which contain a much smaller polyout. I don't know how long my computer would take with your larger data set. If you need to repeat this calculation, you might try partitioning your polyout and lat/lon data into smaller chunks, similar in size to your posted attachments. That would probably speed up the time per iteration.
My 2.5 sec. timing was based on tests with the data files you attached to your post, which contain a much smaller polyout.
That is a very good remark. I tried with the data I attached to the post, and indeed, the iteration takes ~1 sec. I will remember this, when I'll need to repeat this calculation (and I'll have to do it for coniferous and broadleaves forests separately).
Thanks also for the patch to the code!

请先登录,再进行评论。

更多回答(1 个)

Here's a more vectorized version. I could not test it, of course, since input data was not provided to us.
[m,n]=size(lat);
i=1:m-1;
j=1:n-1;
fcn=@(varargin) reshape(cat(3,varargin{:}) ,[],4)';
c1=fcn( lon(i,j), lon(i+1,j), lon(i+1,j+1), lon(i,j+1) );
c2=fcn( lat(i,j), lat(i+1,j), lat(i+1,j+1), lat(i,j+1) );
fcn=@(c)reshape(c,4,1,[]);
C=num2cell([fcn(c1), fcn(c2)] , [1,2]);
Pgons=cellfun(@polyshape , reshape(C,m-1,n-1) );
ar1=area(Pgons);
ar2=area(intersect(Pgons,polyout));
poly_perc = ar2./ar1*100;

4 个评论

If you have the Parallel Computing Toolbox, this would probably be a good candidate for parfor,
parfor i=1:n-1
p=Pgons(:,i);
ar1(:,i)=area(p);
ar2(:,i)=area(intersect(p,polyout));
end
Thank you vary much for answering!
I attached a piece of input data. Could you try if it is working?
I am not very familiar with those '@' functions, but in the above code I don't see any loop. Is it on purpose? Or should I add a loop (if yes, should there be two loops - one for lats and one for lons)?
See below for some comments added to the code of @Matt J, i've added a minor step to evaluate the overlap. This will speed up the computation since it avoids evaluation the intersection area if there is no overlap.
load('lat_lon.mat')
load('polyout.mat')
[m,n] = size(lat);
i = 1:m-1; % row index for the grid cell
j = 1:n-1; % column index for the grid cell
% set up a cell array (called C), with as many rows and columns as the
% number of grid cells
% get the vertices for the grid cells
fcn = @(varargin) reshape(cat(3,varargin{:}) ,[],4)';
c1 = fcn( lon(i,j), lon(i+1,j), lon(i+1,j+1), lon(i,j+1) );
c2 = fcn( lat(i,j), lat(i+1,j), lat(i+1,j+1), lat(i,j+1) );
% store in the cell array, and reshape to the dimensions of the grid cells
fcn = @(c) reshape(c,4,1,[]);
C = num2cell([fcn(c1), fcn(c2)] , [1,2]);
C = reshape(C,m-1,n-1);
% take a small random set of C for the online evaluation
% don't do this in the final code
C = C( randi(numel(C(:)),10,1) );
% convert each grid cell into a polyshape
Pgons = cellfun(@polyshape , C );
% set up logicals which we can use to avoid computation if the polygons do not overlap
TF = overlaps(Pgons,polyout)
TF = 10×1 logical array
0 0 0 0 0 1 0 0 0 0
% initialize the coverage percentage
poly_perc = zeros(size(TF));
if any(TF(:))
% evaluate the area of the grid cells
ar1 = area(Pgons(TF));
% compute the intersection area
ar2 = area(intersect(Pgons(TF),polyout));
% get the percentage
poly_perc(TF) = ar2./ar1*100
end
poly_perc = 10×1
0 0 0 0 0 70.0130 0 0 0 0
I am not very familiar with those '@' functions

请先登录,再进行评论。

产品

版本

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by