Find not fast enough - is there a speedier solution for large matrices?

My code works, but given the size of my data matrices it is too slow, despite access to a pretty heavy-duty machine. I'm sure the MATLAB community has a good and quick fix to my woes. It takes about 0.3 s per iteration at the moment, so we are talking days or weeks of computing to run my code. I think my main problem lies with the use of the function 'find', and I need a more elegant solution, perhaps vectorizing or using the Parallel Computing Toolbox (available but new to me).
Thanks in advance!
The problem:
I have 36 sampling dates and a large matrix (1942242*2) of xy sample coordinates ('locmat'). My code, pasted below, reads in a three-column matrix for each sample date in turn. These matrices have similar but different lengths to 'locmat' and consist of xy coordinate data (read in to 'xydat') and a data measurement at that xy location (read in to 'fetcol'). All coordinates in 'xydat' have an exact match in 'locmat', but they are indexed differently depending on the sample date, and not all xy coordinates in 'locmat' are to be found in 'xydat'. I am trying to index the data in the sample files to 'locmat' based on the xy locations, producing a single matrix (1942242*36) called 'fetmat'. Any coordinate with no data on a given date is stored as -999.
Code:
nosamp = 36;
fetpath = 'C:\Data\dat_text\';
locfnam = 'C:\Data\srchmat\locmat.csv';
locmat = csvread(locfnam);
fn = dir(fetpath);
ns = {fn.name};
ns = sort(ns);
ns = char(ns(3:end));
fetmat = zeros(length(locmat),nosamp);
for q = 1:size(ns,1);
    fnam = ns(q,:);
    filename = fullfile(fetpath, fnam);
    fetdat = csvread(filename, 1,2);
    xydat = fetdat(:,2:3);
    fetcol = fetdat(:,1);
    clear fetdat;
    for s = 1:length(locmat);
        xysrch = locmat(s,:);
        xyrep = repmat(xysrch,length(locmat),1);
        ids = find(locmat == xyrep) ;
        if isempty(ids)
            fetmat(s,q) = -999;
        else
            fetmat(s,q) = fetcol(ids(1));
        end
    end
end
1 Comment
Roger Stafford 2012-12-20
I don't entirely understand your code. In spite of the statement about 'xydat' having a match in 'locmat' there is no reference to 'xydat' within your for-loops. Instead you seem to be searching for duplications in 'locmat' itself. Perhaps I haven't understood your description correctly.
However, I can make a general comment concerning the use of the 'find' function. When you have a long list to be repeatedly searched for specific items, it is best not to use 'find' if you can possibly avoid it. If you use a sorted list instead, there are some much faster methods of finding a match. With your 'locmat' at a length of 1,942,242 rows, a binary search takes only about log2(1,942,242), roughly 21, comparisons per lookup, rather than up to 1,942,242 for a linear scan. I am fairly sure the MATLAB function 'ismember' uses just such a method in finding elements of one set which lie in another set. Of course you are apparently trying to match a pair of values, x and y, but I am sure there is a way of making use of the binary search technique which would apply here.
You don't want to be scanning 'locmat' from one end to the other 1,942,242 x 36 times. At up to 1,942,242 comparisons per scan, that's over 100 trillion comparisons!
Roger Stafford
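
As a rough illustration of the pair-matching idea above, here is a minimal sketch using 'ismember' with the 'rows' option on made-up toy data. The coordinate and measurement values are hypothetical, not taken from the original files, and the variable 'col' is introduced only for this example.

```matlab
% Toy data (hypothetical values, chosen only to illustrate the lookup)
locmat = [10 20; 10 21; 11 20; 11 21; 12 20];   % master list of xy pairs
xydat  = [11 21; 10 20; 12 20];                 % xy pairs present on one date
fetcol = [5.5; 7.1; 9.3];                       % measurement for each row of xydat

% tf(k) is true when locmat(k,:) occurs in xydat;
% loc(k) is the matching row number in xydat (0 where there is no match)
[tf, loc] = ismember(locmat, xydat, 'rows');

col = -999 * ones(size(locmat,1), 1);   % default value for missing data
col(tf) = fetcol(loc(tf));              % copy measurements for matched rows

disp(col.')
```

Each x,y pair is treated as one row, so no separate handling of the two coordinates is needed.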

Accepted Answer

Matt J 2012-12-20
Edited: Matt J 2012-12-20
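for q = 1:size(ns,1)
    fnam = ns(q,:);
    filename = fullfile(fetpath, fnam);
    fetdat = csvread(filename, 1,2);
    xydat = fetdat(:,2:3);
    fetcol = fetdat(:,1);
    clear fetdat;
    % tf flags rows of locmat found in xydat; idx is the matching row of xydat
    [tf,idx] = ismember(locmat,xydat,'rows');
    fetmat(tf,q) = fetcol(idx(tf));   % copy the measurement, not the row index
    fetmat(~tf,q) = -999;             % no data at this location on this date
end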
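
One way to gain confidence in a vectorized 'ismember' lookup is to compare it with a brute-force loop on a small synthetic case before running it on the full data. The sketch below assumes made-up grid coordinates and random measurements; the names 'keep', 'slow' and 'fast' are hypothetical and exist only for this check. Note that the second output of 'ismember' is a row index into 'xydat', so it has to be mapped through 'fetcol' to obtain the measurements.

```matlab
rng(0);                                    % reproducible toy data
[x, y] = meshgrid(1:20, 1:15);
locmat = [x(:) y(:)];                      % 300 unique xy pairs
keep   = randperm(size(locmat,1), 200);    % locations sampled on this date, shuffled
xydat  = locmat(keep, :);
fetcol = rand(200, 1);                     % one measurement per row of xydat

% Slow reference: one linear scan of xydat for every row of locmat
slow = -999 * ones(size(locmat,1), 1);
for s = 1:size(locmat,1)
    ids = find(xydat(:,1) == locmat(s,1) & xydat(:,2) == locmat(s,2), 1);
    if ~isempty(ids)
        slow(s) = fetcol(ids);
    end
end

% Vectorized lookup: match whole rows, then map row indices to measurements
[tf, loc] = ismember(locmat, xydat, 'rows');
fast = -999 * ones(size(locmat,1), 1);
fast(tf) = fetcol(loc(tf));

assert(isequal(slow, fast));               % both approaches should agree
```

If the assertion passes on the toy case, the same pattern can be applied column by column to 'fetmat'.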