How can I match similar events between 2 matrices?

So I have two matrices, each with 5 columns of data: latitude, longitude, depth, time, and magnitude. Matrix A has around 30,000 events or rows (each event is represented by lat, lon, dep, time, and mag) and matrix B has around 50,000 events. Both datasets represent the same sequence of earthquake data, but matrix B was created with less stringent error parameters, so more events (earthquakes) were located and included in that matrix. So the 30,000 events in matrix A are also in matrix B, along with ~20,000 others.
I need to match the earthquakes from each catalog. That is, an earthquake will have a unique lat, lon, depth, and time. I need to find the events in each catalog that have the same location and time and call those the same event. Now of course earthquakes can happen simultaneously so matching times alone won't cut it. I will need to match times (with some small amount of error) and locations to confidently say the events are the same.
Before I delve much deeper: any suggestions on how to implement this? I have some working code, but it is slow, so I'm looking to optimize my solution.
I basically need to calculate distances between each lat, lon, depth in matrix A and each lat, lon, depth in matrix B. An event in one catalog should have basically the same location in the other catalog. There may be some small discrepancies, but anything within a few meters is likely the same event. Right now, I'm using a nearest neighbor search to find the distances between every location in one matrix and every location in the other.
  3 comments
per isakson 2016-12-14
Edited: per isakson 2016-12-14
How important is speed?
There is an old trick (by John D'Errico, I think):
  • Create new matrices of whole numbers by converting one column at a time with round(A(:,jj)/tol). Because each column is converted separately, tol can be different for different columns.
  • Search for matches with intersect(...,'rows'), or ismember(?)
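The two steps above can be sketched as follows. This is a minimal illustration, not the original author's code: the toy matrices `A` and `B` and the per-column tolerances in `tol` are made-up values. One caveat of the rounding trick is that two values closer than `tol` can still land in different bins if they straddle a rounding boundary, so some true matches may be missed.

```matlab
% Hypothetical toy data: columns are lat, lon, depth, time
A = [10.0001 20.0002 5.01 736000.10;
     11.5000 21.5000 7.00 736001.20];
B = [11.5001 21.4999 7.02 736001.20;
     10.0002 20.0001 5.02 736000.10;
     12.0000 22.0000 9.00 736002.00];

% Assumed per-column tolerances (one entry per column)
tol = [1e-3 1e-3 0.1 1e-2];

% Convert every column to whole numbers: round(column / tolerance)
Aq = round(bsxfun(@rdivide, A, tol));
Bq = round(bsxfun(@rdivide, B, tol));

% Exact row matching on the quantized matrices.
% ia and ib index the matching rows in A and B respectively.
[~, ia, ib] = intersect(Aq, Bq, 'rows');
```

Here `A(ia(k),:)` and `B(ib(k),:)` are the rows declared to be the same event.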
psprinks 2016-12-15
Edited: psprinks 2016-12-15
So I'm not having much luck using ismembertol. Well... I don't trust the results yet. I think it's because I'm not using the tool correctly. Or I need another tool.
Here is some code and the data file.
P_set and D_set are the matrices with the data columns (lat, lon, depth, time, magnitude). The times are in Matdays. P_set (29399 events) is smaller than D_set (40848 events), but both represent the same earthquakes. There are just extras in D_set. I need to find the events that are common to both matrices, and I need to do this by matching locations and times. I have some tolerance values set for time and location, but they could be wrong.
ttol = 0.000001; loctol = 0.000308; % set time and location tolerances
[isinB, rowinB] = ismembertol(P_set(:,4), D_set(:,4), ttol); % search by times
temp = find(isinB); % rows of P_set whose time matches some row of D_set
[isinB1, rowinB1] = ismembertol(P_set(:,1:3), D_set(:,1:3), loctol, 'ByRows', true); % search by locations
temp1 = find(isinB1); % rows of P_set whose location matches some row of D_set
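One possible reason the results above look untrustworthy: by default, ismembertol treats the tolerance as relative, so u and v match when abs(u-v) <= tol*max(abs([A(:);B(:)])). With times stored as day numbers around 7e5, a tolerance of 0.000001 then allows differences of most of a day. The sketch below uses made-up time values (not the attached data) to show the effect, and how 'DataScale', 1 turns the tolerance into an absolute one.

```matlab
% Hypothetical event times in days (made-up values, not the real catalogs)
t_small = [736000.5; 736000.6];
t_large = [736000.9; 736001.5];

ttol = 0.000001;

% Default behaviour: the tolerance is scaled by the largest absolute value
effective = ttol * max(abs([t_small; t_large]));   % roughly 0.74 days
isin_default = ismembertol(t_small, t_large, ttol);

% With 'DataScale' set to 1 the tolerance is absolute: abs(u-v) <= ttol
isin_abs = ismembertol(t_small, t_large, ttol, 'DataScale', 1);
```

In this toy case `isin_default` reports matches for times that differ by 0.3 to 0.4 days, while `isin_abs` reports none.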


Accepted Answer

Guillaume 2016-12-14
If I understood correctly, all you need is one line of code using ismembertol:
[isinB, rowinB] = ismembertol(A, B, 'ByRows', true)
You can specify a tolerance and a 'DataScale' vector to vary the amplitude of the tolerance for each column.
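As a rough sketch of the 'DataScale' option: setting the tolerance to 1 and passing a vector as 'DataScale' makes each entry of that vector act as an absolute tolerance for its column, so location and time can be matched in a single call. The toy catalogs `P` and `D` and the values in `abstol` are assumptions chosen for illustration only.

```matlab
% Hypothetical toy catalogs: lat, lon, depth, time (days), magnitude
P = [35.0000 -118.0000 5.00 736000.500000 2.1;
     35.1000 -118.1000 8.00 736000.600000 1.7];
D = [35.1001 -118.1001 8.01 736000.600001 1.8;
     35.0000 -118.0000 5.00 736000.900000 2.1;   % same place, different time
     35.0001 -118.0001 5.01 736000.500002 2.0];

% Assumed absolute tolerances for lat, lon, depth and time
abstol = [0.001 0.001 0.05 1e-5];

% With tol = 1, rows match when abs(u-v) <= abstol holds in every column.
% Magnitude (column 5) is left out of the comparison.
[isinD, rowinD] = ismembertol(P(:,1:4), D(:,1:4), 1, ...
    'ByRows', true, 'DataScale', abstol);
```

`rowinD(k)` is the row of `D` matched to row `k` of `P`, or 0 when there is no match. The second row of `D` is not matched to anything because its time is too far off, even though its location is identical to the first row of `P`.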
  5 comments
Guillaume 2016-12-14
There is absolutely no reason for the inputs to ismembertol (and ismember) to be the same length. It simply tells you which rows (with the 'rows' / 'ByRows' option) of the first input are found somewhere in the second input.
The link to the documentation of ismembertol is in my answer. As it says at the end, Introduced in R2015a.
You need the tol version since you don't want exact comparison. Time to upgrade? Replicating the full behaviour of ismembertol particularly with the 'ByRows' option is not going to be trivial.
Here's an attempt that loses the automatic tolerance, magnitude scaling and other niceties:
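function [isfound, where] = ismembertolbyrow(A, B, tol)
%A, B: two matrices with the same number of columns
%tol: a row vector with the same number of columns as A and B
%tol is absolute. u and v are within range if abs(u-v) <= tol
%where: for each row of A, the index of a matching row of B (0 if none)
%Note: builds a size(A,1) x size(A,2) x size(B,1) array, which is large for big inputs
validateattributes(A, {'numeric'}, {'2d'});
validateattributes(B, {'numeric'}, {'2d', 'ncols', size(A, 2)});
validateattributes(tol, {'numeric'}, {'positive', 'row', 'numel', size(A, 2)});
%absolute differences between every row of A and every row of B
absdiff = abs(bsxfun(@minus, A, permute(B, [3 2 1])));
%bsxfun rather than a plain <= so that this also works before R2016b
%permute rather than squeeze so that a single-row A keeps its orientation
intol = permute(all(bsxfun(@le, absdiff, tol), 2), [1 3 2]);
isfound = any(intol, 2);
where = zeros(size(isfound));
[r, c] = find(intol);
where(r) = c; %if several rows of B match, the highest index is kept
end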
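For catalogs with tens of thousands of rows, comparing every row of A with every row of B in one array may not fit in memory. One way around that, sketched below as an assumption rather than as part of the answer above, is to run the same comparison on blocks of rows of A. The toy data, `tol`, and `blocksize` are made-up values.

```matlab
% Hypothetical toy data: lat, lon, depth, time
A = [1 1    1 10;
     2 2    2 20;
     3 3    3 30];
B = [3.01 3    3 30;
     9    9    9 90;
     1    1.01 1 10;
     2    2    2 25];
tol = [0.05 0.05 0.05 0.5];   % assumed absolute tolerance per column

blocksize = 2;                % assumed; pick it so one block fits in memory
isfound = false(size(A, 1), 1);
where = zeros(size(A, 1), 1);

for first = 1:blocksize:size(A, 1)
    rows = first:min(first + blocksize - 1, size(A, 1));
    % Compare this block of A against every row of B
    absdiff = abs(bsxfun(@minus, A(rows, :), permute(B, [3 2 1])));
    intol = permute(all(bsxfun(@le, absdiff, tol), 2), [1 3 2]);
    isfound(rows) = any(intol, 2);
    [r, c] = find(intol);
    where(rows(r)) = c;       % index into B of a matching row
end
```

Each pass only allocates a `numel(rows)` x `size(A,2)` x `size(B,1)` array, so memory use is controlled by `blocksize` at the cost of a loop.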
psprinks 2016-12-14
Ok, thank you Guillaume. I am requesting the upgrade to 2016b from my university now. Waiting on the download link.
