Find equally (almost equal) distanced elements in an array

9 次查看(过去 30 天)
Assume we have an array Y1 as follows:
Y1 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.278441922763728,0,0,0,0,0,0,0,0,0,0,0,0.339039014549722,0,0,0,0,0,-0.983321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
If you plot it, you can see that elements located at 127, 152, 178, 203, 229 are almost equally located in the array, with an almost equal distance of 25 between these elements. Given signals similar to Y1, how can we find such elements? Please note that the Y1 here is the simplest possible case, where only one set of equally distributed elements are given. In reality, there might be two or three sets of elements with different spacing and different locations, For example:
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
If you plot Y2, you can see that elements located at 20, 25, 30, 35, 40 are equally distributed with the distance of 5 in between these elements (Set 1), and elements located at 127, 152, 178, 203, 229 are almost equally located with an almost equal distance of 25 between these elements (Set 2).
To sum up, given Y1, it is desired to obtain [127, 152, 178, 203, 229], and given Y2, it is expected to obtain {[20, 25, 30, 35, 40] and [127, 152, 178, 203, 229]}.
Y1 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.278441922763728,0,0,0,0,0,0,0,0,0,0,0,0.339039014549722,0,0,0,0,0,-0.983321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
figure; plot(Y1); axis tight; grid;
ind1 = find(Y1)
20 32 38 45 48 51 76 127 152 178 203 229
diff(ind1)
12 6 7 3 3 25 51 25 26 25 26
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
figure; plot(Y2); axis tight; grid;
ind2 = find(Y2)
20 25 30 35 40 47 50 54 79 130 155 181 206 232
diff(ind2)
5 5 5 5 7 3 4 25 51 25 26 25 26

回答(2 个)

John D'Errico
John D'Errico 2022-8-16
编辑:John D'Errico 2022-8-16
You don't really say enough about your question to make it easy to solve. How long will these vectors be? How many non-zeros? Do you know, in advance the strides that you want to search for? Or are you just looking for ANY (approximately) common strides? The latter is of course fairly difficult to do with any degree of intelligence. But you might be able to make something work using clustering in 1-dimension. For example...
Y1 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.278441922763728,0,0,0,0,0,0,0,0,0,0,0,0.339039014549722,0,0,0,0,0,-0.983321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
What matters are not what the elements are but WHERE they lie. So start by extracting them in a virtually sparse form.
ind1 = find(Y1)
ind1 = 1×12
20 32 38 45 48 51 76 127 152 178 203 229
As you can see, this is a pretty small set. But now, what matters is how far apart are they? So compute the interpoint distances. A simple way to do this is to use pdist.
D1 = squareform(pdist(ind1'))
D1 = 12×12
0 12 18 25 28 31 56 107 132 158 183 209 12 0 6 13 16 19 44 95 120 146 171 197 18 6 0 7 10 13 38 89 114 140 165 191 25 13 7 0 3 6 31 82 107 133 158 184 28 16 10 3 0 3 28 79 104 130 155 181 31 19 13 6 3 0 25 76 101 127 152 178 56 44 38 31 28 25 0 51 76 102 127 153 107 95 89 82 79 76 51 0 25 51 76 102 132 120 114 107 104 101 76 25 0 26 51 77 158 146 140 133 130 127 102 51 26 0 25 51
That allows us to visualize the distance between any pair of non-zero elements. It also helps us to think about how we might look for clusters of points that lie at a common distance. Such clusters will lie near the main diagonal of the matrix D1, but not always perfectly so. Hmm. Anyway, suppose we decided to look for a set of equally distanced points, at roughly a distance of 25 as a stride? First, can we easily do that?
% find points that are at a stride of 25 +/- 3, converting the distance
% matrix into a graph.
G1 = graph(abs(D1 - 25) <= 3)
G1 =
graph with properties: Edges: [8×1 table] Nodes: [12×0 table]
Now, can we find a set of nodes that are connected?
conncomp(G1)
ans = 1×12
1 2 3 1 1 1 1 4 4 4 4 4
This gives us a set of nodes that were connected in the graph. Do you see that the last 4 nodes were all connected, thus at a stride of 25+/- 3?
G1.Edges
ans = 8×1 table
EndNodes ________ 1 4 1 5 5 7 6 7 8 9 9 10 10 11 11 12
If you can see how to read that table, you would see that nodes 8, 9, 10, 11, and 12 all lie in that common stride.
Similarly, look at Y2.
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
ind2 = find(Y2);
D2 = squareform(pdist(ind2'))
D2 = 14×14
0 5 10 15 20 27 30 33 58 109 134 160 185 211 5 0 5 10 15 22 25 28 53 104 129 155 180 206 10 5 0 5 10 17 20 23 48 99 124 150 175 201 15 10 5 0 5 12 15 18 43 94 119 145 170 196 20 15 10 5 0 7 10 13 38 89 114 140 165 191 27 22 17 12 7 0 3 6 31 82 107 133 158 184 30 25 20 15 10 3 0 3 28 79 104 130 155 181 33 28 23 18 13 6 3 0 25 76 101 127 152 178 58 53 48 43 38 31 28 25 0 51 76 102 127 153 109 104 99 94 89 82 79 76 51 0 25 51 76 102
Again, we might use some tools to decide to look for the common cluster at a stride of 5.
G2 = graph(abs(D2 - 5) <= 1)
G2 =
graph with properties: Edges: [5×1 table] Nodes: [14×0 table]
conncomp(G2)
ans = 1×14
1 1 1 1 1 2 3 2 4 5 6 7 8 9
As you can see there, we have a cluster of the first 5 nodes, all with a stride of roughly 5.
G2 = graph(abs(D2 - 25) <= 1)
G2 =
graph with properties: Edges: [6×1 table] Nodes: [14×0 table]
conncomp(G2)
ans = 1×14
1 2 3 4 5 6 2 7 7 8 8 8 8 8
And again, a connected cluster of the ast 5 nodes, each roughly at a distance of 25.
With some effort, you might even be able to look for the common stride intelligently. Perhaps a way might exist to use clustering to search for those (NEARLY) common strides.
At the same time, be careful, since if your vector is terribly long, then things will go to hell, as that interpoint distance matrix will get larger. So don't expect this to work nearly as well if you have many thousands of nodes. Since there are now several questions left up in the air and unresolved, I'll stop here. Regardless, the graph theoretic tools In MATLAB will prove to be a terribly useful way to approach the problem.
  2 个评论
Sardius
Sardius 2022-8-16
编辑:Sardius 2022-8-16
Dear John,
First, I would like to deeply appreciate for the time you spent and many thanks for the explanations. Regarding your questions, I have the answers as follows:
How long will these vectors be? These are ultrasonic signals of various lengths. But they are not longer than 1024 and not shorter than 256.
How many non-zeros? We don't know. Depends on each signal.
Do you know, in advance the strides that you want to search for? Definitely No, that is the key we are looking for (to be allocated as time of flight, TOF, of the signals).
I will go through your method tomorrow and then will get back to you. Meanwhile, since the code might need to be embedded on a microcontroller, the method needs to work pretty fast as well.
Thanks very much again.
John D'Errico
John D'Errico 2022-8-16
Since the vectors will not be too long, the matrices will not grow immensely large. The sparser they are, the better of course. Would the above ideas be workable on a microcontroller? God only knows. Certainly not me. :) It would be worth testing on a larger problem.
But really, it looks like your main problem is one of automatically searching for a common stride. Once you have identified the stride, or at least a good candidate, then you have a graph-theoretic solution available.
One simple option would be to use brute force. That is, perform the above analysis, looking for groups of elements with a common stride of 2 +/- 1. Next, search for groups with a common stride of 3 +/- 1, continue the search, increasing the stride, until you find the set of maximal length. Could you do that? Of course. Whether it would be managable, I cannot say. It is very likely that with some thought a better solution exists.

请先登录,再进行评论。


John D'Errico
John D'Errico 2022-8-16
编辑:John D'Errico 2022-8-16
As a subtly different answer, now that I understand your question a little better, consider this idea. My goal in this answer will be to see if an approximately common stride can be identified. The problem is, it need not be EXACT, so you would want to have a tolerance.
Y2 = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-0.2784,0,0,0,0,0.5,0,0,0,0,0.35,0,0,0,0,-0.339039014549722,0,0,0,0,0.83321222184805,0,0,0,0,0,0,1.81696995276160,0,0,1.10627709436174,0,0,-0.754131732193239,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.272403139239999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.254972819695545,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.250627796693075,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.234158547982346,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.212588567489227,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.191203969721771,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
ind2 = find(Y2);
Now, assume we don't know what stride to search for. But it must happen between nodes that are close to each other.
conv(ind2,[1 -1],'valid')
ans = 1×13
5 5 5 5 7 3 3 25 51 25 26 25 26
conv(ind2,[1 0 -1],'valid')
ans = 1×12
10 10 10 12 10 6 28 76 76 51 51 51
conv(ind2,[1 0 0 -1],'valid')
ans = 1×11
15 15 17 15 13 31 79 101 102 76 77
conv(ind2,[1 0 0 0 -1],'valid')
ans = 1×10
20 22 20 18 38 82 104 127 127 102
conv(ind2,[1 0 0 0 0-1],'valid')
ans = 1×10
20 22 20 18 38 82 104 127 127 102
Now, I might scan through those differences, looking for some common, or nearly common strides. If I did do that, I would notice that a stride of something like 5-7 might be of interest. Then a common stride of 10 seems to pop out, as do 15 and 25.
Next, do the connected component analysis for each candidate, as I showed in my other answer.
G2 = graph(squareform(abs(pdist(ind2')' - 6) <= 1));
connections = conncomp(G2)
connections = 1×14
1 1 1 1 1 1 2 1 3 4 5 6 7 8
ind2(connections == 1) % 1 the number that appears the most frequently in connections
ans = 1×7
20 25 30 35 40 47 53
So we have found one sub-sequence of length 7.
I could now repeat the above analysis, looking for subsequences of length 10.
G2 = graph(squareform(abs(pdist(ind2')' - 10) <= 1));
connections = conncomp(G2)
connections = 1×14
1 2 1 2 1 3 1 4 5 6 7 8 9 10
ind2(connections == 1) % again, 1 indicates the longest subsequence
ans = 1×4
20 30 40 50
Next, consider subsequences of length approximately 25...
G2 = graph(squareform(abs(pdist(ind2')' - 25) <= 1));
connections = conncomp(G2)
connections = 1×14
1 2 3 4 5 6 2 7 7 8 8 8 8 8
ind2(connections == 8)
ans = 1×5
129 154 180 205 231
All of this was just a little playing around. You could automate something that works, but first, you need to find something that works.
  1 个评论
Sardius
Sardius 2022-8-19
Dear John,
Thank you very much for your answer. I read it carefully. I am trying to modify the following idea. Currently it is capable of detecting the longest interval atleast. The remaining issue is now in case of exact same length (width), it will show only the first one.
ind1 = find(Y2);
D = diff(ind1);
if ~isempty(D)
D = [D D(end)];
x = [];
for i = 1:length(D)-1
Kind = find(abs(D(i)-D(i+1))<=1);
if isempty(Kind)
Kind = 0;
x(1,i) = Kind;
end
x(1,i) = Kind;
end
zpos = find(~[0 x 0]);
[~, grpidx] = max(diff(zpos));
key = zpos(grpidx);
TOF = abs(ind1(key)-ind1(key+1));
end
I will be working on it and will let you know in case of any better solution and/or progress towards the current approach. Let me know your comments please. Thanks very much again.

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Linear Algebra 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by