Compare two strings with some restrictions
flashpode
2021-9-15
Hey, how are you?
I have to compare two strings of n and m lines against each other to see whether they contain the same messages. The messages look like this:
!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053
!AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B0053
!AIVDM,1,1,,B,15AMJH00000:1i8Ga0v@0Akb00Sw,0*390053
!AIVDM,1,1,,B,13EcsW7P00P:07JGc9Wh0?wb2<0m,0*0E0053
!AIVDM,1,1,,A,13Efqs800109q6fGb0tHhq?d2L1<,0*3C0054
!AIVDM,1,1,,A,137FrD02Bu0:=9@GS16Vu5O`00S<,0*5D0054
!AIVDM,1,1,,B,33F6AD00@0P9ud6GbtWAQmob22rA,0*2B0055
As you can see, the last four digits range from 0000 to 5959: the first two are minutes and the other two are seconds. I have the code to compare all the messages from one script to another, but now I have to compare only the messages whose ending falls in a range that we set. Example:
!AIVDM,1,1,,A,13GQ>C@P00P9rHrGasGf4?wn20SK,0*020059
This message ends with 0059, so I should compare it with all the messages that end with a number from 0000 to 0159. That is, the comparison covers the messages from one minute before to one minute after the message.
4 Comments
flashpode
2021-9-15
The output is another string that contains the messages that are the same in both strings.
My problem is putting the restriction into the comparison. I do not know how to do it.
flashpode
2021-9-15
Edited: flashpode
2021-9-18
Okay one string is this one:
!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053
!AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B0053
!AIVDM,1,1,,B,15AMJH00000:1i8Ga0v@0Akb00Sw,0*390053
!AIVDM,1,1,,B,13EcsW7P00P:07JGc9Wh0?wb2<0m,0*0E0053
!AIVDM,1,1,,A,13Efqs800109q6fGb0tHhq?d2L1<,0*3C0054
and the other string is:
"!AIVDM,1,1,,A,13ErMfPP00P9rFpGasc>4?wn2802,0*070000"
"!AIVDM,1,1,,B,13FMMd0P0009o1jGapD=5gwl06p0,0*780000"
"!AIVDM,1,1,,A,4028ioivDfFss09kDvGag6G0080D,0*790000"
"!AIVDM,1,1,,A,D028ioj<Tffp,0*2C0000"
So the output is another string that has the messages that are in both strings.
Accepted Answer
Walter Roberson
2021-9-15
In https://www.mathworks.com/matlabcentral/answers/1452949-get-the-last-for-digits-as-the-time-this-message-was-sent#answer_787044 I showed you how to extract the last 4 digits of each line, as text.
The result would have been a cell array of character vectors. You can str2double() to get a set of decimal numbers.
Once you have the set of decimal numbers, referred to below as DN, then
dur = minutes(floor(DN/100)) + seconds(mod(DN,100));
If you do that for both sets of data, getting dur1 and dur2, then
[~, M1, S1] = hms(dur1);
[~, M2, S2] = hms(dur2);
[has_match0, idx0] = ismember(M1, M2);
[has_match1, idx1] = ismember(M1+1, M2);
M1_has_match = has_match0 | has_match1;
M1_match = zeros(size(M1)); % preallocate so every entry of M1 has a slot
M1_match(has_match1) = idx1(has_match1);
M1_match(has_match0) = idx0(has_match0);
M1_matches = find(M1_has_match);
M2_matches = M1_match(M1_has_match);
If I got everything right, then M1_matches will be the index into the first set of durations in which there are matches, and M2_matches will be the corresponding indexes into the second set of durations that match the first set.
Any one entry in the first set of durations is only looked for once in the second set of durations, but because of the matching process, any given entry in the second set of durations could match more than one entry in the first set of durations. You did not ask for the closest match that occurs within a particular time interval: you asked for matches that occur if the second set has any entry that has the same minute as one in the first set, or is the next minute after one in the first set.
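As a small worked example of the extraction and conversion steps described above, here is a minimal sketch using two lines taken from the thread. The variable names `msgs`, `t`, `M` and `S` are only illustrative, and the sketch assumes the lines are held in a string array.

```matlab
% Two sample lines from the thread; the trailing four digits are mmss.
msgs = ["!AIVDM,1,1,,A,13GQ>C@P00P9rHrGasGf4?wn20SK,0*020059"
        "!AIVDM,1,1,,A,13ErMfPP01P9rG0Gasc>4?wn26p4,0*0F0100"];

% Last four digits of each line, as text.
t = regexp(msgs, '\d{4}$', 'match', 'once');

% Text -> decimal numbers (59 and 100 here).
DN = str2double(t);

% mmss -> duration: the hundreds are minutes, the remainder is seconds.
dur = minutes(floor(DN/100)) + seconds(mod(DN,100));

% Minute and second fields, as used by the ismember-based matching.
[~, M, S] = hms(dur);
```

Here `M` should come out as 0 and 1, which is what the `ismember(M1, M2)` and `ismember(M1+1, M2)` calls operate on.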
31 Comments
flashpode
2021-9-15
Well, but does dur contain the whole message or just the last four numbers? Because I have to compare the messages that are the same but have different numbers at the end. I do not really understand what you did.
flashpode
2021-9-15
I just want to compare this part of the message
"!AIVDM,1,1,,A,13FtuD?P00P9tuDGbFw4JgvH00T<,0*3F0011" with the other messages that have an ending number in the range that we set, that is, from 1 minute less to 1 minute more. I do not understand what you did and why, because in the end you are not comparing the messages.
Walter Roberson
2021-9-16
S1s = [
"!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053"
"!AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B0053"
"!AIVDM,1,1,,B,15AMJH00000:1i8Ga0v@0Akb00Sw,0*390053"
"!AIVDM,1,1,,B,13EcsW7P00P:07JGc9Wh0?wb2<0m,0*0E0053"
"!AIVDM,1,1,,A,13Efqs800109q6fGb0tHhq?d2L1<,0*3C0054"
"!AIVDM,1,1,,A,137FrD02Bu0:=9@GS16Vu5O`00S<,0*5D0054"
"!AIVDM,1,1,,B,33F6AD00@0P9ud6GbtWAQmob22rA,0*2B0055"
"!AIVDM,1,1,,B,13F9RTPP00P9rL0GasQf4?wf2D1E,0*430055"
"!AIVDM,1,1,,A,4028j;1vDfG0o09cG0Gdh4i000S:,0*560056"
"!AIVDM,1,1,,A,D028j;0flffp,0*430056"
"!AIVDM,1,1,,B,13E`977P00P:06:Gc8DkW?wh2@Q3,0*690056"
"!AIVDM,1,1,,A,13EpM3PP0009nVdGb9Itfwwh0HQ6,0*0A0056"
"!AIVDM,1,1,,B,13GQ:Fw01@P9qH6GaS:WiVEh2<1E,0*6D0056"
"!AIVDM,1,1,,B,13cq;9000IP9saTGb3d0b0ad8<1s,0*320057"
"!AIVDM,1,1,,A,39NSCRU000P9uM`GbVRU=@qD0000,0*480057"
"!AIVDM,1,1,,A,13Esmv000009qWBGb=BLHAUl0L1U,0*5A0057"
"!AIVDM,1,1,,A,13F:b60P0J09sKpGb1UhGOwl0<1l,0*3C0058"
"!AIVDM,1,1,,A,13EaMT?000P9wK2Gblptiooh0D1;,0*430059"
"!AIVDM,1,1,,A,13GNje0P00P9nebGb5nv4?wl00SB,0*4B0059"
"!AIVDM,1,1,,A,13GQ>C@P00P9rHrGasGf4?wn20SK,0*020059"
"!AIVDM,1,1,,B,H3dPfW4UC=D7@?>q82knjo2P9430,0*610059"
"!AIVDM,1,1,,B,13GPhM0P00P9rGPGast>4?wn2<1C,0*080059"
"!AIVDM,1,1,,A,13ErMfPP01P9rG0Gasc>4?wn26p4,0*0F0100"
"!AIVDM,1,1,,A,39NSVP500009tArGbL07q6Mn0F;r,0*490100"
"!AIVDM,1,1,,A,4028ioivDfG0s09kDvGag6G006p0,0*010100"
"!AIVDM,1,1,,A,D028ioj"
"!AIVDM,1,1,,A,4028jJ1vDfG1009cHtGdh1g026p4,0*100100"
"!AIVDM,1,1,,B,137FrD0v2u0:=9TGRwMVu5N00H0j,0*5E0101"
"!AIVDM,1,1,,B,19NS@=@01qP9u@fGQs3PbP`40H0l,0*030101"
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbG1TJgv40H1;,0*7D0102"
];
S2s = [
"!AIVDM,1,1,,A,13ErMfPP00P9rFpGasc>4?wn2802,0*070000"
"!AIVDM,1,1,,B,13FMMd0P0009o1jGapD=5gwl06p0,0*780000"
"!AIVDM,1,1,,A,4028ioivDfFss09kDvGag6G0080D,0*790000"
"!AIVDM,1,1,,A,D028ioj<Tffp,0*2C0000"
"!AIVDM,1,1,,B,19NS@=@01qP9tp4GQkJ0bh`200SP,0*780000"
"!AIVDM,1,1,,B,137FrD0v2u0:=4pGS;s6u5On00SJ,0*000000"
"!AIVDM,1,1,,A,4028jJ1vDfG0009cIVGdh2?0280S,0*400000"
"!AIVDM,1,1,,B,H3GQ9khl4LLTF0l5T0000000000,2*070001"
"!AIVDM,1,1,,A,H33mw2Q>uV0luHTpN3800000000,2*080001"
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbFw4Jgv40L1f,0*030002"
"!AIVDM,1,1,,B,D028jJ03`N?b<`O6Dl<O6D0,2*350002"
"!AIVDM,1,1,,B,137JlD51h0P9tddGbCQSm0j2081e,0*0E0002"
"!AIVDM,1,1,,A,13EuB00P0009n`TGb82:ugv600RQ,0*110003"
"!AIVDM,1,1,,A,13EsReP00009vQ`Gbj65gPh400SJ,0*350003"
"!AIVDM,1,1,,B,33EtT>5000P9nI2Gb8H5FP<60Dm:,0*580004"
"!AIVDM,1,1,,B,13Efqs800109q6DGb0wHhq@826p0,0*0A0004"
"!AIVDM,1,1,,A,13F9RTPP00P9rKpGasQf4?v820Sf,0*6D0004"
"!AIVDM,1,1,,B,13EmCs70010:0;bGc<Lbh3280<14,0*340004"
"!AIVDM,1,1,,A,13EcsW7P00P:07PGc9Ws@gv82D0l,0*060004"
"!AIVDM,1,1,,B,13EpM3PP0009nVPGb9EJG?v:0<1N,0*590005"
"!AIVDM,1,1,,A,15AMJH00000:1i6Ga0oP0Af:0<16,0*470005"
"!AIVDM,1,1,,A,13cq;9000GP9sUlGb2IPRPb680T5,0*510005"
"!AIVDM,1,1,,B,4028j;1vDfG0509cFhGdh5Q0083a,0*5C0006"
"!AIVDM,1,1,,B,D028j;0flffp,0*400006"
"!AIVDM,1,1,,A,137FrD032u0:=5<GS;6Vu5N:0<1=,0*620006"
"!AIVDM,1,1,,B,13Esmv000009qW:Gb=BLHAT@0H4>,0*660007"
"!AIVDM,1,1,,A,13E`977P00P:06DGc8D00?v>26p0,0*2B0007"
"!AIVDM,1,1,,A,33cm<M1th209wjpGb066k1v<0P00,0*470007"
"!AIVDM,1,1,,A,13GQ:Fw01?P9qW4GaW=7d6:>26p0,0*150007"
"!AIVDM,1,1,,B,13F:b60P0I09sHjGb0D0Bwv@0<1i,0*780008"
"!AIVDM,1,1,,B,B3E`?Q00002Q8LUrpW7Q3wT5kP06,0*760008"
"!AIVDM,1,1,,B,H3`fKwPiDp40000000000000000,2*5F0008"
"!AIVDM,1,1,,B,H3`fKwTTDBE5847@4lpnl0200320,0*700008"
"!AIVDM,1,1,,B,33EaMT?000P9wK4Gblq<iop<00jk,0*390009"
"!AIVDM,1,1,,B,H4hJ<S0l58T4R118Tp<E=>0TTV0,2*450009"
"!AIVDM,1,1,,A,13GPhM0P00P9rGHGast>4?vB20Sr,0*610009"
"!AIVDM,2,1,2,B,53cq;982CRFhT8P<000`thiV1H4p4@Tt00000017D1@CC4en0K1DhPkP0000,0*2C0009"
"!AIVDM,2,2,2,B,00000000000,2*250009"
"!AIVDM,1,1,,B,13GQ>C@P00P9rHjGasGf4?vB2@5d,0*0D0009"
"!AIVDM,1,1,,A,7000003dTINH,0*6D0010"
"!AIVDM,1,1,,A,19NS@=@01qP9tt6GQlahb@`F0H65,0*2D0010"
"!AIVDM,1,1,,A,33FMMd0P0009o1fGapC<Uwv@000k,0*330010"
"!AIVDM,1,1,,B,4028ioivDfG0909kDvGag6G00@6;,0*730010"
"!AIVDM,1,1,,B,D028ioj<Tffp,0*2F0010"
"!AIVDM,2,1,3,A,53FMMd82;AMHD4i4001=049DpdE:3C400000001I081234He0=hUDTR@CPK0,0*030010"
"!AIVDM,2,2,3,A,hDm1C33kP00,2*1F0010"
"!AIVDM,1,1,,B,13ErMfPP00P9rFrGasc>4?vD20RI,0*3C0010"
"!AIVDM,1,1,,B,4028jJ1vDfG0:09cIRGdh1g0286J,0*090010"
"!AIVDM,1,1,,B,13GNje0P00P9nePGb5mN4?vD00S=,0*170011"
"!AIVDM,1,1,,B,13EoPo7P00P:0IjGc:d00?vF2@6T,0*490011"
"!AIVDM,1,1,,A,13FtuD?P00P9tuDGbFw4JgvH00T<,0*3F0011"
"!AIVDM,1,1,,B,H3GQ9klTC=D4s;H51npno0106400,0*040011"
"!AIVDM,1,1,,B,137FrD0uBu0:=68GS9P6u5NF0@7B,0*2D0012"
"!AIVDM,1,1,,A,H33mw2TTCBD6ubp00000001@4440,0*170012"
"!AIVDM,1,1,,B,H4hJ<S4T1=30000J7;FoPP1P<560,0*4A0013"
"!AIVDM,1,1,,A,B3EpfoP0002OoB5rlUwQ3wT5kP06,0*1B0013"
"!AIVDM,1,1,,B,15AMJH00000:1i6Ga0pP0AhL0D16,0*5B0014"
"!AIVDM,1,1,,B,13F9RTPP00P9rKrGasQf4?vL288J,0*570014"
];
msg1 = regexp(S1s, '.*(?=\d{4}$)', 'match', 'once');
msg2 = regexp(S2s, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(S1s, '\d{4}$', 'match', 'once');
t2 = regexp(S2s, '\d{4}$', 'match', 'once');
mask1 = ismissing(msg1) | ismissing(t1);
mask2 = ismissing(msg2) | ismissing(t2);
origidx1 = (1:length(msg1));
origidx2 = (1:length(msg2));
msg1(mask1) = []; t1(mask1) = []; origidx1(mask1) = [];
msg2(mask2) = []; t2(mask2) = []; origidx2(mask2) = [];
DN1 = str2double(t1);
DN2 = str2double(t2);
dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));
[~, Min1, S1] = hms(dur1);
[~, Min2, S2] = hms(dur2);
num_msg1 = length(msg1);
msg_match = cell(num_msg1, 1);
for K = 1 : num_msg1
all_match_idx = find(msg1(K) == msg2);
if isempty(all_match_idx);
fprintf('No text match for line #%d -> "%s"\n', origidx1(K), msg1(K));
continue;
end
fprintf('potential match for line #%d -> "%s", checking times\n', origidx1(K), msg1(K));
disp(K), disp(all_match_idx)
complete_match_idx = all_match_idx(Min1(K) == Min2(all_match_idx) | Min1(K) == Min2(all_match_idx) - 1);
msg_match{K} = complete_match_idx;
if isempty(complete_match_idx)
fprintf('line #%d -> "%s" matched text but not time\n', origidx1(K), msg1(K));
else
fprintf('line #%d -> "%s" matches on time too! Matches are:\n', origidx1(K), msg1(K));
msg2(complete_match_idx)
end
end
No text match for line #1 -> "!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C"
No text match for line #2 -> "!AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B"
No text match for line #3 -> "!AIVDM,1,1,,B,15AMJH00000:1i8Ga0v@0Akb00Sw,0*39"
No text match for line #4 -> "!AIVDM,1,1,,B,13EcsW7P00P:07JGc9Wh0?wb2<0m,0*0E"
No text match for line #5 -> "!AIVDM,1,1,,A,13Efqs800109q6fGb0tHhq?d2L1<,0*3C"
No text match for line #6 -> "!AIVDM,1,1,,A,137FrD02Bu0:=9@GS16Vu5O`00S<,0*5D"
No text match for line #7 -> "!AIVDM,1,1,,B,33F6AD00@0P9ud6GbtWAQmob22rA,0*2B"
No text match for line #8 -> "!AIVDM,1,1,,B,13F9RTPP00P9rL0GasQf4?wf2D1E,0*43"
No text match for line #9 -> "!AIVDM,1,1,,A,4028j;1vDfG0o09cG0Gdh4i000S:,0*56"
No text match for line #10 -> "!AIVDM,1,1,,A,D028j;0flffp,0*43"
No text match for line #11 -> "!AIVDM,1,1,,B,13E`977P00P:06:Gc8DkW?wh2@Q3,0*69"
No text match for line #12 -> "!AIVDM,1,1,,A,13EpM3PP0009nVdGb9Itfwwh0HQ6,0*0A"
No text match for line #13 -> "!AIVDM,1,1,,B,13GQ:Fw01@P9qH6GaS:WiVEh2<1E,0*6D"
No text match for line #14 -> "!AIVDM,1,1,,B,13cq;9000IP9saTGb3d0b0ad8<1s,0*32"
No text match for line #15 -> "!AIVDM,1,1,,A,39NSCRU000P9uM`GbVRU=@qD0000,0*48"
No text match for line #16 -> "!AIVDM,1,1,,A,13Esmv000009qWBGb=BLHAUl0L1U,0*5A"
No text match for line #17 -> "!AIVDM,1,1,,A,13F:b60P0J09sKpGb1UhGOwl0<1l,0*3C"
No text match for line #18 -> "!AIVDM,1,1,,A,13EaMT?000P9wK2Gblptiooh0D1;,0*43"
No text match for line #19 -> "!AIVDM,1,1,,A,13GNje0P00P9nebGb5nv4?wl00SB,0*4B"
No text match for line #20 -> "!AIVDM,1,1,,A,13GQ>C@P00P9rHrGasGf4?wn20SK,0*02"
No text match for line #21 -> "!AIVDM,1,1,,B,H3dPfW4UC=D7@?>q82knjo2P9430,0*61"
No text match for line #22 -> "!AIVDM,1,1,,B,13GPhM0P00P9rGPGast>4?wn2<1C,0*08"
No text match for line #23 -> "!AIVDM,1,1,,A,13ErMfPP01P9rG0Gasc>4?wn26p4,0*0F"
No text match for line #24 -> "!AIVDM,1,1,,A,39NSVP500009tArGbL07q6Mn0F;r,0*49"
No text match for line #25 -> "!AIVDM,1,1,,A,4028ioivDfG0s09kDvGag6G006p0,0*01"
No text match for line #27 -> "!AIVDM,1,1,,A,4028jJ1vDfG1009cHtGdh1g026p4,0*10"
No text match for line #28 -> "!AIVDM,1,1,,B,137FrD0v2u0:=9TGRwMVu5N00H0j,0*5E"
No text match for line #29 -> "!AIVDM,1,1,,B,19NS@=@01qP9u@fGQs3PbP`40H0l,0*03"
No text match for line #30 -> "!AIVDM,1,1,,B,13FtuD?P00P9tuDGbG1TJgv40H1;,0*7D"
found_something_at = find(~cellfun(@isempty, msg_match))
found_something_at =
0×1 empty double column vector
Walter Roberson
2021-9-16
You are not clear as to what a "match" means, so I had to guess that you wanted to see the same !AIVDM text (without time) appearing in both streams, with the second stream being either the same minute as the original or else the next minute compared to the original.
As you can see, with the same data you provided, there are no text matches at all.
But if you meant that just the times had to match that way, without the text having to match, then it is confusing, as you talk as if there is "a" match in the second set of strings, when instead there are numerous matches if you are just considering "within the next calendar minute" to be the match criterion -- and furthermore, most strings in the second set match multiple strings in the first set if you consider only the time that way. The desired output is not clear.
flashpode
2021-9-16
The desired output is the line that is in both sets, ignoring the time (last four digits), as you said. There should be more than one match.
Walter Roberson
2021-9-16
Are you wanting to compare only on time? If so then there are multiple matches for each input.
If you are wanting to compare based upon the part before the time, together with the time being close enough, then in the data you posted, there are no matches for that.
Walter Roberson
2021-9-16
If you want to compare only on time, then all except the last 8 of the first input match, and everything in the second set matches each item in the first set (except the last 8).
flashpode
2021-9-16
I want to compare with all the messages sent one minute before and after those of set 1. Look: S1 message 1 has time t0, so I want to compare it with the messages from S2 with t0 +/- 1 minute.
flashpode
2021-9-16
Hey, how could I change the string t1 and t2 to be able to make t1+100 without getting a string t1 with 7 digits as output?
Walter Roberson
2021-9-17
how could I change the string t1 and t2 to be able to make t1+100
t1p1dur = minutes(str2double(regexp(t1, '^\d{2}', 'match', 'once')) + 1) + seconds(str2double(regexp(t1, '\d{2}$', 'match', 'once')));
t1p1dur.Format = 'mm:ss';
t1p1 = string(t1p1dur);
This would construct new strings that were 1 minute later than the old strings.
Or... you could take the existing Min1 in the code, which is the minutes portion of (valid) t1 entries, and add 1 and compare against Min2 .
Or... you could notice that the existing line
complete_match_idx = all_match_idx(Min1(K) == Min2(all_match_idx) | Min1(K) == Min2(all_match_idx) - 1);
already compares Min1 to Min2 - 1, which is the same thing as comparing Min1+1 to Min2 . If you would feel more comfortable you could rewrite the line marginally to
complete_match_idx = all_match_idx(Min1(K) == Min2(all_match_idx) | Min1(K) + 1 == Min2(all_match_idx));
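If new time strings are really wanted in the same four-digit mmss layout, the following is a minimal sketch rather than part of the code above. The sample stamps and the names `mm`, `ss` and `mm_next` are hypothetical, and wrapping minute 59 around to 00 is an assumption about how the hour boundary should be handled.

```matlab
% Hypothetical mmss stamps as a string array.
t1 = ["0053"; "0159"; "5930"];

% Split each stamp into its minute and second fields.
mm = str2double(extractBefore(t1, 3));   % first two characters
ss = str2double(extractAfter(t1, 2));    % last two characters

% Add one minute; assumption: minute 59 wraps around to 00.
mm_next = mod(mm + 1, 60);

% Rebuild zero-padded four-digit strings, one per row.
t1p1 = compose("%02d%02d", [mm_next, ss]);
```

Working on numbers and formatting at the end avoids the string concatenation that turns `t1 + "100"` into a 7-character string.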
Walter Roberson
2021-9-17
I want to compare with all the messages sent one minute before and after those of set 1.
What does it mean to "compare" ??
If you are going strictly by time, then notice that your first string input
"!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053"
is minute 00, and so matching on time would be asking to match strings with minute 00 or 01. Which strings are those?
"!AIVDM,1,1,,A,13ErMfPP00P9rFpGasc>4?wn2802,0*070000"
"!AIVDM,1,1,,B,13FMMd0P0009o1jGapD=5gwl06p0,0*780000"
"!AIVDM,1,1,,A,4028ioivDfFss09kDvGag6G0080D,0*790000"
"!AIVDM,1,1,,A,D028ioj<Tffp,0*2C0000"
"!AIVDM,1,1,,B,19NS@=@01qP9tp4GQkJ0bh`200SP,0*780000"
"!AIVDM,1,1,,B,137FrD0v2u0:=4pGS;s6u5On00SJ,0*000000"
"!AIVDM,1,1,,A,4028jJ1vDfG0009cIVGdh2?0280S,0*400000"
"!AIVDM,1,1,,B,H3GQ9khl4LLTF0l5T0000000000,2*070001"
"!AIVDM,1,1,,A,H33mw2Q>uV0luHTpN3800000000,2*080001"
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbFw4Jgv40L1f,0*030002"
"!AIVDM,1,1,,B,D028jJ03`N?b<`O6Dl<O6D0,2*350002"
"!AIVDM,1,1,,B,137JlD51h0P9tddGbCQSm0j2081e,0*0E0002"
"!AIVDM,1,1,,A,13EuB00P0009n`TGb82:ugv600RQ,0*110003"
"!AIVDM,1,1,,A,13EsReP00009vQ`Gbj65gPh400SJ,0*350003"
"!AIVDM,1,1,,B,33EtT>5000P9nI2Gb8H5FP<60Dm:,0*580004"
"!AIVDM,1,1,,B,13Efqs800109q6DGb0wHhq@826p0,0*0A0004"
"!AIVDM,1,1,,A,13F9RTPP00P9rKpGasQf4?v820Sf,0*6D0004"
"!AIVDM,1,1,,B,13EmCs70010:0;bGc<Lbh3280<14,0*340004"
"!AIVDM,1,1,,A,13EcsW7P00P:07PGc9Ws@gv82D0l,0*060004"
"!AIVDM,1,1,,B,13EpM3PP0009nVPGb9EJG?v:0<1N,0*590005"
"!AIVDM,1,1,,A,15AMJH00000:1i6Ga0oP0Af:0<16,0*470005"
"!AIVDM,1,1,,A,13cq;9000GP9sUlGb2IPRPb680T5,0*510005"
"!AIVDM,1,1,,B,4028j;1vDfG0509cFhGdh5Q0083a,0*5C0006"
"!AIVDM,1,1,,B,D028j;0flffp,0*400006"
"!AIVDM,1,1,,A,137FrD032u0:=5<GS;6Vu5N:0<1=,0*620006"
"!AIVDM,1,1,,B,13Esmv000009qW:Gb=BLHAT@0H4>,0*660007"
"!AIVDM,1,1,,A,13E`977P00P:06DGc8D00?v>26p0,0*2B0007"
"!AIVDM,1,1,,A,33cm<M1th209wjpGb066k1v<0P00,0*470007"
"!AIVDM,1,1,,A,13GQ:Fw01?P9qW4GaW=7d6:>26p0,0*150007"
"!AIVDM,1,1,,B,13F:b60P0I09sHjGb0D0Bwv@0<1i,0*780008"
"!AIVDM,1,1,,B,B3E`?Q00002Q8LUrpW7Q3wT5kP06,0*760008"
"!AIVDM,1,1,,B,H3`fKwPiDp40000000000000000,2*5F0008"
"!AIVDM,1,1,,B,H3`fKwTTDBE5847@4lpnl0200320,0*700008"
"!AIVDM,1,1,,B,33EaMT?000P9wK4Gblq<iop<00jk,0*390009"
"!AIVDM,1,1,,B,H4hJ<S0l58T4R118Tp<E=>0TTV0,2*450009"
"!AIVDM,1,1,,A,13GPhM0P00P9rGHGast>4?vB20Sr,0*610009"
"!AIVDM,2,1,2,B,53cq;982CRFhT8P<000`thiV1H4p4@Tt00000017D1@CC4en0K1DhPkP0000,0*2C0009"
"!AIVDM,2,2,2,B,00000000000,2*250009"
"!AIVDM,1,1,,B,13GQ>C@P00P9rHjGasGf4?vB2@5d,0*0D0009"
"!AIVDM,1,1,,A,7000003dTINH,0*6D0010"
"!AIVDM,1,1,,A,19NS@=@01qP9tt6GQlahb@`F0H65,0*2D0010"
"!AIVDM,1,1,,A,33FMMd0P0009o1fGapC<Uwv@000k,0*330010"
"!AIVDM,1,1,,B,4028ioivDfG0909kDvGag6G00@6;,0*730010"
"!AIVDM,1,1,,B,D028ioj<Tffp,0*2F0010"
"!AIVDM,2,1,3,A,53FMMd82;AMHD4i4001=049DpdE:3C400000001I081234He0=hUDTR@CPK0,0*030010"
"!AIVDM,2,2,3,A,hDm1C33kP00,2*1F0010"
"!AIVDM,1,1,,B,13ErMfPP00P9rFrGasc>4?vD20RI,0*3C0010"
"!AIVDM,1,1,,B,4028jJ1vDfG0:09cIRGdh1g0286J,0*090010"
"!AIVDM,1,1,,B,13GNje0P00P9nePGb5mN4?vD00S=,0*170011"
"!AIVDM,1,1,,B,13EoPo7P00P:0IjGc:d00?vF2@6T,0*490011"
"!AIVDM,1,1,,A,13FtuD?P00P9tuDGbFw4JgvH00T<,0*3F0011"
"!AIVDM,1,1,,B,H3GQ9klTC=D4s;H51npno0106400,0*040011"
"!AIVDM,1,1,,B,137FrD0uBu0:=68GS9P6u5NF0@7B,0*2D0012"
"!AIVDM,1,1,,A,H33mw2TTCBD6ubp00000001@4440,0*170012"
"!AIVDM,1,1,,B,H4hJ<S4T1=30000J7;FoPP1P<560,0*4A0013"
"!AIVDM,1,1,,A,B3EpfoP0002OoB5rlUwQ3wT5kP06,0*1B0013"
"!AIVDM,1,1,,B,15AMJH00000:1i6Ga0pP0AhL0D16,0*5B0014"
"!AIVDM,1,1,,B,13F9RTPP00P9rKrGasQf4?vL288J,0*570014"
EVERY one of those is minute 00, so EVERY one of them would match on time.
What does not match on time? Well, the last 8 of the S1 entries
"!AIVDM,1,1,,A,13ErMfPP01P9rG0Gasc>4?wn26p4,0*0F0100"
"!AIVDM,1,1,,A,39NSVP500009tArGbL07q6Mn0F;r,0*490100"
"!AIVDM,1,1,,A,4028ioivDfG0s09kDvGag6G006p0,0*010100"
"!AIVDM,1,1,,A,D028ioj"
"!AIVDM,1,1,,A,4028jJ1vDfG1009cHtGdh1g026p4,0*100100"
"!AIVDM,1,1,,B,137FrD0v2u0:=9TGRwMVu5N00H0j,0*5E0101"
"!AIVDM,1,1,,B,19NS@=@01qP9u@fGQs3PbP`40H0l,0*030101"
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbG1TJgv40H1;,0*7D0102"
either have no valid time (the 4th one) or are in minute 01. You previously asked only to look at the same and the next minute, so minute 01 in the first strings should match minute 01 or 02 in the second strings, and none of those exist, so the last 8 would not match under the old rules.
Look: S1 message 1 has time t0, so I want to compare it with the messages from S2 with t0 +/- 1 minute.
And you just modified the rules to also look backwards by 1 minute. So the 0102 in the input would look back up to minute 00 in t2... which would match everything in t2.
Under the rules you just defined, everything in t1 matches everything in t2, with the exception of
"!AIVDM,1,1,,A,D028ioj"
But... saying +/- 1 minute might mean that you want the difference to be no more than 60 seconds, which is different than what you had asked for before, which involved only looking at the minute number. Should a t1 entry of 0000 match a t2 entry of 0105 because the minute 00 is +/- 1 to the minute 01 in t2? Or would you want the match to fail because the time difference would be more than 60 seconds?
With the data you have, every entry in t1 is within +/- 60 seconds of every entry in t2, with the exception of the
"!AIVDM,1,1,,A,D028ioj"
entry which has no time.
So... matching only on time is not going to be useful.
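To make the difference between the two readings of "+/- 1 minute" concrete, here is a minimal sketch using the 0000 versus 0105 example from above. The names `calendar_match` and `elapsed_match` are only illustrative, and neither rule attempts to handle wrap-around from 5959 back to 0000.

```matlab
% The example from the text: 0000 in the first set, 0105 in the second.
DN1 = 0;     % mmss 0000
DN2 = 105;   % mmss 0105

dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));

% Rule A: the minute fields differ by at most 1.
[~, Min1] = hms(dur1);
[~, Min2] = hms(dur2);
calendar_match = abs(Min1 - Min2) <= 1;

% Rule B: the elapsed time differs by at most 60 seconds.
elapsed_match = abs(dur1 - dur2) <= seconds(60);
```

Under rule A the pair matches (minute 00 versus minute 01), while under rule B it does not, because the two times are 65 seconds apart.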
flashpode
2021-9-17
Yeah, if we take the message from S1 that ends with 0000, get all the messages from S2 that end with 0000 to 0100 (that is one minute) and then do an ismember on those.
If the message of S1 ends with 0100, get all the messages from S2 that end with 0000 to 0200 (that is one minute before and after the S1 time '0100').
I only posted some lines, as you asked, but I have messages whose numbers go from 0000 to 5959.
I was wondering whether this could be done by just adding +-100 to those numbers. But I am working on it now.
flashpode
2021-9-17
Hey, I changed your code to give me the message at the end, not the line where it can be found. But as I do not have much experience with MATLAB, do you know whether I could do this comparison in a less complicated way?
I already have t1 and t2 as numbers, using the function double. Is this comparison only possible the way you did it? It seems a little complicated to me.
Walter Roberson
2021-9-17
What is your desired output:
- for each S1 input message, a list of all S2 messages that are within 1 "calendar minute" (the minute fields differ by at most 1)?
- for each S1 input message, a list of all S2 messages that are within 60 seconds? (0117 matching 0017 to 0217 but 0117 not matching 0243 because that is more than 60 seconds difference) ?
- or two blobs of messages -- a lump in which every S1 message that has some time-matching entry in S2 is put together, and another lump in which every S2 message that has some time-matching entry in S1 is put together, with no attempt to point out which message matches which other message?
If the third option, then perhaps it would be easier to think of it as removing from S1 any message that does not match within 1 minute to something in S2, and removing from S2 any message that does not match within 1 minute to something in S1? The logic for that can be more efficient.
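The third option, filtering each set down to the entries that have some partner within 1 minute in the other set, could be sketched as follows. This is only an illustration on hypothetical times, with `keep1` and `keep2` as made-up names, and it interprets "within 1 minute" as at most 60 seconds apart.

```matlab
% Hypothetical durations for two small sets (column vectors).
dur1 = seconds([53; 59; 200]);   % 0053, 0059, 0320
dur2 = seconds([0; 14; 500]);    % 0000, 0014, 0820

% Work in plain seconds so ordinary numeric expansion applies.
s1 = seconds(dur1);
s2 = seconds(dur2);

% Pairwise absolute differences: rows follow set 1, columns follow set 2.
D = abs(s1 - s2.');

within = D <= 60;
keep1 = any(within, 2);     % set-1 entries with some partner in set 2
keep2 = any(within, 1).';   % set-2 entries with some partner in set 1
```

The masks can then be used to index the original message arrays. Note that the pairwise matrix has numel(dur1)-by-numel(dur2) entries, so for very long inputs a sorted or windowed approach may be preferable.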
flashpode
2021-9-17
Here is the code you gave me with some differences. The lines I have marked with % are the ones where I do not understand why you wrote them, because they do not change anything.
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once');
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once');
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
mask1 = ismissing(msg_AIS1) | ismissing(t1);
mask2 = ismissing(msg_AIS2) | ismissing(t2);
origidx1 = (1:length(msg_AIS1));
origidx2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; t1(mask1) = []; origidx1(mask1) = [];
msg_AIS2(mask2) = []; t2(mask2) = []; origidx2(mask2) = [];
DN1 = str2double(t1);
DN2 = str2double(t2);
dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));
[~, M1, S1] = hms(dur1); % Split the time into 2 variables; use M1 to build ranges
[~, M2, S2] = hms(dur2);
num_msg_AIS1 = length(msg_AIS1);
msg_match = cell(num_msg_AIS1, 1);
for K = 1 : num_msg_AIS1
all_match_AIS = find(msg_AIS1(K) == msg_AIS2); % find identical messages
if isempty(all_match_AIS);
fprintf('No hay coincidencias para la linia #%d -> "%s"\n', origidx1(K), msg_AIS1(K));
continue;
end
fprintf('potencial coincidencia #%d -> "%s", checking times\n', origidx1(K), msg_AIS1(K));
disp(K), disp(all_match_AIS)
complete_match_AIS = all_match_AIS(M1(K) == M2(all_match_AIS) | M1(K) == M2(all_match_AIS) - 1 |M1(K) == M2(all_match_AIS) + 1); % Range of +/-1 minute around each message
msg_match{K} = msg_AIS1(complete_match_AIS); % take the messages from the lines that matched
% if isempty(complete_match_AIS)
% fprintf('line %#d -> "%s" coincide texto pero no tiempo\n', origidx1(K), msg_AIS1(K));
% else
% fprintf('line %#d -> "%s" matches on time too! Matches are:\n', origidx1(K), msg_AIS1(K));
% msg_AIS2(complete_match_AIS)
% end
end
%# find the empty cells (creates the mask variable)
emptyCells = cellfun(@isempty,msg_match);
%# remove the empty cells
msg_match(emptyCells) = [];
Then I removed the empty cells, but there are some cells that contain a 2x1 or 3x1 string holding messages. Why are those messages in one string? If they are repeated I want to have them on different lines. I am going to do it now.
And to answer your question, it would be the second option, as you have already done. I am really grateful.
Walter Roberson
2021-9-18
S1s = [
"!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053"
"!AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B0053"
"!AIVDM,1,1,,B,15AMJH00000:1i8Ga0v@0Akb00Sw,0*390053"
"!AIVDM,1,1,,B,13EcsW7P00P:07JGc9Wh0?wb2<0m,0*0E0053"
"!AIVDM,1,1,,A,13Efqs800109q6fGb0tHhq?d2L1<,0*3C0054"
"!AIVDM,1,1,,A,137FrD02Bu0:=9@GS16Vu5O`00S<,0*5D0054"
"!AIVDM,1,1,,B,33F6AD00@0P9ud6GbtWAQmob22rA,0*2B0055"
"!AIVDM,1,1,,B,13F9RTPP00P9rL0GasQf4?wf2D1E,0*430055"
"!AIVDM,1,1,,A,4028j;1vDfG0o09cG0Gdh4i000S:,0*560056"
"!AIVDM,1,1,,A,D028j;0flffp,0*430056"
"!AIVDM,1,1,,B,13E`977P00P:06:Gc8DkW?wh2@Q3,0*690056"
"!AIVDM,1,1,,A,13EpM3PP0009nVdGb9Itfwwh0HQ6,0*0A0056"
"!AIVDM,1,1,,B,13GQ:Fw01@P9qH6GaS:WiVEh2<1E,0*6D0056"
"!AIVDM,1,1,,B,13cq;9000IP9saTGb3d0b0ad8<1s,0*320057"
"!AIVDM,1,1,,A,39NSCRU000P9uM`GbVRU=@qD0000,0*480057"
"!AIVDM,1,1,,A,13Esmv000009qWBGb=BLHAUl0L1U,0*5A0057"
"!AIVDM,1,1,,A,13F:b60P0J09sKpGb1UhGOwl0<1l,0*3C0058"
"!AIVDM,1,1,,A,13EaMT?000P9wK2Gblptiooh0D1;,0*430059"
"!AIVDM,1,1,,A,13GNje0P00P9nebGb5nv4?wl00SB,0*4B0059"
"!AIVDM,1,1,,A,13GQ>C@P00P9rHrGasGf4?wn20SK,0*020059"
"!AIVDM,1,1,,B,H3dPfW4UC=D7@?>q82knjo2P9430,0*610059"
"!AIVDM,1,1,,B,13GPhM0P00P9rGPGast>4?wn2<1C,0*080059"
"!AIVDM,1,1,,A,13ErMfPP01P9rG0Gasc>4?wn26p4,0*0F0100"
"!AIVDM,1,1,,A,39NSVP500009tArGbL07q6Mn0F;r,0*490100"
"!AIVDM,1,1,,A,4028ioivDfG0s09kDvGag6G006p0,0*010100"
"!AIVDM,1,1,,A,D028ioj"
"!AIVDM,1,1,,A,4028jJ1vDfG1009cHtGdh1g026p4,0*100100"
"!AIVDM,1,1,,B,137FrD0v2u0:=9TGRwMVu5N00H0j,0*5E0101"
"!AIVDM,1,1,,B,19NS@=@01qP9u@fGQs3PbP`40H0l,0*030101"
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbG1TJgv40H1;,0*7D0102"
];
S2s = [
"!AIVDM,1,1,,A,13ErMfPP00P9rFpGasc>4?wn2802,0*070000"
"!AIVDM,1,1,,B,13FMMd0P0009o1jGapD=5gwl06p0,0*780000"
"!AIVDM,1,1,,A,4028ioivDfFss09kDvGag6G0080D,0*790000"
"!AIVDM,1,1,,A,D028ioj<Tffp,0*2C0000"
"!AIVDM,1,1,,B,19NS@=@01qP9tp4GQkJ0bh`200SP,0*780000"
"!AIVDM,1,1,,B,137FrD0v2u0:=4pGS;s6u5On00SJ,0*000000"
"!AIVDM,1,1,,A,4028jJ1vDfG0009cIVGdh2?0280S,0*400000"
"!AIVDM,1,1,,B,H3GQ9khl4LLTF0l5T0000000000,2*070001"
"!AIVDM,1,1,,A,H33mw2Q>uV0luHTpN3800000000,2*080001"
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbFw4Jgv40L1f,0*030002"
"!AIVDM,1,1,,B,D028jJ03`N?b<`O6Dl<O6D0,2*350002"
"!AIVDM,1,1,,B,137JlD51h0P9tddGbCQSm0j2081e,0*0E0002"
"!AIVDM,1,1,,A,13EuB00P0009n`TGb82:ugv600RQ,0*110003"
"!AIVDM,1,1,,A,13EsReP00009vQ`Gbj65gPh400SJ,0*350003"
"!AIVDM,1,1,,B,33EtT>5000P9nI2Gb8H5FP<60Dm:,0*580004"
"!AIVDM,1,1,,B,13Efqs800109q6DGb0wHhq@826p0,0*0A0004"
"!AIVDM,1,1,,A,13F9RTPP00P9rKpGasQf4?v820Sf,0*6D0004"
"!AIVDM,1,1,,B,13EmCs70010:0;bGc<Lbh3280<14,0*340004"
"!AIVDM,1,1,,A,13EcsW7P00P:07PGc9Ws@gv82D0l,0*060004"
"!AIVDM,1,1,,B,13EpM3PP0009nVPGb9EJG?v:0<1N,0*590005"
"!AIVDM,1,1,,A,15AMJH00000:1i6Ga0oP0Af:0<16,0*470005"
"!AIVDM,1,1,,A,13cq;9000GP9sUlGb2IPRPb680T5,0*510005"
"!AIVDM,1,1,,B,4028j;1vDfG0509cFhGdh5Q0083a,0*5C0006"
"!AIVDM,1,1,,B,D028j;0flffp,0*400006"
"!AIVDM,1,1,,A,137FrD032u0:=5<GS;6Vu5N:0<1=,0*620006"
"!AIVDM,1,1,,B,13Esmv000009qW:Gb=BLHAT@0H4>,0*660007"
"!AIVDM,1,1,,A,13E`977P00P:06DGc8D00?v>26p0,0*2B0007"
"!AIVDM,1,1,,A,33cm<M1th209wjpGb066k1v<0P00,0*470007"
"!AIVDM,1,1,,A,13GQ:Fw01?P9qW4GaW=7d6:>26p0,0*150007"
"!AIVDM,1,1,,B,13F:b60P0I09sHjGb0D0Bwv@0<1i,0*780008"
"!AIVDM,1,1,,B,B3E`?Q00002Q8LUrpW7Q3wT5kP06,0*760008"
"!AIVDM,1,1,,B,H3`fKwPiDp40000000000000000,2*5F0008"
"!AIVDM,1,1,,B,H3`fKwTTDBE5847@4lpnl0200320,0*700008"
"!AIVDM,1,1,,B,33EaMT?000P9wK4Gblq<iop<00jk,0*390009"
"!AIVDM,1,1,,B,H4hJ<S0l58T4R118Tp<E=>0TTV0,2*450009"
"!AIVDM,1,1,,A,13GPhM0P00P9rGHGast>4?vB20Sr,0*610009"
"!AIVDM,2,1,2,B,53cq;982CRFhT8P<000`thiV1H4p4@Tt00000017D1@CC4en0K1DhPkP0000,0*2C0009"
"!AIVDM,2,2,2,B,00000000000,2*250009"
"!AIVDM,1,1,,B,13GQ>C@P00P9rHjGasGf4?vB2@5d,0*0D0009"
"!AIVDM,1,1,,A,7000003dTINH,0*6D0010"
"!AIVDM,1,1,,A,19NS@=@01qP9tt6GQlahb@`F0H65,0*2D0010"
"!AIVDM,1,1,,A,33FMMd0P0009o1fGapC<Uwv@000k,0*330010"
"!AIVDM,1,1,,B,4028ioivDfG0909kDvGag6G00@6;,0*730010"
"!AIVDM,1,1,,B,D028ioj<Tffp,0*2F0010"
"!AIVDM,2,1,3,A,53FMMd82;AMHD4i4001=049DpdE:3C400000001I081234He0=hUDTR@CPK0,0*030010"
"!AIVDM,2,2,3,A,hDm1C33kP00,2*1F0010"
"!AIVDM,1,1,,B,13ErMfPP00P9rFrGasc>4?vD20RI,0*3C0010"
"!AIVDM,1,1,,B,4028jJ1vDfG0:09cIRGdh1g0286J,0*090010"
"!AIVDM,1,1,,B,13GNje0P00P9nePGb5mN4?vD00S=,0*170011"
"!AIVDM,1,1,,B,13EoPo7P00P:0IjGc:d00?vF2@6T,0*490011"
"!AIVDM,1,1,,A,13FtuD?P00P9tuDGbFw4JgvH00T<,0*3F0011"
"!AIVDM,1,1,,B,H3GQ9klTC=D4s;H51npno0106400,0*040011"
"!AIVDM,1,1,,B,137FrD0uBu0:=68GS9P6u5NF0@7B,0*2D0012"
"!AIVDM,1,1,,A,H33mw2TTCBD6ubp00000001@4440,0*170012"
"!AIVDM,1,1,,B,H4hJ<S4T1=30000J7;FoPP1P<560,0*4A0013"
"!AIVDM,1,1,,A,B3EpfoP0002OoB5rlUwQ3wT5kP06,0*1B0013"
"!AIVDM,1,1,,B,15AMJH00000:1i6Ga0pP0AhL0D16,0*5B0014"
"!AIVDM,1,1,,B,13F9RTPP00P9rKrGasQf4?vL288J,0*570014"
];
AIS1 = S1s;
AIS2 = S2s;
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once');
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once');
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
mask1 = ismissing(msg_AIS1) | ismissing(t1);
mask2 = ismissing(msg_AIS2) | ismissing(t2);
origidx1 = (1:length(msg_AIS1));
origidx2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; t1(mask1) = []; origidx1(mask1) = [];
msg_AIS2(mask2) = []; t2(mask2) = []; origidx2(mask2) = [];
DN1 = str2double(t1);
DN2 = str2double(t2);
dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));
num_msg_AIS1 = length(msg_AIS1);
msg_match = cell(num_msg_AIS1, 1);
for K = 1 : num_msg_AIS1
time_mask = isbetween(dur2, dur1(K)-minutes(1), dur1(K)+minutes(1));
msg_match{K} = AIS2(origidx2(time_mask)); % map the mask back to rows of the unfiltered AIS2
end
%# find the empty cells (creates the mask variable)
emptyCells = cellfun(@isempty,msg_match);
%# remove the empty cells
AIS1_with_matches = AIS1(origidx1); % keep only rows that survived the mask, aligned with msg_match
AIS1_with_matches(emptyCells) = [];
msg_match(emptyCells) = [];
%cross-checks to see that everything worked okay
AIS1_with_matches(1:3)
ans = 3×1 string array
"!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053"
"!AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B0053"
"!AIVDM,1,1,,B,15AMJH00000:1i8Ga0v@0Akb00Sw,0*390053"
msg_match(1:3)
ans = 3×1 cell array
{58×1 string}
{58×1 string}
{58×1 string}
msg_match{1}(1:3)
ans = 3×1 string array
"!AIVDM,1,1,,A,13ErMfPP00P9rFpGasc>4?wn2802,0*070000"
"!AIVDM,1,1,,B,13FMMd0P0009o1jGapD=5gwl06p0,0*780000"
"!AIVDM,1,1,,A,4028ioivDfFss09kDvGag6G0080D,0*790000"
AIS1_with_matches{end}
ans = '!AIVDM,1,1,,B,13FtuD?P00P9tuDGbG1TJgv40H1;,0*7D0102'
msg_match{end}(1:3)
ans = 3×1 string array
"!AIVDM,1,1,,B,13FtuD?P00P9tuDGbFw4Jgv40L1f,0*030002"
"!AIVDM,1,1,,B,D028jJ03`N?b<`O6Dl<O6D0,2*350002"
"!AIVDM,1,1,,B,137JlD51h0P9tddGbCQSm0j2081e,0*0E0002"
Yes, it worked. The entries with time 0102 are more than 1 minute from the entries with 0000 and 0001 so the 0000 and 0001 did not make it into the match list.
The outputs here are AIS1_with_matches and msg_match. AIS1_with_matches is the list of messages in AIS1 that match something inside AIS2. Then for each of those entries, msg_match is a cell array of all of the messages within +/- 1 minute in AIS2.
Notice that most messages are repeated a lot, since most messages are within 1 minute of most entries.
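Because each cell of msg_match holds a column of strings, one possible way to get a single list with one message per row is to stack the cells and drop repeats. This is a minimal sketch on a hypothetical stand-in for msg_match; `all_matches` and `unique_matches` are illustrative names, and whether repeats should be dropped at all depends on the desired output.

```matlab
% Hypothetical stand-in for msg_match: a cell array of string columns.
msg_match = { ["a"; "b"]; "b"; ["c"; "a"; "b"] };

% Stack the contents of every cell into one column string array.
all_matches = vertcat(msg_match{:});

% Collapse repeats, keeping the first-seen order.
unique_matches = unique(all_matches, 'stable');
```

After stacking, the link between each S1 message and its own matches is lost, so this only suits the case where a flat list of matched messages is enough.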
flashpode
2021-9-18
It did not work for me. The output I get is a cell array in which every row contains a string. The last lines of your code do not run on my computer:
AIS1_with_matches(1:3)
msg_match(1:3)
msg_match{1}(1:3)
AIS1_with_matches{end}
msg_match{end}(1:3)
The code that you gave me before worked, but I got strings inside the cells and I have to remove them. I am working on it but do not know how.
[nRows, ~] = cellfun(@size,msg_match);
isMultiRow = nRows>1;
msg_match(isMultiRow) = cellfun(@(a) {a'}, msg_match(isMultiRow));
msg_match(isMultiRow) = cellfun(@(a){strsplit(a,'!AIV')},msg_match(isMultiRow));
This is the code I used, but it gave me problems. Note that I want to split the strings into two or three different rows.
Walter Roberson
2021-9-18
I will need extended data to test with. Please attach a .mat with more extensive data. It does not need to be your full data -- just enough to be able to reproduce the problems.
flashpode
2021-9-18
And this is the code:
linia_dolenta1=[];
linia_dolenta2=[];
N=size(AIS1,1)
P=size(AIS2,1)
for i=1:1:N
seq1=AIS1(i);
linia=convertStringsToChars(seq1);
if length(linia)<15
linia_dolenta1 = [linia_dolenta1,i];
end
end
for j=1:1:P
seq2=AIS2(j);
linia=convertStringsToChars(seq2);
if length(linia)<15
linia_dolenta2 = [linia_dolenta2,j];
end
end
size(AIS1)
size(AIS2)
AIS1([linia_dolenta1],:) = [];
AIS2([linia_dolenta2],:) = [];
size(AIS1)
size(AIS2)
N=size(AIS1,1)
P=size(AIS2,1)
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once');
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once');
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
mask1 = ismissing(msg_AIS1) | ismissing(t1);
mask2 = ismissing(msg_AIS2) | ismissing(t2);
origi_AIS1 = (1:length(msg_AIS1));
origi_AIS2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; t1(mask1) = []; origi_AIS1(mask1) = [];
msg_AIS2(mask2) = []; t2(mask2) = []; origi_AIS2(mask2) = [];
DN1 = str2double(t1);
DN2 = str2double(t2);
dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));
[~, M1, S1] = hms(dur1); % split the time into components; M1 is used to build the ranges
[~, M2, S2] = hms(dur2);
num_msg_AIS1 = length(msg_AIS1);
msg_match = cell(num_msg_AIS1, 1);
for K = 1 : num_msg_AIS1
all_match_AIS = find(msg_AIS1(K) == msg_AIS2); % find identical messages
if isempty(all_match_AIS);
fprintf('No matches for line #%d -> "%s"', origi_AIS1(K), msg_AIS1(K)); % '%s' for a string
continue;
end
fprintf('potential match #%d -> "%s", checking times', origi_AIS1(K), msg_AIS1(K));
disp(K), disp(all_match_AIS)
complete_match_AIS = all_match_AIS(M1(K) == M2(all_match_AIS) | M1(K) == M2(all_match_AIS) - 1 |M1(K) == M2(all_match_AIS) + 1);% range of +-1 minute around each message
msg_match{K} = msg_AIS1(complete_match_AIS); % take the messages from the lines that matched
if isempty(complete_match_AIS)
fprintf('line %#d -> "%s" text matches but time does not\n', origi_AIS1(K), msg_AIS1(K));
else
fprintf('line %#d -> "%s" time also matches. They are:\n', origi_AIS1(K), msg_AIS1(K));
msg_AIS2(complete_match_AIS)
end
end
% find the empty cells (creates the mask variable)
emptyCells = cellfun(@isempty,msg_match);
% remove the empty cells
msg_match(emptyCells) = [];
% find(strcmp(msg_match, string))
[nRows, ~] = cellfun(@size,msg_match);
isMultiRow = nRows>1;
msg_match(isMultiRow) = cellfun(@(a) {a'}, msg_match(isMultiRow));
msg_match(isMultiRow) = cellfun(@(a){strsplit(a,'!AIV')},msg_match(isMultiRow)); % here I got the problem
flashpode
2021-9-18
Matching_msg = cellstr(cat(1, msg_match{:}));
Using this function worked, so the work is done. Thank you so much.
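For readers following along, a small sketch of what that flattening step does. The cell contents below are hypothetical placeholders; only the cat, cellstr and string calls come from the thread.

```matlab
% Hypothetical cell array: each cell holds the matches for one message
msg_match = { ["msgA"; "msgB"]; "msgC"; strings(0,1) };

% Vertically concatenate the contents of all cells into one string column
flat = cat(1, msg_match{:});

% cellstr converts to a cell array of char vectors; string converts back
Matching_msg = string(cellstr(flat));
disp(Matching_msg)
```

This assumes every cell holds a column (or an empty column). If the cells hold row vectors of different lengths, cat(1, ...) would fail and cat(2, ...) would be the one to try instead.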
Walter Roberson
2021-9-18
That last line,
msg_match(isMultiRow) = cellfun(@(a){strsplit(a,'!AIV')},msg_match(isMultiRow)); % here I got the problem
What is the intention of that line?
Is the intention to remove the !AIV prefix from the 2nd and following entries for any one row?
flashpode
2021-9-18
No, the intention was to remove the strings from inside the cell. But this problem is already solved. Thank you.
Walter Roberson
2021-9-18
Revised code:
AIS1_file = '2021030100AIS1.txt';
AIS2_file = '2021030100AIS2.txt';
AIS1 = string(readlines(AIS1_file));
AIS2 = string(readlines(AIS2_file));
AIS1(strlength(AIS1) < 15) = [];
AIS2(strlength(AIS2) < 15) = [];
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once');
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once');
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
mask1 = ismissing(msg_AIS1) | ismissing(t1);
mask2 = ismissing(msg_AIS2) | ismissing(t2);
origidx1 = (1:length(msg_AIS1));
origidx2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; t1(mask1) = []; origidx1(mask1) = [];
msg_AIS2(mask2) = []; t2(mask2) = []; origidx2(mask2) = [];
DN1 = str2double(t1);
DN2 = str2double(t2);
dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));
num_msg_AIS1 = length(msg_AIS1);
msg_match = cell(num_msg_AIS1, 1);
for K = 1 : num_msg_AIS1
time_mask = isbetween(dur2, dur1(K)-minutes(1), dur1(K)+minutes(1));
msg_match{K} = reshape(AIS2(time_mask), 1, []); %user wants rows
end
% find the empty cells (creates the mask variable)
emptyCells = cellfun(@isempty,msg_match);
% remove the empty cells
AIS1_with_matches = AIS1;
AIS1_with_matches(emptyCells) = [];
msg_match(emptyCells) = [];
nRows = cellfun(@length, msg_match);
isMultiRow = nRows>1;
%msg_match(isMultiRow) = cellfun(@(a){strsplit(a,'!AIV')},msg_match(isMultiRow)); % here I got the problem
flashpode
2021-9-23
How could I add two variables that hold the messages from AIS1 and AIS2 that did not match?
That is, make AIS1_NO_MATCHING = messages from AIS1 that are not in msg_match and AIS2_NO_MATCHING = messages from AIS2 that are not in msg_match, without making the code run much slower.
Walter Roberson
2021-9-23
AIS1_file = '2021030100AIS1.txt';
AIS2_file = '2021030100AIS2.txt';
AIS1 = string(readlines(AIS1_file));
AIS2 = string(readlines(AIS2_file));
AIS1(strlength(AIS1) < 15) = [];
AIS2(strlength(AIS2) < 15) = [];
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once');
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once');
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
mask1 = ismissing(msg_AIS1) | ismissing(t1);
mask2 = ismissing(msg_AIS2) | ismissing(t2);
origidx1 = (1:length(msg_AIS1));
origidx2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; t1(mask1) = []; origidx1(mask1) = [];
msg_AIS2(mask2) = []; t2(mask2) = []; origidx2(mask2) = [];
DN1 = str2double(t1);
DN2 = str2double(t2);
dur1 = minutes(floor(DN1/100)) + seconds(mod(DN1,100));
dur2 = minutes(floor(DN2/100)) + seconds(mod(DN2,100));
num_msg_AIS1 = length(msg_AIS1);
msg_match = cell(num_msg_AIS1, 1);
matches_anything_in_AIS1 = false;
for K = 1 : num_msg_AIS1
time_mask = isbetween(dur2, dur1(K)-minutes(1), dur1(K)+minutes(1));
matches_anything_in_AIS1 = matches_anything_in_AIS1 | time_mask;
msg_match{K} = reshape(AIS2(time_mask), 1, []); %user wants rows
end
% find the empty cells (creates the mask variable)
emptyCells = cellfun(@isempty,msg_match);
% remove the empty cells
AIS1_with_matches = AIS1;
AIS1_NO_MATCHING = AIS1(emptyCells);
AIS1_with_matches(emptyCells) = [];
msg_match(emptyCells) = [];
AIS2_NO_MATCHING = AIS2(~matches_anything_in_AIS1);
nRows = cellfun(@length, msg_match);
isMultiRow = nRows>1;
%msg_match(isMultiRow) = cellfun(@(a){strsplit(a,'!AIV')},msg_match(isMultiRow)); % here I got the problem
flashpode
2021-9-23
Well, I meant with this code:
AIS1(strlength(AIS1) < 15) = [];
AIS2(strlength(AIS2) < 15) = [];
N=size(AIS1,1); % Important: this must come after the deletions above, otherwise the code gave an error
msg_AIS1 = regexp(AIS1, '.*(?=\d{4}$)', 'match', 'once'); % the whole message except the last 4 digits
msg_AIS2 = regexp(AIS2, '.*(?=\d{4}$)', 'match', 'once');
t1 = regexp(AIS1, '\d{4}$', 'match', 'once'); % extract the last 4 digits
t2 = regexp(AIS2, '\d{4}$', 'match', 'once');
Time_AIS1 = duration(strcat('00:',extractBefore(t1,3),':',extractAfter(t1,2))); % put into hh:mm:ss format
Time_AIS1 = Time_AIS1+hours(cumsum([0;diff(Time_AIS1)<0])); % add one hour each time mm:ss wraps around
Time_AIS2 = duration(strcat('00:',extractBefore(t2,3),':',extractAfter(t2,2)));
Time_AIS2 = Time_AIS2+hours(cumsum([0;diff(Time_AIS2)<0]));
mask1 = ismissing(msg_AIS1) | ismissing(Time_AIS1);
mask2 = ismissing(msg_AIS2) | ismissing(Time_AIS2);
origi_AIS1 = (1:length(msg_AIS1));
origi_AIS2 = (1:length(msg_AIS2));
msg_AIS1(mask1) = []; Time_AIS1(mask1) = []; origi_AIS1(mask1) = [];
msg_AIS2(mask2) = []; Time_AIS2(mask2) = []; origi_AIS2(mask2) = [];
[H1, M1, S1] = hms(Time_AIS1); % split the time into components; M1 is used to build the ranges
[H2, M2, S2] = hms(Time_AIS2);
msg_match = cell(N, 1);
for K = 1:1:N
all_match_AIS = find(msg_AIS1(K) == msg_AIS2); % find identical messages
if isempty(all_match_AIS) % fprintf to write data to a text file
% fprintf('No matches for line #%d -> "%s"\n', origi_AIS1(K), msg_AIS1(K)); % '%s' for a string
continue;
end
% fprintf('potential match #%d -> "%s", checking times\n', origi_AIS1(K), msg_AIS1(K));
% disp(K), disp(all_match_AIS)
if H1(K)== H2(all_match_AIS)
% build the matching range in minutes
complete_match_AIS = all_match_AIS(M1(K) == M2(all_match_AIS) | M1(K) == M2(all_match_AIS) - 1 | M1(K) == M2(all_match_AIS) + 1);% range of +-1 minute around each message
msg_match{K} = msg_AIS1(complete_match_AIS); % take the messages from the lines that matched. IMPORTANT
end
if isempty(complete_match_AIS)
% fprintf('line %#d -> "%s" text matches but time does not\n', origi_AIS1(K), msg_AIS1(K));
else
% fprintf ('line %#d -> "%s" time also matches. The results are:\n', origi_AIS1(K), msg_AIS1(K));
msg_AIS2(complete_match_AIS) %IMPORTANT
end
% find the empty cells (creates the mask variable)
emptyCells = cellfun(@isempty,msg_match);
% remove the empty cells
msg_match(emptyCells) = [];
% pull the strings out of the cell (cat) --> to concatenate
Matching_msg = cellstr(cat(1, msg_match{:}));
end
Matching_msg = string(Matching_msg);
Walter Roberson
2021-9-23
Why are you still using that version? I gave you revised, efficient, tested code 5 days ago.
flashpode
2021-9-23
Because it does not work for me. I mean, it does not do the comparison, and I do not know why.
More Answers (1)
chrisw23
2021-9-22
strEx = "!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053 !AIVDM,1,1,,B,13EsReP00009vQ`Gbj65gPiV00Sd,0*7B0053";
% check/modify the expression under https://regex101.com/
exp = "(?<prefix>!\w*),(?<ident1>\d),(?<ident2>\d),,(?<ident3>\w),(?<strLoad>[\w\d:?<>@`]*),(?<time>[*\d\w]*)";
tbl = struct2table(regexp(strEx,exp,'names'))
This is just an example of how to parse text with a simple grouped regular expression. I use the website mentioned in the comment above to write and test expressions. The table allows easy access for further processing (e.g., datetime conversion), as previously shown. Look at string-based comparison methods like 'contains' or 'matches', e.g. tbl.strLoad.contains("137JlD52h0P9td"), which returns a logical index for accessing the matches.
Hope it helps
Christian
2 Comments
Walter Roberson
2021-9-23
[\w\d:?<>@`]
I think that could more easily be [^,] which is "anything other than a comma"
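A minimal sketch of that substitution, using one of the sample messages from the question. The variable names expr and tok are my own choices for illustration (expr also avoids shadowing the built-in exp function); the named groups are the ones from the answer above.

```matlab
strEx = "!AIVDM,1,1,,A,137JlD52h0P9tdRGbCQSm0kV0<1p,0*4C0053";

% Same named groups as in the answer, with the payload class replaced by [^,]
expr = "(?<prefix>!\w*),(?<ident1>\d),(?<ident2>\d),,(?<ident3>\w),(?<strLoad>[^,]*),(?<time>[*\d\w]*)";

% 'names' returns a struct with one field per named group
tok = regexp(strEx, expr, 'names');
disp(tok.strLoad)
disp(tok.time)
```

Since the payload field is delimited by commas on both sides, [^,]* captures it without having to enumerate every punctuation character that can appear in it.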