Parsing Strings with Values Missing

Question

Thore 2016-6-2

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/287127-parsing-strings-with-values-missing

Commented: dpb 2016-6-6

Hi everyone!

I am currently working on code that extracts the elevation of multiple GPS satellites from a string of data. Each line of data contains information about four or fewer satellites before continuing on a new line, so the last line often holds less data than the earlier ones. I tried working around this with an if-else statement. Sadly, this doesn't work: when parsing the data, Matlab does not treat two consecutive commas as a missing value and doesn't count it, so the wrong values end up in my matrix. I don't know how to overcome this. I have copied a few lines of my data below, as well as my code. The full code is over 800 lines, so this is just a small excerpt.

A quick explanation of the data: I am looking to extract the 2-digit number just before each 3-digit number. That is the elevation of the satellite in the sky, in degrees. I need both GPGSV and GLGSV. The first number is the number of lines in the particular GPS reading. The second number is the current line number, so the first line is line 1 of 3, and so on. The third number is the number of satellites. The fourth number is irrelevant to my data collection.

Thank you very much in advance!

----------------------------------DATA------------------------------------

$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C

$GPGSV,3,2,12,16,25,184,26,22,02,159,32,23,19,300,48,25,19,041,40*74

$GPGSV,3,3,12,26,52,161,50,29,09,079,43,31,65,038,50,48,23,236,36*71

$GLGSV,3,1,09,67,08,149,,67,24,150,30,68,80,173,43,78,72,003,40*62

$GLGSV,3,2,09,70,10,333,,86,03,009,28,77,20,039,34,69,42,324,38*6E

$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D

----------------------------------DATA------------------------------------

----------------------------------CODE------------------------------------

%GSV data

GSVcheck = strfind(AllData{1}, 'GSV');

GSVrows = find(~cellfun('isempty',GSVcheck));

GSVdata = AllData{1}(GSVrows);

GSVlength = floor(length(GSVdata)/6);

%'Empty' matrices

GSV = cell(DistanceLength*6,1);

%Parse $GSV

parseGSVdata = strsplit(GSVdata{counter},',');

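% strsplit returns text, so convert the counts to numbers before comparing
numLines = str2double(parseGSVdata{2});

lineNum = str2double(parseGSVdata{3});

if lineNum ~= numLines

    GSV{counter,1} = parseGSVdata{6};
    GSV{counter,2} = parseGSVdata{10};
    GSV{counter,3} = parseGSVdata{14};
    GSV{counter,4} = parseGSVdata{18};

elseif lineNum == numLines

    % total satellites, minus the four on each earlier line
    dataLeft = str2double(parseGSVdata{4});
    dataAmount = dataLeft - (numLines-1)*4;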
    if dataAmount == 1
        GSV{counter,1} = parseGSVdata{6};
    elseif dataAmount == 2
        GSV{counter,1} = parseGSVdata{6};
        GSV{counter,2} = parseGSVdata{10};
    elseif dataAmount == 3
        GSV{counter,1} = parseGSVdata{6};
        GSV{counter,2} = parseGSVdata{10};
        GSV{counter,3} = parseGSVdata{14};
    elseif dataAmount == 4
        GSV{counter,1} = parseGSVdata{6};
        GSV{counter,2} = parseGSVdata{10};
        GSV{counter,3} = parseGSVdata{14};
        GSV{counter,4} = parseGSVdata{18};
    end

end

----------------------------------CODE------------------------------------
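
A minimal sketch of why the positions shift, assuming the lines are split with strsplit as in the excerpt above: by default strsplit treats a run of consecutive delimiters as one, so empty fields vanish and everything after them moves left. Passing 'CollapseDelimiters', false keeps the empty fields in place. The variable names below are illustrative only and are not part of the original code.

```matlab
% First GLGSV sentence from the data above; its first satellite has an
% empty value between two consecutive commas.
sentence = '$GLGSV,3,1,09,67,08,149,,67,24,150,30,68,80,173,43,78,72,003,40*62';

% Default behaviour: consecutive commas collapse and the empty field is lost.
collapsed = strsplit(sentence, ',');
shifted   = str2double(collapsed([6 10 14 18]));   % picks up azimuths after the gap

% Keep empty fields so every value stays at its fixed position.
fields    = strsplit(sentence, ',', 'CollapseDelimiters', false);
elevation = str2double(fields([6 10 14 18]));      % the four elevations
```

With the empty fields preserved, str2double turns any empty entry into NaN, so a short last line can be indexed the same way as a full one, provided it is padded with commas as in the data above.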


Answer 1

dpb 2016-6-3

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/287127-parsing-strings-with-values-missing#answer_224353

Edited: dpb 2016-6-4

Actually, since the values are regularly spaced, simply create a format string for the ones you want. I picked the first and the last record and put them into the strings gpg and glg, respectively:

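>> gpg='$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C';
>> glg='$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D';
>> fmt=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];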
>> gpval=cell2mat(textscan(gpg,fmt,'delimiter',','))
gpval =
   9    46     2    27
>> glval=cell2mat(textscan(glg,fmt,'delimiter',','))
glval =
   2   NaN   NaN   NaN
>>

ADDENDUM

The missing value conundrum is associated with using a '%d' numeric format instead of '%f'; the default value of NaN can't be stored in an integer, which is the class returned for '%d'. I was unaware of that until some further checking on what was happening; I had always presumed everything numeric would be double(*) by default unless specifically cast to something else.

(*) Although it is, indeed, documented that textscan returns an output class of int or uint for integer formats, for us old-timers used to "everything in Matlab is double unless", these new-fangled ways take some getting used to. [f|s]scanf, for instance, do not do this but return double, and neither does textread, the old "since forever" standby precursor to textscan.

>> type int.dat
23,133
>> textread('int.dat','%d','delimiter',',')
ans =
  23
 133
>> whos ans
Name      Size            Bytes  Class     Attributes
ans       2x1                16  double

>>
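
For comparison, a small sketch of the textscan side of that point. The input text is made up for illustration; the point is only the class of the result and how an empty field is reported with '%f'.

```matlab
% Two numbers with one empty field between them (illustrative input).
txt = '23,,133';

asInt    = textscan(txt, '%d', 'Delimiter', ',');
asDouble = textscan(txt, '%f', 'Delimiter', ',');

intClass    = class(asInt{1});      % 'int32', a class with no NaN
doubleClass = class(asDouble{1});   % 'double'
missing     = isnan(asDouble{1});   % flags the empty middle field
```

Reading with '%f' and converting afterwards, if integers are really needed, keeps the missing entries identifiable.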


Thore 2016-6-3

Edited: Thore 2016-6-3

Thank you so much for your answer! I was able to modify it slightly to work in my code, and I now have a matrix with all the elevation values, which is great. However, after searching online for a good while, I must admit I have no idea how to modify the repmat in order to pick a different value from the same set of data. The 2-digit number right after the 3-digit number is the noise on the signal. I have tried changing all the numbers to see what would happen, but I always end up with the elevation values. I have attached the new code below, showing how I wish to use it to find the 4th value in the repeated part of the data as well as the 2nd value, which is the elevation. Thanks in advance.

--------------------------CODE---------------------------

%Parse $GSV - Elevation

fmt=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];

GSV_L = cell2mat(textscan(GSVdata{counter},fmt,'delimiter',',','collectoutput',true));

GSV = vertcat(GSV,GSV_L);

assignin('base','Elevation',GSV);

%Parse $GSV - Noise

fmt2=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];

Noise_L = cell2mat(textscan(GSVdata{counter},fmt2,'delimiter',',','collectoutput',true));

Noise = vertcat(Noise,Noise_L);

assignin('base','Noise',Noise);

--------------------------CODE---------------------------

dpb 2016-6-3

Edited: dpb 2016-6-5

"... I have no idea how to modify the repmat in order to find a different value in the same set of data."

It's not repmat that needs changing, it's the choice of which fields are to be returned and which skipped. All repmat is doing is repeating a given pattern a fixed number of times instead of writing each individual format one at a time manually.

Break it down from the left and from the inside out. It:

Skips a string: '%*s'
Skips four numbers: repmat('%*f',1,4)
Reads a number then skips three: ['%f' repmat('%*f',1,3)]
And repeats that pattern 4 times: repmat([...],1,4)
Then finally skips the last string: '%*s'

If you want a different group or multiple values, work through the position of each and build the matching string.
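
As a sketch of that, assuming each satellite's repeating group is ID, elevation, 3-digit number, noise value (as the data in the question suggests), the field after the 3-digit number is reached by moving the read position two fields to the right. The checksum is stripped first with strtok so that the line holds exactly 20 comma-separated fields; that step and the variable names are assumptions of this sketch, not part of the answer above.

```matlab
% First GPGSV sentence from the question; drop the '*7C' checksum.
gpg  = '$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C';
body = strtok(gpg, '*');

% Skip the tag and six numbers, read one, then three times: skip three, read one.
fmtNoise = ['%*s' repmat('%*f',1,6) '%f' repmat(['%*f%*f%*f' '%f'],1,3)];
noise    = cell2mat(textscan(body, fmtNoise, 'Delimiter', ','));

% Elevation and noise together: skip ID, read, skip the 3-digit number, read.
fmtBoth = ['%*s' repmat('%*f',1,3) repmat('%*f%f%*f%f',1,4)];
both    = cell2mat(textscan(body, fmtBoth, 'Delimiter', ','));
pairs   = reshape(both, 2, []).';   % one row per satellite: [elevation, noise]
```

The only thing that changes between the formats is which positions carry '%f' and which carry '%*f'; the total of 20 fields stays the same.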

Alternatively, simply read the whole numeric array and then just delete from memory the columns not of interest.
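
A minimal sketch of this alternative, again assuming the checksum is removed first and that every line carries 19 numeric fields after the tag; the variable names are illustrative.

```matlab
% First GLGSV sentence from the question; it has one empty noise field.
glg  = '$GLGSV,3,1,09,67,08,149,,67,24,150,30,68,80,173,43,78,72,003,40*62';
body = strtok(glg, '*');

% Skip the tag, read all 19 numbers into a single row.
fmtAll = ['%*s' repmat('%f',1,19)];
c      = textscan(body, fmtAll, 'Delimiter', ',', 'CollectOutput', true);
vals   = c{1};

% Keep only the columns of interest; every fourth column from each start.
elevation = vals(:, 5:4:end);   % columns 5, 9, 13, 17
noise     = vals(:, 7:4:end);   % columns 7, 11, 15, 19; empty field reads as NaN
```

This costs a little memory for the unused columns but avoids building a separate format string for each quantity.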

ADDENDUM/RANT :) :

And all this gyration because the creators of C (Matlab's formatted I/O uses the C-library-derived functions fprintf and friends) couldn't stand the thought of "not invented here" and wrote a formatting string definition that can't accept repeat specifiers, which were already well established in FORTRAN (now Fortran) at the time (along with introducing the problems with fixed-width input parsing). :( With a sensible rearranging of the order of the fields, the above could logically have been written far more simply as

FMT=['*A *4F 4(F *3F) *A'];

using the 'A' for character data from Fortran FORMAT instead of 's' and presuming keeping the '*' for skipping an input field which Fortran doesn't have.

END RANT (not directed really at TMW, just a "pet peeve")

Thore 2016-6-6

Again, thank you so much for your help! I highly appreciate it. My code works perfectly now and I can continue analyzing my data. It does seem strange that it had to be done in such a cumbersome way if there was already an easy way to do it in Fortran but you have helped me past my hurdle.

dpb 2016-6-6

Well, the repmat is solely Matlab; there's always the recourse of writing N (in this case, 20) individual format specifiers, but I find it easier to keep track of "who's who in the zoo" if I use the symmetry that is in the input record (presuming there is some, of course, which there usually is). C writes the format string as [Width[.Precision]]DataType instead of the Fortran FORMAT order DataType[Width[.Precision]]. Since there's a numeric value in front of the type specifier, parsing a repeat multiplier becomes very tough, so it isn't implemented; hence you have to write every element explicitly in one form or another. What an unnecessary pain it is, indeed... :(

Anyway, that annoyance aside, glad you got it going; hope something was learned as well as solving the immediate problem.
