Parsing Strings with Values Missing
Hi everyone!
I am currently working on code that extracts the elevation of multiple GPS satellites from a string of data. Each line of data contains information about 4 (or fewer) satellites before continuing on a new line, so the last line often doesn't hold the same amount of data as the earlier lines. I tried working around this with an if-else statement. Sadly, this doesn't work: when parsing the data, Matlab does not recognize two consecutive commas as a missing value and doesn't count it, so the wrong values end up in my matrix. How can I overcome this? I have copied a couple of lines of my data below as well as my code. The code is over 800 lines in total, so this is just a small excerpt.
A quick explanation of the data: I am looking to extract the 2-digit number just before each 3-digit number. That's the elevation of the satellite in the sky, in degrees. I need both GPGSV and GLGSV. The first number is the number of lines for the particular GPS reading. The second number is the actual line number, so the first line is line 1 of 3 and so on. The 3rd number is the number of satellites. The 4th number is irrelevant in my data collection.
Thank you very much in advance!
----------------------------------DATA------------------------------------
$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C
$GPGSV,3,2,12,16,25,184,26,22,02,159,32,23,19,300,48,25,19,041,40*74
$GPGSV,3,3,12,26,52,161,50,29,09,079,43,31,65,038,50,48,23,236,36*71
$GLGSV,3,1,09,67,08,149,,67,24,150,30,68,80,173,43,78,72,003,40*62
$GLGSV,3,2,09,70,10,333,,86,03,009,28,77,20,039,34,69,42,324,38*6E
$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D
----------------------------------DATA------------------------------------
----------------------------------CODE------------------------------------
%GSV data
GSVcheck = strfind(AllData{1}, 'GSV');
GSVrows = find(~cellfun('isempty',GSVcheck));
GSVdata = AllData{1}(GSVrows);
GSVlength = floor(length(GSVdata)/6);
%'Empty' matrices
GSV = cell(DistanceLength*6,1);
%Parse $GSV
parseGSVdata = strsplit(GSVdata{counter},',');
numLines = parseGSVdata{2};
lineNum = parseGSVdata{3};
if lineNum ~= numLines
    GSV{counter,1} = parseGSVdata{6};
    GSV{counter,2} = parseGSVdata{10};
    GSV{counter,3} = parseGSVdata{14};
    GSV{counter,4} = parseGSVdata{18};
elseif lineNum == numLines
    dataLeft = parseGSVdata{4};
    dataAmount = numLines*4 - dataLeft;
    if dataAmount == 1
        GSV{counter,1} = parseGSVdata{6};
    elseif dataAmount == 2
        GSV{counter,1} = parseGSVdata{6};
        GSV{counter,2} = parseGSVdata{10};
    elseif dataAmount == 3
        GSV{counter,1} = parseGSVdata{6};
        GSV{counter,2} = parseGSVdata{10};
        GSV{counter,3} = parseGSVdata{14};
    elseif dataAmount == 4
        GSV{counter,1} = parseGSVdata{6};
        GSV{counter,2} = parseGSVdata{10};
        GSV{counter,3} = parseGSVdata{14};
        GSV{counter,4} = parseGSVdata{18};
    end
end
----------------------------------CODE------------------------------------
Accepted Answer
dpb
2016-6-3
Edited: dpb
2016-6-4
Actually, since the values are regularly spaced, simply create a format string for the ones you want...I picked the first and the last record...and put them into the strings gpg and glg, respectively...
>> fmt=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];
>> gpval=cell2mat(textscan(gpg,fmt,'delimiter',','))
gpval =
9 46 2 27
>> glval=cell2mat(textscan(glg,fmt,'delimiter',','))
glval =
2 NaN NaN NaN
>>
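To make the session above reproducible, here is a minimal sketch that defines gpg and glg from the first and last records in the question. The answer does not show how those two strings were assigned, so the assignments (and the helper names header, prefix and oneSat) are assumptions; the resulting format string is the same as the one in the answer, just built piece by piece.

```matlab
% Assumed definitions: first and last records copied from the question's data.
gpg = '$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C';
glg = '$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D';

% Same format string as in the answer, assembled from named pieces.
header = '%*s';                        % skip the sentence ID ($GPGSV / $GLGSV)
prefix = repmat('%*f', 1, 4);          % skip line count, line number, satellite count, first satellite ID
oneSat = ['%f' repmat('%*f', 1, 3)];   % keep one elevation, then skip azimuth, SNR and the next satellite ID
fmt = [header prefix repmat(oneSat, 1, 4) '%*s'];

% '%f' fields that are empty come back as NaN.
gpval = cell2mat(textscan(gpg, fmt, 'Delimiter', ','));
glval = cell2mat(textscan(glg, fmt, 'Delimiter', ','));

disp(gpval)
disp(glval)
```

With these two records the sketch should reproduce the values shown in the session above: 9 46 2 27 for the GPGSV line and 2 NaN NaN NaN for the GLGSV line.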
ADDENDUM
The missing value conundrum is associated with using a '%d' numeric format instead of '%f'; the default empty value of NaN can't be stored in an integer, which is the class returned for '%d'. I was unaware of that until some further checking on what was happening...had always presumed everything numeric would be double(*) by default unless specifically cast to something else.
(*) Although it is, indeed, documented that textscan returns the output class of int or uint, for us old-timers used to "everything in Matlab is double unless", it takes some getting used to these new-fangled ways. [f|s]scanf, for instance, do not do this but return double...and textread, the old standby "since forever" precursor to textscan, doesn't, either.
>> type int.dat
23,133
>> textread('int.dat','%d','delimiter',',')
ans =
23
133
>> whos ans
Name Size Bytes Class Attributes
ans 2x1 16 double
>>
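For comparison with the textread session above, the class difference in textscan can be checked with a short sketch. It parses the same two numbers from a string rather than from int.dat; the variable names str, asInt and asDbl are hypothetical.

```matlab
% Same two-number record as int.dat above, parsed from a string instead of a file.
str = '23,133';

asInt = textscan(str, '%d', 'Delimiter', ',');   % integer conversion specifier
asDbl = textscan(str, '%f', 'Delimiter', ',');   % floating-point conversion specifier

disp(class(asInt{1}))   % an integer class
disp(class(asDbl{1}))   % double

% NaN only exists in floating-point classes, so only the '%f' result can hold it.
disp(isfloat(asInt{1}))
disp(isfloat(asDbl{1}))
```

This is why the '%f' format is the safer choice whenever a field may be empty and the missing entries need to show up as NaN.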
4 Comments
dpb
2016-6-6
Well, the repmat is solely Matlab; there's always the recourse of writing N (in this case, 20) individual format strings, but I find it easier to keep track of "who's who in the zoo" if I use the symmetry that is in the input record (presuming there is some, of course, which there usually is). C writes the format string as [Width[.Precision]]DataType instead of the Fortran FORMAT form DataType[Width[.Precision]]. Since there's a numeric value in front of the Type specifier, it makes parsing a form for a repeat multiplier very tough, so it isn't implemented; hence you have to write every element explicitly in one form or another. What an unnecessary pain it is, indeed... :(
Anyway, that annoyance aside, glad you got it going; hope something was learned as well as solving the immediate problem.
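For anyone who would rather stay close to the original strsplit-based code: a likely reason the consecutive commas were not counted there is that strsplit collapses repeated delimiters by default. The following is a minimal sketch of that variation, not the approach used in the answer, and the variable names are hypothetical. It turns collapsing off so field positions stay fixed, then converts with str2double, which returns NaN for an empty field.

```matlab
% Last GLGSV record from the question; only one satellite is present.
record = '$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D';

% Keep empty fields so that every field stays at a fixed position.
fields = strsplit(record, ',', 'CollapseDelimiters', false);

% Elevations sit in fields 6, 10, 14 and 18 (the same indices the original code uses).
elevation = str2double(fields([6 10 14 18]));

disp(elevation)
```

Because str2double also turns the header fields into numbers, the line count and line number can be compared numerically instead of as character data, which removes the need for a special case on the last line.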