MATLAB Coder: str2num alternatives?

I have this data stored in a character array. I used fread and removed the headers to get this data from a text file. I'm constrained not to use textscan or fileread, as they are not supported by MATLAB Coder, and I also find it difficult to use coder.ceval to call fscanf.
unsorted_data 1x767 char
1
-8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00
-1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00
2
-3.6261E+00 -4.7218E+00 1.4143E+01 1.6041E+00 -5.1505E+00 1.6737E+00
-3.9131E+00 -5.9048E+00 -2.7256E+01 2.0434E+00 -1.6630E+01 5.5229E+00
3
2.2578E+01 -1.7633E-02 2.1166E+01 2.8041E-01 1.8919E+00 2.4702E+01
6.0947E+01 5.1242E+00 4.0910E+01 -1.0404E+01 -4.8758E+00 5.0202E+01
I need to extract every third row (R1, R4, R7, R10, ...) as a double [Nx1], and a second matrix holding the other rows of data [Nx6].
So far I can extract the first part (R1, R4, R7, R10, ...) into the "number" variable, but I get NaNs in the "Vector" variable. This would work with str2num, but str2num is not supported by MATLAB Coder.
remain = unsorted_data;
data_str = string([]);
% Split the text into lines; data_str grows by one element on every pass
while (remain ~= string())
    [token,remain] = strtok(remain, char(10));
    data_str = [data_str ; token];
end
% str2double converts a line holding one value, but returns NaN for a line holding six values
data = str2double(data_str);
len_data = length(data);
cnum = 1;
cvector = 1;
vector_rows = 2;   % data lines per record
number = zeros(len_data/(vector_rows+1),1);
Vector = zeros(len_data*vector_rows/(vector_rows+1),1);
% Lines 1, 4, 7, ... go to number; all other lines go to Vector
for i = 1:len_data
    num_loc = (vector_rows+1)*(cnum-1)+1;
    if i == num_loc
        number(cnum,1) = data(i,1);
        cnum = cnum+1;
    else
        Vector(cvector,1) = data(i,1);
        cvector = cvector+1;
    end
end
I'd like, first, to get these two matrices in the right format and, second, to make this more efficient by replacing the "while" loop, as it takes too long to process 5 million lines. Any help is greatly appreciated.
dpb 2018-8-30
What about using fgetl and parsing one line at a time? Is it supported?
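A minimal sketch of the line-at-a-time idea with fgetl. The scratch file written from tempname and the cap maxRec are hypothetical, and I haven't checked which MATLAB Coder releases support fgetl, so that still needs verifying against the Coder function list. Only the single-value lines are converted here; the six-value lines would still need splitting.

```matlab
% Write a tiny sample file in the same record layout (hypothetical scratch file)
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1\n1.0 2.0\n3.0 4.0\n2\n5.0 6.0\n7.0 8.0\n');
fclose(fid);

maxRec = 1000;                 % hypothetical upper bound on the number of records
number = zeros(maxRec, 1);     % preallocated instead of growing inside the loop
nRec = 0;
k = 0;                         % line counter

fid = fopen(fname, 'r');
tline = fgetl(fid);            % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    k = k + 1;
    if mod(k - 1, 3) == 0      % lines 1, 4, 7, ... hold the single value
        nRec = nRec + 1;
        number(nRec) = str2double(tline);
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

number = number(1:nRec);       % keep only the records actually read
```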


Accepted Answer

Stephen23 2018-8-31
Edited: Stephen23 2018-8-31
As far as I can tell from the list of Coder-supported functions, something like this should work. The basic idea is to split the char vector into two preallocated cell arrays and then convert them to numeric. Given your 1x767 char vector:
  • identify whitespace using isstrprop.
  • use diff and find to get indices of the numbers.
  • use eq and find to locate newline characters.
  • preallocate two cell arrays (perhaps transposed).
  • use a for loop over the indices and collect the char numbers into the cell arrays.
  • apply str2double to both cell arrays.
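A sketch of those six steps, run on the first two records from the question. The sample txt and the names isTok, tokLine, numStr and vecStr are mine. It assumes every record is exactly one single-value line followed by two six-value lines, with no blank lines. One substitution: instead of find, the newline mask from eq is fed to cumsum so every character gets its line number in one pass. I haven't compiled this with MATLAB Coder; variable-size cell arrays in particular may need extra handling there.

```matlab
txt = sprintf(['1\n' ...
    '-8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00\n' ...
    '-1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00\n' ...
    '2\n' ...
    '-3.6261E+00 -4.7218E+00 1.4143E+01 1.6041E+00 -5.1505E+00 1.6737E+00\n' ...
    '-3.9131E+00 -5.9048E+00 -2.7256E+01 2.0434E+00 -1.6630E+01 5.5229E+00\n']);
txt = txt(:).';                              % make sure txt is a row vector
% Step 1: whitespace (space, tab, CR, LF) versus characters that belong to numbers
isTok = ~isstrprop(txt, 'wspace');
% Step 2: diff is +1 where a number starts and -1 just after it ends
d = diff([0, double(isTok), 0]);
tokStart = find(d == 1);
tokStop  = find(d == -1) - 1;
% Step 3: newlines via eq; cumsum gives every character its line number
isNL = (txt == char(10));
lineOf = 1 + cumsum([0, double(isNL(1:end-1))]);
tokLine = lineOf(tokStart);
isNum = mod(tokLine - 1, 3) == 0;            % lines 1, 4, 7, ... hold the single value
% Step 4: preallocate the two cell arrays
nNum = nnz(isNum);
numStr = cell(nNum, 1);
vecStr = cell(numel(tokStart) - nNum, 1);
% Step 5: loop over the token indices and copy each number's characters
a = 0;
b = 0;
for k = 1:numel(tokStart)
    s = txt(tokStart(k):tokStop(k));
    if isNum(k)
        a = a + 1;
        numStr{a} = s;
    else
        b = b + 1;
        vecStr{b} = s;
    end
end
% Step 6: convert both cell arrays with str2double
number = str2double(numStr);                     % N-by-1
Vector = reshape(str2double(vecStr), 6, []).';   % 2N-by-6, one row per data line
```

Because CR counts as whitespace for isstrprop, CR+LF line endings add no extra tokens; lines are counted by LF only.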
Arun 2018-8-31
Edited: Arun 2018-8-31
Can you show step 3, finding the newline characters?
Also, would this process be faster than the while loop in the code above?
dpb 2018-8-31
Edited: dpb 2018-8-31
ix=find(unsorted_data==char(10));
unsorted_data is just an array of characters. Internally, each character is stored as a numeric code, so the array can be operated on as if it were just numbers (which, internally, it is; only the user interface displays it differently).
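A tiny illustration of that point, using a made-up two-line sample: the character '1' is the number 49, and comparing against char(10) or against the plain number 10 finds the same newline positions.

```matlab
txt = sprintf('1\n-8.3033E-01\n');   % made-up two-line sample with LF endings
code = double(txt(1));               % the character '1' has code 49
ix  = find(txt == char(10));         % newline positions: [2 14]
ix2 = find(txt == 10);               % same result, comparing with a plain number
```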
The find call above will be very fast. In your loop
while (remain ~= string())
    [token,remain] = strtok(remain, char(10));
    data_str = [data_str ; token];
end
you're using dynamic reallocation by concatenating each new token onto the previous data to build the array. That is about the least efficient operation there is in MATLAB, as it forces a reallocation and a copy on every pass; as the size grows, that bottleneck really begins to show.
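One way to avoid the growing array, sketched under the assumption that the lines only need splitting at this stage (not converting): find all newlines once, preallocate a cell array with one element per line, and fill it in a loop. The sample text and the names txtLines, i1 and i2 are illustrative; a trailing CR from CR+LF endings is dropped.

```matlab
% Illustrative sample with CR+LF line endings
txt = sprintf('1\r\n-8.3033E-01 -4.2882E+00\r\n2\r\n1.4143E+01 1.6041E+00\r\n');
txt = txt(:).';                        % work on a row vector
nl = find(txt == char(10));            % every newline, found in one pass
if isempty(nl) || nl(end) ~= numel(txt)
    nl = [nl, numel(txt) + 1];         % treat the end of the text as a final line break
end
nLines = numel(nl);
txtLines = cell(nLines, 1);            % allocated once; nothing grows inside the loop
i1 = 1;
for k = 1:nLines
    i2 = nl(k) - 1;                    % last character before the newline
    if i2 >= i1 && txt(i2) == char(13)
        i2 = i2 - 1;                   % drop the CR of a CR+LF line ending
    end
    txtLines{k} = txt(i1:i2);
    i1 = nl(k) + 1;                    % the next line starts after the newline
end
```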
ADDENDUM
Try this for starters. I loaded the file as char() without the headers into txt and turned it into a row vector rather than a column vector for the following:
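ix=find(txt==char(10));   % positions of all LF characters
i1=1;                     % first character of the current line
for i=1:10
    i2=ix(i)-2;           % last character of the line, skipping the CR+LF pair (assumes CR+LF endings)
    s=txt(i1:i2);
    disp(s),
    i1=i2+3;              % the next line starts just after the LF
end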
This finds and breaks out the first 10 lines (records).
It is somewhat more of a pain when Coder doesn't support any of the formatted read functions. The problem with just applying str2double to the strings returned above is that, except for the single values on lines 1:3:N, each string holds several numbers, and str2double doesn't split a string into multiple values: it returns NaN because the whole string isn't a single number.
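A quick check of that behaviour, using part of one of the question's data lines: str2double gives NaN for the whole line, but converts each element once the fields are already separated into a cell array.

```matlab
rec = '-3.6261E+00 -4.7218E+00 1.4143E+01';       % three values in one char vector
a = str2double(rec);                               % NaN: the text is not a single number
fields = {'-3.6261E+00', '-4.7218E+00', '1.4143E+01'};
b = str2double(fields);                            % 1x3 double, one value per cell
```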
What you need to do is iterate over ix in steps of 3 (1:3:length(ix)) and, inside the loop, get the next two records, splitting them at their fixed-column positions so that each field can be passed to str2double.
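A sketch of that loop. It assumes, which the question does not state, that every value occupies a fixed 12-column field (a leading sign or space in front of an 11-character number), that every line ends in LF, and that each record is one single-value line plus two six-value lines. The forum display may have collapsed the original spacing, so the width w is a guess to check against the real file. strtrim removes the padding (and any CR) before str2double.

```matlab
w  = 12;   % ASSUMED width of every numeric field, in characters
nf = 6;    % values per data line
% Illustrative sample in the assumed fixed-column layout (LF line endings)
txt = sprintf(['1\n' ...
    ' -8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00\n' ...
    ' -1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00\n' ...
    '2\n' ...
    ' -3.6261E+00 -4.7218E+00  1.4143E+01  1.6041E+00 -5.1505E+00  1.6737E+00\n' ...
    ' -3.9131E+00 -5.9048E+00 -2.7256E+01  2.0434E+00 -1.6630E+01  5.5229E+00\n']);
txt = txt(:).';
ix = find(txt == char(10));           % one newline per line, including the last
lineStart = [1, ix(1:end-1) + 1];     % first character of every line
nRec = floor(numel(ix) / 3);
number = zeros(nRec, 1);
Vector = zeros(2 * nRec, nf);
for k = 1:3:3 * nRec                  % k = 1, 4, 7, ...: the single-value lines
    r = (k + 2) / 3;                  % record index
    number(r) = str2double(strtrim(txt(lineStart(k):ix(k) - 1)));
    for m = 1:2                       % the two data lines that follow
        base = lineStart(k + m) - 1;
        for j = 1:nf                  % cut out each fixed-width field and convert it
            field = txt(base + (j - 1) * w + 1 : base + j * w);
            Vector(2 * (r - 1) + m, j) = str2double(strtrim(field));
        end
    end
end
```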
Alternatively, one could compute the start/stop locations of the records so as to remove the serial-number records, then reshape() by the field width of each floating-point field, ending up with one long column of fields to convert, and finally reshape() the result.
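A sketch of the reshape() variant under the same assumptions (12-column fields, six per data line, LF endings; the sample and the buffer name buf are illustrative). The data lines are copied into one preallocated char row, reshape() turns that into a w-by-K char matrix with one field per column, and the converted values are reshaped back into one row per data line. str2double is applied column by column because I'm not certain how it treats a multi-row char matrix.

```matlab
w  = 12;   % ASSUMED field width
nf = 6;    % values per data line
txt = sprintf(['1\n' ...
    ' -8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00\n' ...
    ' -1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00\n' ...
    '2\n' ...
    ' -3.6261E+00 -4.7218E+00  1.4143E+01  1.6041E+00 -5.1505E+00  1.6737E+00\n' ...
    ' -3.9131E+00 -5.9048E+00 -2.7256E+01  2.0434E+00 -1.6630E+01  5.5229E+00\n']);
txt = txt(:).';
ix = find(txt == char(10));
lineStart = [1, ix(1:end-1) + 1];
isData = mod(0:numel(ix) - 1, 3) ~= 0;      % false for lines 1, 4, 7, ... (serial numbers)
% Serial-number lines -> number (N-by-1)
serialStart = lineStart(~isData);
serialStop  = ix(~isData) - 1;
number = zeros(numel(serialStart), 1);
for r = 1:numel(serialStart)
    number(r) = str2double(strtrim(txt(serialStart(r):serialStop(r))));
end
% Data lines -> one long char row of fixed-width fields
dataStart = lineStart(isData);
nData = numel(dataStart);
L = nf * w;                                  % characters of numbers per data line
buf = blanks(L * nData);                     % preallocated char row
for d = 1:nData
    buf((d - 1) * L + 1 : d * L) = txt(dataStart(d) : dataStart(d) + L - 1);
end
F = reshape(buf, w, []);                     % one w-character field per column
vals = zeros(1, size(F, 2));
for c = 1:size(F, 2)
    vals(c) = str2double(strtrim(F(:, c).'));
end
Vector = reshape(vals, nf, []).';            % one row per data line (2N-by-6)
```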

请先登录,再进行评论。
