Question about optimizing reading data from text file

Question

0 votes

Hello, thanks for reading this,

I currently have a reader for mesh files. It works, but depending on the size of the file it can take a very long time, so I am hoping to optimize it for speed.

What I do first is read in a text file and change every line into a matrix of characters using the lines:

   cac = textscan( fid, '%[^\n]' );
   fclose(fid);
   A  = char( cac{1} );

where A is my character matrix. I then search through the text for the identifiers that mark the data I need, and record a start index and an end index for each block of data. I basically do this line by line, and at the moment I assume the file will always be formatted in a certain way.

After I have these indices, I use sscanf functions to read the characters as %f or %x numbers and store them into matrices. This is the part where the profiler says it takes the longest to complete.

I posted the MATLAB reader function here: http://pastebin.com/FFtgXzg4, since it is a bit long to post here. My specific questions are: do I have to convert the whole text import into a character matrix, and is there any way I can do this without needing a for loop? The loops using sscanf take a very long time.

It works, but just barely so. I can send a test import file if needed.

1 comment

Cedric 2013-5-24

Could you post e.g. 20 lines of your data file, and define the identifiers that you are referring to?


Answer 1

Jonathan Sullivan 2013-5-23

Edited: Jonathan Sullivan 2013-5-23

0 votes

You may want to use fread and regexp.

Without seeing your file, I can't say for sure this will produce the same result, but it should give you a good starting point.

   % Option 1: fread + regexp
   fid = fopen(filename,'r');
   tic;
   A = regexp(fread(fid,'*char')','\n','split');   % one cell per line
   A = char( A{:} );                               % pad into a char matrix
   toc
   fclose(fid);

   % Option 2: textscan, as in the question
   fid = fopen(filename,'r');
   tic;
   B = textscan(fid,'%[^\n]');                     % B{1} holds the lines
   B2 = char(B{1});                                % pad into a char matrix
   toc
   fclose(fid);
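
On whether the character matrix is needed at all: probably not. The cell array of lines can be searched directly, and the data lines of one section can be joined and handed to sscanf in a single call. The following is only a sketch under assumptions, since the actual identifiers have not been posted: the marker strings '(nodes' and ')' and the three-values-per-line layout are invented for illustration, and txtLines stands in for cac{1} from textscan.

```matlab
% Hypothetical file contents as a cell array of lines (stand-in for cac{1}).
% The '(nodes' and ')' markers are invented placeholders, not the real format.
txtLines = {'(nodes'; '1.0 2.0 3.0'; '4.0 5.0 6.0'; ')'; '(cells'; 'a b c d e'; ')'};

% Locate the start marker and the first closing marker after it,
% without converting the cell array to a char matrix.
startIdx = find(strncmp(txtLines, '(nodes', 6), 1);
endIdx   = startIdx + find(strcmp(txtLines(startIdx+1:end), ')'), 1);

% Join the data lines with spaces (sprintf recycles the format for each
% cell) and parse the whole section with one sscanf call.
txt    = sprintf('%s ', txtLines{startIdx+1:endIdx-1});
coords = reshape(sscanf(txt, '%f'), 3, []).';
```

Here coords ends up with one row per data line. Whether this beats the char-matrix route on a real file is something to check with tic/toc or the profiler.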

1 comment

Brian 2013-5-23

It seems that the textscan version I have runs slightly faster than the regexp/fread combination. There is one last part of the code that seems to be giving me problems:

When I have my start and end indices, I use sscanf line by line to give me the real data I need. However, some of my character matrices can be very large: sometimes spanning hundreds of thousands of rows (depending on the number of tetrahedra I have).

Is there a smarter way to read this than calling sscanf line by line, for example by applying it to a whole block at once, or should I look into exporting the matrix to a formatted text file and re-importing it using textread and hex2dec?

In these areas, I will always have the following combination of characters:

xxx xxx xxxx x x,

where I believe it can be split by a space delimiter. That leaves me with five hexadecimal values per row.
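
One possible way to avoid the per-line loop for such a block, sketched under the assumption that every row really does hold five space-separated hexadecimal tokens: flatten the rows of the character matrix into one long string and let a single sscanf call with '%x' parse everything. The variable block and its contents are made up for illustration; in the real reader it would be something like A(startIdx:endIdx,:).

```matlab
% Hypothetical stand-in for a block of the character matrix A:
% rows of five space-separated hexadecimal values, padded by char().
block = char('1a 2b 3c4d 1 0', 'ff 10 00a1 2 3', '7 8 9 a b');

% Append a blank column so the last token of one row cannot fuse with
% the first token of the next row, then flatten row by row.
padded = [block, repmat(' ', size(block, 1), 1)];
flat   = reshape(padded.', 1, []);

% A single sscanf call parses every hexadecimal token in the block.
vals = sscanf(flat, '%x');

% Five values per row, matching the layout described above.
data = reshape(vals, 5, []).';
```

The result is a numeric matrix with one row per line of the block and the hex values already converted, so hex2dec and a round trip through a temporary file should not be needed.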
