Question about optimizing reading data from text file

2 次查看(过去 30 天)
Hello, thanks for reading this,
I currently have a reader that reads in mesh files, and it works, but depending on the size of the file it can take a very long time. I was hoping I can optimize it for speed.
What I do first is read in a text file and change every line into a matrix of characters using the lines:
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A = char( cac{1} );
where A is my character matrix. I then search through the text file for identifiers for data I need. How I accomplish this is by setting start of data indices and end of data indices. I basically read this line by line, and at the moment, I assume it will always be formatted in a certain way.
After I have these indices, I use sscanf functions to read the characters as %f or %x numbers and store them into matrices. This is the part where the profiler says it takes the longest to complete.
I posted the MATLAB reader function here: http://pastebin.com/FFtgXzg4, since it is a bit long to post here. My specific questions are: do I have to convert the whole text import into a character matrix, and is there any way I can do this without needing a for loop? The loops using sscanf take a very long time.
It works, but just barely so. I can send a test import file if needed.
  1 个评论
Cedric
Cedric 2013-5-24
Could you post e.g. 20 lines of your data file, and define these identifiers that are are referring to?

请先登录,再进行评论。

回答(1 个)

Jonathan Sullivan
Jonathan Sullivan 2013-5-23
编辑:Jonathan Sullivan 2013-5-23
You may want to use fread and regexp.
Without seeing your file, I can't say for sure this will produce the same result, but it should give you a good starting point.
% Using regexp and fread
fid = fopen(filename,'r');
tic;
A = regexp(fread(fid,'*char')','\n','split');
A = char( A{:} );
toc
fclose(fid);
% Using textscan
fid = fopen(filename,'r');
tic;
B = textscan(fid,'%[^\n]');
B2 = char(B{1});
toc
fclose(fid);
  1 个评论
Brian
Brian 2013-5-23
It seems that the text scan I have goes slightly faster than the regexp/fread combination. There is one last part of the code that seems to be giving me problems:
When I have my start and end indices, I use sscanf line by line to give me the real data I need. However, some of my character matrices can be very large: sometimes spanning hundreds of thousands of rows (depending on the number of tetrahedra I have).
Is it possible to read this in any kind of intelligent fashion using sscanf line by line, or use it as a vector component, or should I look into exporting the matrix to a formatted text file and re-importing it using textread and hex2dec?
In these areas, I will always have the following combination of characters:
xxx xxx xxxx x x,
where I believe it can be split by a space delimiter. That leaves me with five hexadecimal values per row.

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Large Files and Big Data 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by