Fastest Way to write data to a text file - fprintf

I am writing a lot of data to a text file one line at a time (1.7 million rows, 4 columns), and the columns have different data types. I'm wondering if there is a better way to do this than one line at a time that might give much faster results.
Here is what I'm doing now.
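% ExpSymbols: char array (one symbol per row)
% ExpDates:   numeric array (column vector)
% MyFactor:   numeric array (column vector)
% FctrName:   char array (a single name, the same on every row)
ftemp = fopen('FileName', 'w');
for i = 1:length(MyFactor)
    % one fprintf call per row: symbol, date, factor, name
    fprintf(ftemp, '%s,%i,%f,%s\r\n', ExpSymbols(i,:), ExpDates(i,1), MyFactor(i,1), [FctrName '_ML']);
end
fclose(ftemp);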
Thanks in advance,
Brian

Accepted Answer

Jan 2013-8-2
You can try to suppress the flushing by opening the file with 'W' instead of 'w':
ftemp = fopen('FileName', 'W');  % uppercase W: no automatic flushing
% The constant name goes into the format string, so it is not passed on every row
Fmt = ['%s,%i,%f,', FctrName '_ML\r\n'];
for i = 1:length(MyFactor)
    fprintf(ftemp, Fmt, ExpSymbols(i,:), ExpDates(i), MyFactor(i));
end
fclose(ftemp);
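A commonly used variant (not something posted in this thread) removes the loop entirely: put the fields into a cell array with one column per output line, then pass its contents to a single fprintf call, which recycles the format string until the arguments run out. This is a minimal sketch, assuming ExpSymbols is a char matrix, ExpDates and MyFactor are numeric column vectors, and FctrName is a constant char vector; the small sample data and the file name out.csv are hypothetical.

```matlab
% Hypothetical sample data standing in for the real 1.7-million-row arrays
ExpSymbols = ['AAA'; 'BBB'; 'CCC'];       % char matrix, one symbol per row
ExpDates   = [20130801; 20130802; 20130805];
MyFactor   = [0.5; -1.25; 3];
FctrName   = 'Value';                     % assumed to be a constant

fname = fullfile(tempdir, 'out.csv');     % hypothetical output file
ftemp = fopen(fname, 'W');                % 'W': no automatic flushing
assert(ftemp > 0, 'Could not open %s', fname);

% The constant name is part of the format string, as in the accepted answer
Fmt = ['%s,%i,%f,', FctrName, '_ML\r\n'];

% Build an n-by-3 cell array and transpose it: C{:} is then expanded
% column by column, i.e. symbol, date, factor for row 1, then row 2, ...
C = [cellstr(ExpSymbols), num2cell(ExpDates), num2cell(MyFactor)].';
fprintf(ftemp, Fmt, C{:});                % one call writes every line
fclose(ftemp);
```

Note that cellstr strips trailing blanks from each symbol, which is usually what you want for padded char matrices. Building C still costs memory and time for 1.7 million rows, so it is worth timing against the loop on the real data.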
Brian 2013-8-5
Edited: Brian 2013-8-5
You're right: saving the variables by themselves is much quicker than writing a flat file. I changed my code to write to C:\Temp (as you suggested above); the save took 0.97 seconds and the load took 0.33 seconds. The formatted flat file is 62 MB, while the .mat file is only about 15 MB. However, I do need a properly formatted file, because the other system can't read .mat files.
All fields need to be in one file, but it sounds like you're saying that writing mixed data types is what makes the write unnecessarily slow. Can I write one data type at a time to the same file, using a loop for each data type?
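The save-versus-fprintf comparison described above can be reproduced on any machine by timing both writes with tic/toc. This is a rough sketch, assuming synthetic data with the same shape as in the question; the row count, the data values, and the file names factors.mat and factors.csv are hypothetical, and the timings it prints will not match the numbers reported here.

```matlab
% Hypothetical synthetic data with the same shape as in the question
n          = 1e4;                         % raise toward 1.7e6 for a realistic test
ExpSymbols = repmat('ABCD', n, 1);        % char matrix, one symbol per row
ExpDates   = (1:n).' + 20130000;          % numeric column vector
MyFactor   = rand(n, 1);                  % numeric column vector

matFile = fullfile(tempdir, 'factors.mat');
txtFile = fullfile(tempdir, 'factors.csv');

% Binary MAT-file: no text formatting at all
tic;
save(matFile, 'ExpSymbols', 'ExpDates', 'MyFactor');
tSave = toc;

% Formatted text file, written one line at a time as in the question
tic;
fid = fopen(txtFile, 'W');
for i = 1:n
    fprintf(fid, '%s,%i,%f\r\n', ExpSymbols(i,:), ExpDates(i), MyFactor(i));
end
fclose(fid);
tText = toc;

% Report elapsed time and file size for each approach
d1 = dir(matFile);
d2 = dir(txtFile);
fprintf('save: %.2f s (%d bytes), fprintf loop: %.2f s (%d bytes)\n', ...
        tSave, d1.bytes, tText, d2.bytes);
```

Writing to a local folder such as tempdir also keeps network-drive latency out of the measurement.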
dpb 2013-8-5
A) Can you offload the formatting from this code to a second program that processes the .mat files and writes the formatted ones? That won't save any time overall, but it moves the work to a place where the bottleneck might be less noticeable. For example, a second background process could do the conversion while the primary analyses run interactively. Whether that helps depends on the actual workflow, of course.
B) Can your target application read the variables sequentially, one after the other, instead of one record at a time as you're currently writing them? If so, you can certainly write each one without any loop at all, and it will likely be faster by at least a measurable amount, as Jan suggests.
C) You might just see how the text option of save compares for speed. I don't know that it'll help, but what the heck...
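For option B, here is a sketch of what writing each variable as its own block might look like, with no loop at all. It assumes the target application can read all the symbols, then all the dates, then all the factors as three consecutive sections of one file; that layout, the sample data, and the file name blocks.txt are hypothetical.

```matlab
% Hypothetical sample data
ExpSymbols = ['AAA'; 'BBB'; 'CCC'];       % char matrix, one symbol per row
ExpDates   = [20130801; 20130802; 20130805];
MyFactor   = [0.5; -1.25; 3];

fname = fullfile(tempdir, 'blocks.txt');  % hypothetical output file
fid = fopen(fname, 'W');

% Char matrix: fprintf consumes arrays column-major, so transpose it and
% use one %c per character so that each row becomes one line
fprintf(fid, [repmat('%c', 1, size(ExpSymbols, 2)), '\r\n'], ExpSymbols.');
% Numeric columns: the format is recycled once per element
fprintf(fid, '%i\r\n', ExpDates);
fprintf(fid, '%f\r\n', MyFactor);
fclose(fid);
```

Each fprintf call handles a single data type over a whole vector, which avoids both the per-row call overhead and the cost of converting everything to one char matrix first.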


More Answers (1)

dpb 2013-8-2
Edited: dpb 2013-8-3
It's a PITA for mixed fields; I don't know of any clean way to mix them in fprintf c
I generally build the whole character array in memory, then write it all at once:
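cma = repmat(',', length(dates), 1);   % the delimiter column
% every piece must be a char matrix with the same number of rows, so convert the numbers
out = [symb cma num2str(dates) cma num2str(factor) cma names];
% fprintf reads arrays column-major: transpose so each row of out becomes one line
fprintf(fid, [repmat('%c', 1, size(out, 2)) '\n'], out.');
fclose(fid);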
names is a placeholder for FctrName, which I guess may be a constant. If so, it can be inserted into the format string as Jan assumed; if not, it needs to be built as a column, the same way as the column of commas, and concatenated.
Brian 2013-8-5
Just converting my two numeric arrays to strings takes 55 seconds. That is slower than writing the file with mixed data types using fprintf and the 'W' argument. I'm still not sure what you mean by "stream"; I'm not familiar with that.
dpb 2013-8-5
Also called "binary". It's unformatted I/O, which has two advantages for speed:
a) full precision for floating-point values at the minimum number of bytes per entry, and b) no format-conversion overhead on either output or input.
doc fwrite % and friends
or, if the data could stay in MATLAB,
doc save % and load is only slightly higher-level
The possible disadvantage, of course, is that you can't just open the file and read it; but who's going to look through such large files manually, anyway?
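Here is a minimal sketch of the stream (binary) approach with fwrite and fread, assuming the receiving program can be told the layout. The file name factors.bin, the two-integer header (row count and symbol width), and the chosen precisions ('int32', 'double', 'char*1') are illustrative assumptions, not a format the other system is known to accept.

```matlab
% Hypothetical sample data
ExpSymbols = ['AAA'; 'BBB'; 'CCC'];
ExpDates   = [20130801; 20130802; 20130805];
MyFactor   = [0.5; -1.25; 3];

fname = fullfile(tempdir, 'factors.bin');   % hypothetical output file
n = numel(ExpDates);

fid = fopen(fname, 'W');
% Assumed layout: header (row count, symbol width), then each variable as one block
fwrite(fid, [n, size(ExpSymbols, 2)], 'int32');
fwrite(fid, ExpSymbols.', 'char*1');        % 1 byte per character, row by row
fwrite(fid, ExpDates, 'int32');             % dates as 4-byte integers
fwrite(fid, MyFactor, 'double');            % full precision, 8 bytes each
fclose(fid);

% Reading it back needs the same layout; no text parsing is involved
fid = fopen(fname, 'r');
hdr     = fread(fid, 2, 'int32');
symbols = fread(fid, [hdr(2), hdr(1)], 'char*1=>char').';
dates   = fread(fid, hdr(1), 'int32');
factors = fread(fid, hdr(1), 'double');
fclose(fid);
```

With this layout each row costs 3 + 4 + 8 bytes plus an 8-byte header per file, and the doubles come back bit-for-bit identical, which is the precision benefit described in a) above.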

