Appending to a saved dataset
I'm trying to read data from a text file, do some data analysis, save the results in a dataset, and export my dataset into a .dat file using the export function.
The problem arises when I have several text files and wind up with well over 100,000 observations and about 200 parameters. My current approach is to read data from each text file, save the analysis results in an interim dataset, concatenate the interim dataset onto my complete dataset, and call the export function once at the very end. So my code looks something like:
complete_ds = [];
for i = 1:length(textfiles)
    current_file = textfiles{i};      % assumes textfiles is a cell array of file names
    fid = fopen(current_file);
    data = ReadFile(fid);             % user-defined reader
    fclose(fid);
    interim_ds = AnalyzeData(data);   % user-defined analysis
    complete_ds = vertcat(complete_ds, interim_ds);   % grows on every iteration
end
export(complete_ds, 'file', 'Allmydata.dat');
This is taking a lot of time, and I'd like to be able to append to the exported dataset instead. Any suggestions? Also, I know that preallocating may help, but it is difficult to predict how much memory to set aside for the dataset, since each text file may have a different number of observations.
3 comments
Image Analyst
2011-6-21
How many text files? How much time? Minutes? Hours? What is the difference between observations and parameters (if that matters)? You can take a guess at preallocating by looking at the file size. If you have 50,000 lines (estimated from a file size of, say, 50 kb), then preallocating, say, 40 or 50 thousand rows in the array would be faster than preallocating nothing at all, even if you have to extend it by a few rows or truncate a few rows you didn't use. Inside AnalyzeData(), can you possibly estimate the number of rows that interim_ds will need?
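The file-size guess could be sketched roughly like this. It is only an illustration under stated assumptions: it uses a plain numeric matrix rather than a dataset, a throwaway demo file, and a made-up bytesPerLine value that would have to be tuned to the real files.

```matlab
% Sketch: preallocate from a file-size estimate, then trim the unused rows.
% tmpFile, bytesPerLine and the two-column layout are assumptions for this demo.
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'wt');
fprintf(fid, '%d %d\n', [1:5; 11:15]);   % five lines of demo data
fclose(fid);

info = dir(tmpFile);                     % info.bytes is the file size in bytes
bytesPerLine = 4;                        % deliberately low guess, so we over-allocate
estRows = ceil(info.bytes / bytesPerLine);

result = zeros(estRows, 2);              % allocated once, up front
nUsed = 0;
fid = fopen(tmpFile, 'rt');
txt = fgetl(fid);
while ischar(txt)                        % fgetl returns -1 at end of file
    nUsed = nUsed + 1;
    result(nUsed, :) = sscanf(txt, '%d %d')';
    txt = fgetl(fid);
end
fclose(fid);
delete(tmpFile);

result = result(1:nUsed, :);             % truncate rows that were never filled
```

If the guess turns out too small, MATLAB still grows the array on assignment, so an underestimate costs speed but not correctness.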
Answers (1)
Matt Tearle
2011-6-21
If it just comes down to "I'd like to be able to append to the exported dataset instead", then here's one way to do it, but it's a bit of a nasty hack...
- Find the directory $MATLAB\toolbox\shared\statslib\@dataset (where $MATLAB is your installation directory, e.g. C:\Program Files\MATLAB\R2011a).
- Copy the entire @dataset directory to somewhere local.
- Inside the local @dataset, make a copy of export.m and call it export_app.m (or whatever).
- Edit export_app.m. On line 1, change export to export_app. Then change line 169 (in R2011a, at least -- it might be slightly different in other releases) from fid = fopen(filename,'wt'); to fid = fopen(filename,'at'); so that the file is opened for appending instead of being overwritten. Save the file.
Then
>> export(x1,'file','testappend.dat')
>> export_app(x2,'file','testappend.dat','WriteVarNames',false)
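should work for you. Setting 'WriteVarNames' to false on the appending call keeps the variable-name header from being written a second time.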
Note, though, that you're now using a local version of the dataset class, so funky instabilities may ensue... Use with caution! Probably best to hide it away in a directory somewhere and go into that directory only for this purpose!
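For reference, the difference between the 'wt' and 'at' modes can be seen without touching the dataset class at all. The sketch below is not the export_app approach described above; it bypasses export entirely and writes tab-separated rows with fprintf. The output file, the chunks cell array and the variable names are all made up for the demo.

```matlab
% Sketch: write each chunk as it is produced; only the first pass writes a header.
% outFile, chunks and varNames are hypothetical stand-ins for real per-file results.
outFile = [tempname '.dat'];
chunks = {[1 2; 3 4], [5 6]};
varNames = {'a', 'b'};

for k = 1:numel(chunks)
    if k == 1
        fid = fopen(outFile, 'wt');      % create or overwrite, text mode
        fprintf(fid, '%s\t%s\n', varNames{:});
    else
        fid = fopen(outFile, 'at');      % append to the existing file, text mode
    end
    fprintf(fid, '%g\t%g\n', chunks{k}');   % transpose: fprintf reads column-wise
    fclose(fid);
end

written = fileread(outFile);             % header line followed by three data rows
delete(outFile);
```

Writing per chunk like this means the complete dataset never has to be held in memory, at the cost of giving up whatever formatting export would have applied.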