Applying timerange to multiple .txt files is very slow

Hi there,
I have a large set of ".txt" data files. I then apply timerange to extract data between specific dates and times. My script looks something like this:
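warning off
% Datastore over every .txt file in the folder
ds_loc = 'Z:\data\*.txt';
ds = datastore(ds_loc);
ds.ReadSize = 1000000;                 % rows per read
ds.Delimiter = ' ';
ds.MultipleDelimitersAsOne = 1;        % treat runs of spaces as one delimiter
ds.SelectedFormats(1) = {'%{dd/MM/yyyy HH:mm:ss}D'};   % read the first column as datetime
warning on
% Create a tall timetable
tt = tall(ds);
ttab = table2timetable(tt)
% Start and end of the time window to extract
strt_time = '03/24/2018 10:00:00'
end_time = '03/25/2018 00:00:00'
warning off
S1 = timerange(strt_time,end_time);
warning on
% Select the rows that fall inside the window
ttab(S1,:)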
The above script takes a long time to execute, depending on the number of files in the datastore location, i.e. "Z:\data". Is there a better way to do this?
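Separately from speed, one thing that may be worth ruling out: the files are read with a day-first format (dd/MM/yyyy), while the range limits are written month-first as plain text, so timerange has to guess how to parse them. A minimal sketch, assuming the limits are meant to be 24 March 10:00 to 25 March 00:00, builds them as datetime values with an explicit InputFormat. The in-memory timetable and the variable name `value` are hypothetical stand-ins for the real data.

```matlab
% Hypothetical in-memory data standing in for the contents of the .txt files.
fmt = 'dd/MM/yyyy HH:mm:ss';
t = datetime(2018,3,24,8,0,0) + hours(0:23)';   % 24 hourly samples starting 24 Mar 08:00
t.Format = fmt;
ttab = timetable(t, (1:24)', 'VariableNames', {'value'});

% Build the limits as datetime values with an explicit input format,
% so nothing has to be guessed about day-first versus month-first order.
strt_time = datetime('24/03/2018 10:00:00', 'InputFormat', fmt);
end_time  = datetime('25/03/2018 00:00:00', 'InputFormat', fmt);

% timerange includes the start and excludes the end by default.
S1 = timerange(strt_time, end_time);
sel = ttab(S1, :);
```

With the limits already parsed, the warning suppression around timerange may no longer be needed, although that depends on what the warning actually was.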
  7 comments
minomi 2018-8-22
I'm sorry I don't quite understand what you've written. What are you suggesting is the way to do this?
dpb 2018-8-22
Edited: dpb 2018-8-22
I was just commenting on the order in which the date format gets defined. It seems that datastore reads some of the data (how much, I have no idea) to infer the format when the object is created, yet you can't tell it the date format a priori; you have to set it afterwards through a property of the object. That would seem to mean that if it guesses wrong, it has to recompute or reread all of that information, which is a waste of time. If it guesses right, at least that part is OK, but historically, processing was significantly slower when no format was given than when one was. I don't know whether that effect applies here.
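On the point about supplying the format up front: as far as I can tell from the documentation, tabularTextDatastore lists TextscanFormats (along with Delimiter and ReadSize) among its construction-time name-value arguments, so the format may not have to be patched in afterwards. The sketch below is an assumption-laden illustration, not a tested fix for the original files: it writes one small hypothetical comma-delimited file with two columns named `time` and `value`, whereas the real files are space-delimited and their column layout is unknown.

```matlab
% Write one small hypothetical file so the sketch is self-contained.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'sample1.txt'), 'w');
fprintf(fid, 'time,value\n');
fprintf(fid, '24/03/2018 09:00:00,1\n');
fprintf(fid, '24/03/2018 11:00:00,2\n');
fprintf(fid, '25/03/2018 01:00:00,3\n');
fclose(fid);

% Give the column formats when the datastore is created, one specifier
% per column, instead of assigning SelectedFormats afterwards.
ds = tabularTextDatastore(fullfile(folder, '*.txt'), ...
    'Delimiter', ',', ...
    'ReadVariableNames', true, ...
    'TextscanFormats', {'%{dd/MM/yyyy HH:mm:ss}D', '%f'}, ...
    'ReadSize', 1000000);

data = readall(ds);   % small file, so reading everything at once is fine here
```

Whether this actually shortens the run time on the real data set would have to be measured; it only removes the need to set the format after the object exists.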
As far as speeding up the retrieval goes, I don't have any real suggestions, as I've not had the opportunity to use any of the large-data tools "in anger", so I don't know their idiosyncrasies at all.
Just how big are the files and how many are there? Might it possibly turn out to be faster to simply loop through them explicitly rather than using the overhead of the magic behind the scenes datastore object?
Are they all the same form or does the index vector have to be updated for each file? It appears that timerange makes the assumption of a fixed index across the population.
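The explicit loop mentioned above could look roughly like the following. This is a sketch under stated assumptions rather than a benchmarked alternative: the two comma-delimited files, their names, and the columns `time` and `value` are hypothetical and created here only so the code runs on its own. Each file is read with a fixed format, filtered with the same timerange, and the surviving rows are concatenated at the end.

```matlab
% Create two small hypothetical files so the sketch is self-contained.
folder = tempname;
mkdir(folder);
stamps = {'24/03/2018 09:00:00', '24/03/2018 11:00:00'; ...
          '24/03/2018 23:30:00', '25/03/2018 01:00:00'};
for k = 1:2
    fid = fopen(fullfile(folder, sprintf('file%d.txt', k)), 'w');
    fprintf(fid, 'time,value\n');
    fprintf(fid, '%s,%d\n', stamps{k,1}, 2*k-1);
    fprintf(fid, '%s,%d\n', stamps{k,2}, 2*k);
    fclose(fid);
end

% One time window, built once and reused for every file.
fmt = 'dd/MM/yyyy HH:mm:ss';
S1 = timerange(datetime('24/03/2018 10:00:00', 'InputFormat', fmt), ...
               datetime('25/03/2018 00:00:00', 'InputFormat', fmt));

% Read each file with a fixed format, keep only the rows inside the window.
files = dir(fullfile(folder, '*.txt'));
parts = cell(numel(files), 1);
for k = 1:numel(files)
    T = readtable(fullfile(files(k).folder, files(k).name), ...
        'Delimiter', ',', 'Format', ['%{' fmt '}D%f']);
    TT = table2timetable(T);      % the datetime column becomes the row times
    parts{k} = TT(S1, :);
end
result = vertcat(parts{:});       % all rows that fell inside the window
```

Filtering per file keeps only the matching rows in memory, which matters if most of each file lies outside the window. If a single file is too large to read in one go, this approach would not apply as written.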


Answers (0)
