Increase the speed of reading a text file and parsing date and time data

I'm searching for a faster way to read this file and convert the date and time to serial date numbers.
Datum Tid Värde
2015-10-12 00:02:16 23.399999619
2015-10-12 00:07:16 23.399999619
2015-10-12 00:12:16 23.399999619
2015-10-12 00:17:16 23.399999619
2015-10-12 00:22:17 23.399999619
2015-10-12 00:27:17 23.399999619
2015-10-12 00:32:17 23.399999619
2015-10-12 00:37:17 23.399999619
...
The text file contains anywhere from a few hundred to several thousand lines. I have tested five alternative solutions in R2016a; the script and the text file are attached. The timings from tic/toc and from the profiler are consistent.
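For reference, a straightforward textscan + datenum reader might look like the sketch below. It is an assumption, not the attached "Standard" case. The file name text2sdn_sample.txt, the tab separators and the generated rows (480 of them, 5 minutes apart, starting at the first timestamp shown above) are hypothetical stand-ins for the attached file; the header is written in ASCII only to avoid encoding issues, since it is skipped anyway.

```matlab
% Hypothetical sample file: a header line, then date<TAB>time<TAB>value rows
filename = fullfile(tempdir, 'text2sdn_sample.txt');
n    = 480;                                    % row count used in the runperf test
secs = 2*60 + 16 + 300*(0:n-1)';               % 00:02:16 plus 5-minute steps, in seconds
dom  = 12 + floor(secs/86400);                 % day of month, rolls over after midnight
sod  = mod(secs, 86400);                       % seconds of the day
hms  = [floor(sod/3600), floor(mod(sod, 3600)/60), mod(sod, 60)];
fid  = fopen(filename, 'w');
fprintf(fid, 'Datum\tTid\tVarde\n');           % ASCII header; skipped when reading
fprintf(fid, '2015-10-%02d\t%02d:%02d:%02d\t23.399999619\n', [dom, hms]');
fclose(fid);

% Read date and time as separate strings, join them, and let datenum parse them
fid = fopen(filename, 'r');
cac = textscan(fid, '%s%s%f', 'HeaderLines', 1);
fclose(fid);
str = strcat(cac{1}, {' '}, cac{2});           % the scalar cell {' '} is applied to every row
sdn = datenum(str, 'yyyy-mm-dd HH:MM:SS');     % serial date numbers, n-by-1
val = cac{3};                                  % the value column
```

Passing {' '} as a cell keeps the space, whereas strcat would trim trailing whitespace from a char argument.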
The fastest code, "sscanf", is nearly twice as fast as the "Standard" code.
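An sscanf-based reader can parse all fields with a single call on the whole file, as in this sketch. It is an assumption about the general technique and may differ from the attached "sscanf" case; the sample file and its name are hypothetical and are generated first so the block runs on its own.

```matlab
% Hypothetical sample file: a header line, then date<TAB>time<TAB>value rows
filename = fullfile(tempdir, 'text2sdn_sample.txt');
n    = 480;
secs = 2*60 + 16 + 300*(0:n-1)';               % 00:02:16 plus 5-minute steps, in seconds
dom  = 12 + floor(secs/86400);                 % day of month
sod  = mod(secs, 86400);                       % seconds of the day
hms  = [floor(sod/3600), floor(mod(sod, 3600)/60), mod(sod, 60)];
fid  = fopen(filename, 'w');
fprintf(fid, 'Datum\tTid\tVarde\n');
fprintf(fid, '2015-10-%02d\t%02d:%02d:%02d\t23.399999619\n', [dom, hms]');
fclose(fid);

% Read the whole file at once and drop the header line
txt  = fileread(filename);
body = txt(find(txt == sprintf('\n'), 1) + 1:end);
% Whitespace in the format matches spaces, tabs and line breaks alike,
% and the literal '-' and ':' characters consume the separators
num = sscanf(body, '%d-%d-%d %d:%d:%d %f');
num = reshape(num, 7, []).';                   % one row per line: Y M D h mi s value
sdn = datenum(num(:,1), num(:,2), num(:,3), num(:,4), num(:,5), num(:,6));
val = num(:,7);
```

Calling datenum with numeric components skips string parsing entirely, which is presumably where most of the gain over a string-based datenum call comes from.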
The File Exchange contribution DateStr2Num by Jan Simon is really fast. However, I could not find a fast way to arrange the input data into the form the function expects. The line
str = strcat( cac{1}(:,1), repmat({' '},[length(cac{1}(:,2)),1]), cac{1}(:,2) );
ruins the performance. There must be a better way!
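One possible way around the strcat/repmat line is to join fixed-width char matrices instead of cells. Whether this is faster than strcat on the attached data is an untested assumption; strcat(cac{1}(:,1), {' '}, cac{1}(:,2)) is another option that at least drops the repmat. Whether DateStr2Num accepts a char matrix rather than a cell array should be checked in its help text, so the sketch calls datenum instead and runs without the File Exchange file. The two-row cac below is a hypothetical stand-in for the textscan output.

```matlab
% Hypothetical stand-in for cac{1}: column 1 holds dates, column 2 holds times
cac = {{'2015-10-12', '00:02:16'; '2015-10-12', '00:07:16'}};
dates = char(cac{1}(:,1));                     % n-by-10 char matrix
times = char(cac{1}(:,2));                     % n-by-8 char matrix
n     = size(dates, 1);
str   = [dates, repmat(' ', n, 1), times];     % n-by-19 char matrix, one timestamp per row
sdn   = datenum(str, 'yyyy-mm-dd HH:MM:SS');
```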
Question: What options are there to increase the speed further?
ADDENDUM 2016-08-31
text2sdn_2 (attached) is adapted for runperf and contains two new cases. The summary of runperf('text2sdn_2.m') (mean measured time in seconds) is:
Name                          GroupCount    mean_MeasuredTime
__________________________    __________    _________________
text2sdn_2/Standard           4             1.0608
text2sdn_2/DateStr2Num        4             0.7491
text2sdn_2/sscanf             4             0.41834
text2sdn_2/fscanf             4             0.59519
text2sdn_2/datetime           4             1.2143
text2sdn_2/DateStr2Num_19c    4             0.3142
text2sdn_2/dtstr2dtnummx      4             1.0475
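A summary with the columns GroupCount and mean_MeasuredTime can be produced by grouping the runperf samples by test name with varfun. So that the sketch runs without the attached file, it first writes a tiny, hypothetical script-based performance test (sdn_perf_demo.m, with two made-up cases); for the real test, use text2sdn_2.m instead. In a script-based test, code before the first %% section is shared setup and each %% section is one test, which is where names such as text2sdn_2/Standard come from.

```matlab
% Write a tiny hypothetical script-based performance test into tempdir.
% Code before the first %% section is shared setup; each %% section is one test.
testFile = fullfile(tempdir, 'sdn_perf_demo.m');
fid = fopen(testFile, 'w');
fprintf(fid, '%s\n', ...
    'str = repmat(''2015-10-12 00:02:16'', 480, 1);', ...
    '%% datenum', ...
    'sdn = datenum(str, ''yyyy-mm-dd HH:MM:SS'');', ...
    '%% datevec', ...
    'dv = datevec(str, ''yyyy-mm-dd HH:MM:SS'');');
fclose(fid);

oldDir  = cd(tempdir);                         % run from the folder containing the test
results = runperf('sdn_perf_demo.m');          % warm-up runs, then measured samples
cd(oldDir);

% One row per measured sample, then the mean MeasuredTime (seconds) per test name
samples = vertcat(results.Samples);
summary = varfun(@mean, samples, ...
    'InputVariables', 'MeasuredTime', 'GroupingVariables', 'Name')
```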
In production, each text file is read only once. In these tests, the file is first read during the warm-up runs and is served from the system cache from then on, so the measured times do not include reading the file from disk.
text2sdn_2/DateStr2Num_19c is more than three times faster than text2sdn_2/Standard and more than twice as fast as text2sdn_2/DateStr2Num. One reason is that the format '%19c%f' reads date and time together as a single 19-character field, so no concatenation is needed. This works because DateStr2Num does not distinguish between a tab and a space as the separator.
cac = textscan( fid, '%19c%f', 'Headerlines',1, 'CollectOutput',true );
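Because '%19c' returns an n-by-19 char matrix with every field at a fixed column, the digits can also be converted with plain vectorized arithmetic. This pure-MATLAB sketch is a swapped-in technique, not DateStr2Num's implementation (which is a C-MEX file), and it has not been timed against the cases in the table. Like DateStr2Num, it never looks at the separator in column 11, so tab and space both work. The sample file is hypothetical, and the sketch assumes textscan splits the rows as reported above for the real file.

```matlab
% Hypothetical sample file: a header line, then date<TAB>time<TAB>value rows
filename = fullfile(tempdir, 'text2sdn_sample.txt');
n    = 480;
secs = 2*60 + 16 + 300*(0:n-1)';               % 00:02:16 plus 5-minute steps, in seconds
dom  = 12 + floor(secs/86400);                 % day of month
sod  = mod(secs, 86400);                       % seconds of the day
hms  = [floor(sod/3600), floor(mod(sod, 3600)/60), mod(sod, 60)];
fid  = fopen(filename, 'w');
fprintf(fid, 'Datum\tTid\tVarde\n');
fprintf(fid, '2015-10-%02d\t%02d:%02d:%02d\t23.399999619\n', [dom, hms]');
fclose(fid);

% Read date and time together as one fixed-width 19-character field
fid = fopen(filename, 'r');
cac = textscan(fid, '%19c%f', 'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
C  = cac{1};                                   % n-by-19 char: yyyy-mm-dd?HH:MM:SS
D  = C - '0';                                  % digit characters -> numbers 0..9 (double)
Y  = D(:, 1:4)   * [1000; 100; 10; 1];         % column 11 (tab or space) is never used
Mo = D(:, 6:7)   * [10; 1];
Dd = D(:, 9:10)  * [10; 1];
hh = D(:, 12:13) * [10; 1];
mi = D(:, 15:16) * [10; 1];
ss = D(:, 18:19) * [10; 1];
sdn = datenum(Y, Mo, Dd, hh, mi, ss);          % numeric datenum, no string parsing
val = cac{2};                                  % char and double are not merged by CollectOutput
```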
text2sdn_2/dtstr2dtnummx is only marginally faster than text2sdn_2/Standard (1.0475 s vs. 1.0608 s) because this test converts a column of 480 timestamps in one call. For a single timestamp, the relative difference is much larger.
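The effect of the column length can be checked with timeit. If much of datenum's cost is a fixed per-call overhead (a presumption, not something measured here), its cost per timestamp will be far higher for a single string than for a 480-row char matrix, which would explain why bypassing that overhead pays off most for single timestamps. The sketch times datenum only, because dtstr2dtnummx is an undocumented internal function; the timestamp is the first line of the sample data.

```matlab
% Per-timestamp cost of datenum for 1 vs. 480 timestamps (hypothetical data)
one  = '2015-10-12 00:02:16';
many = repmat(one, 480, 1);                    % 480-by-19 char matrix
fmt  = 'yyyy-mm-dd HH:MM:SS';
tOne  = timeit(@() datenum(one, fmt));         % seconds per call
tMany = timeit(@() datenum(many, fmt));
fprintf('1 timestamp:    %.3g s per timestamp\n', tOne);
fprintf('480 timestamps: %.3g s per timestamp\n', tMany / 480);
```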

Accepted Answer

Yair Altman, 2016-08-30
@Per - try to use dtstr2dtnummx(), as explained here: http://undocumentedmatlab.com/blog/datenum-performance
