Increase speed to read text file and parse date time data.
I'm looking for a faster way to read this file and convert the date and time to serial date numbers.
Datum Tid Värde
2015-10-12 00:02:16 23.399999619
2015-10-12 00:07:16 23.399999619
2015-10-12 00:12:16 23.399999619
2015-10-12 00:17:16 23.399999619
2015-10-12 00:22:17 23.399999619
2015-10-12 00:27:17 23.399999619
2015-10-12 00:32:17 23.399999619
2015-10-12 00:37:17 23.399999619
...
The text file contains anywhere from a few hundred to several thousand lines. I've tested five alternative solutions on R2016a. I attach the script and the text file. The results with tic/toc and profile are consistent.

 
The best code, "sscanf", is nearly twice as fast as the "standard".
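For readers without the attached script, here is a minimal sketch of what an sscanf-based approach can look like. It is not necessarily the code that was timed; the sample text and variable names are made up for illustration, and in real use the text would come from something like fileread on the data file.

```matlab
% Sample text in the same layout as the file shown above (header + rows).
txt = sprintf('%s\n', ...
    'Datum Tid Värde', ...
    '2015-10-12 00:02:16 23.399999619', ...
    '2015-10-12 00:07:16 23.399999619');

% Skip the header line: keep everything after the first newline.
firstEOL = find(txt == sprintf('\n'), 1);
body = txt(firstEOL+1:end);

% Parse all seven numeric fields of every row in a single sscanf call.
% The literal '-' and ':' in the format consume the separators.
num = sscanf(body, '%d-%d-%d %d:%d:%d %f', [7, Inf]);

% datenum accepts an N-by-6 matrix of date vectors [Y M D h m s].
sdn = datenum(num(1:6, :).');
val = num(7, :).';
```

The idea is to avoid date-string parsing altogether: sscanf returns plain numbers, and datenum is fast when it is given numeric date vectors.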
The FEX-contribution, DateStr2Num by Jan Simon, is really fast. However, I failed to find a fast way to arrange the input data to fit the function. The line,
str = strcat( cac{1}(:,1), repmat({' '},[length(cac{1}(:,2)),1]), cac{1}(:,2) );
ruins the performance. There must be a better way!
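One possible way around the strcat call, sketched under the assumption that every date is 10 characters and every time is 8 characters wide, is to build a char matrix by plain concatenation. The variables dates and times stand in for cac{1}(:,1) and cac{1}(:,2). The built-in datenum is used here for the conversion; whether DateStr2Num accepts a char matrix directly has not been checked, so cellstr(str) may be needed for that function.

```matlab
% Stand-ins for the two cell columns returned by textscan (assumed names).
dates = {'2015-10-12'; '2015-10-12'};
times = {'00:02:16'; '00:07:16'};

% Fixed-width fields let us concatenate char matrices column-wise,
% which avoids the per-cell work that strcat does on cell arrays.
n   = numel(dates);
str = [char(dates), repmat(' ', n, 1), char(times)];   % n-by-19 char

% Convert with an explicit format so datenum does not have to guess it.
sdn = datenum(str, 'yyyy-mm-dd HH:MM:SS');
```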
Question: What options are there to increase the speed further?
 
ADDENDUM 2016-08-31
textsdn_2 (attached) is adapted to runperf. It contains two new cases. The summary result of runperf('textsdn_2.m') is
Name                          GroupCount    mean_MeasuredTime
__________________________    __________    _________________
text2sdn_2/Standard           4             1.0608
text2sdn_2/DateStr2Num        4             0.7491
text2sdn_2/sscanf             4             0.41834
text2sdn_2/fscanf             4             0.59519
text2sdn_2/datetime           4             1.2143
text2sdn_2/DateStr2Num_19c    4             0.3142
text2sdn_2/dtstr2dtnummx      4             1.0475
In production a text file is read only once. In these tests the file is first read during warmup, and from then on it is available in the system cache.
text2sdn_2/DateStr2Num_19c is more than three times faster than text2sdn_2/Standard (0.3142 vs. 1.0608) and more than twice as fast as text2sdn_2/DateStr2Num (0.3142 vs. 0.7491). One reason is that date and time are kept together as a single 19-character field by using the format '%19c%f'. DateStr2Num doesn't distinguish between tab and space.
cac = textscan( fid, '%19c%f', 'Headerlines',1, 'CollectOutput',true );
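To show the '%19c%f' idea end to end, here is a small sketch that parses a text string instead of a file and uses the built-in datenum in place of DateStr2Num, so it runs without the FEX download. The sample text is made up, and overwriting column 11 with a space is an assumption to cover files that separate date and time with a tab.

```matlab
% Sample text in the same layout as the file (header + rows).
txt = sprintf('%s\n', ...
    'Datum Tid Värde', ...
    '2015-10-12 00:02:16 23.399999619', ...
    '2015-10-12 00:07:16 23.399999619');

% textscan also accepts text directly; '%19c' keeps date and time
% together as one fixed-width field, '%f' reads the value.
cac = textscan(txt, '%19c%f', 'HeaderLines', 1);

dt = cac{1};          % N-by-19 char matrix
dt(:, 11) = ' ';      % normalise the date/time separator (tab or space)

sdn = datenum(dt, 'yyyy-mm-dd HH:MM:SS');
val = cac{2};
```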
text2sdn_2/dtstr2dtnummx is only slightly faster than text2sdn_2/Standard (1.0475 vs. 1.0608) because this test converts a column of 480 timestamps in one call. With a single timestamp the relative difference is much larger.
Accepted Answer
Yair Altman
2016-8-30
@Per - try to use dtstr2dtnummx(), as explained here: http://undocumentedmatlab.com/blog/datenum-performance
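A hedged sketch of such a call follows. dtstr2dtnummx is an undocumented internal function, so the call signature and the Java-style format string below are assumptions based on the linked post. The function may not be on the path, or may not exist at all, in a given release, which is why the sketch falls back to datenum.

```matlab
% Cell array of timestamps in the same layout as the file.
str = {'2015-10-12 00:02:16'; '2015-10-12 00:07:16'};

if exist('dtstr2dtnummx', 'file')
    % Assumed signature: cell array of strings plus a Java-style format
    % (note 'MM' = month, 'mm' = minutes, unlike datenum's convention).
    sdn = dtstr2dtnummx(str, 'yyyy-MM-dd HH:mm:ss');
else
    % Documented fallback with the equivalent datenum format.
    sdn = datenum(str, 'yyyy-mm-dd HH:MM:SS');
end
```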