Processing a HUGE number of timestamps
I have a cell array whose elements are time stamps in the format "Mon Apr 01 20:00:00 BST 2013". I have a very large number of these vectors. At the moment, I loop through each value in the vector and apply the function below. This loop takes up 99% of my processing time.
How can I remove the loop?
thanks
function myTimeOut = st_timestampConvert(myTime)
year = strtrim(myTime(end-4:end));
month = strtrim(myTime(5:8));
day = strtrim(myTime(9:10));
time = strtrim(myTime(11:19));
timezone = strtrim(myTime(20:23));
myTimeOut = convert_to_UTC(myTimeOut, timezone); %time zone conversion
myTimeOut = datenum([day '-' month '-' year ' ' time], 'dd-mmm-yyyy HH:MM:SS');
end
Accepted Answer
Guillaume
2014-11-14
Without R2014b datetime, you can use regexprep to rearrange the bits of the string you want before calling datenum. It's many orders of magnitude faster than a loop, cellfun, or strsplit.
s2 = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tout = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');
On my machine, to process 100k dates, the above two lines take 2.1 seconds, most of it spent in the datenum operation. The regexprep line takes only 0.3 seconds.
There remains the problem of the time zone adjustment (which I believe should have come after the conversion to datenum in your example). Your convert_to_UTC is not part of MATLAB. Hopefully it can operate on cell arrays as well. To extract the time zone:
tzones = regexp(s, '\w+(?= \d+$)', 'match', 'once');
tout = convert_to_UTC(tout, tzones); %Will this work?
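Since convert_to_UTC is not shown in the question, here is a minimal vectorized sketch of one way the adjustment could be done without a loop. The lookup table (tzNames, tzOffsets) and its offset values are assumptions for illustration, not the asker's actual function. Abbreviations such as BST are ambiguous, so the table would have to match the data.

```matlab
% Hypothetical lookup table (assumed values): abbreviation -> hours ahead of UTC
tzNames   = {'GMT'; 'BST'};
tzOffsets = [0; 1];

% Two sample time stamps in the format from the question
s = {'Mon Apr 01 20:00:00 BST 2013'; 'Tue Jan 01 08:30:00 GMT 2013'};

% Rearrange to 'dd-mmm-yyyy HH:MM:SS' and convert, as in the answer above
s2     = regexprep(s, '(\w+) (\w+) (\d+) ([0-9:]+) (\w+) (\d+)', '$3-$2-$6 $4');
tlocal = datenum(s2, 'dd-mmm-yyyy HH:MM:SS');

% Extract the time zone abbreviation and look up its offset for all rows at once
tzones       = regexp(s, '\w+(?= \d+$)', 'match', 'once');
[found, idx] = ismember(tzones, tzNames);
assert(all(found), 'Unknown time zone abbreviation');
offsetHours  = tzOffsets(idx(:));

% datenum counts days, so an offset in hours is divided by 24
tutc = tlocal(:) - offsetHours / 24;
```

The lookup through ismember keeps the whole adjustment vectorized, so its cost should stay small next to the datenum call.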
More Answers (2)
Peter Perkins
2014-11-13
none, I don't know if you have access to R2014b. If you do, consider using the new datetime data type. On a not so fast PC, parsing 100000 strings like yours, with time zones, takes a bit over 2s. 'BST' presents a potential issue, because it might mean any number of things. In the UK, it means "British Summer Time", and the following parses the strings using that locale. Hope this helps.
% construct 100k strings
d = datetime(2013,4,1,20,0,0,'TimeZone','Europe/London') + days(randn(100000,1));
s = cellstr(d,'eee MMM dd HH:mm:ss z yyyy','en_UK');
% Example of one resulting string:
%   Tue Apr 02 14:33:32 BST 2013
% parse those strings
tic
d1 = datetime(s,'Format','eee MMM dd HH:mm:ss z yyyy','TimeZone', ...
'Europe/London','Locale','en_UK');
toc
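If the goal is UTC, as with convert_to_UTC in the question, a zoned datetime can be shifted by assigning its TimeZone property. The instant stays the same and only the displayed clock time changes. A minimal sketch, assuming R2014b or later and using a single hand-built value rather than the parsed array:

```matlab
% One zoned value: 20:00 on 1 April 2013 in London (BST, one hour ahead of UTC)
d = datetime(2013, 4, 1, 20, 0, 0, 'TimeZone', 'Europe/London');

dUTC = d;
dUTC.TimeZone = 'UTC';   % same instant, now expressed in UTC

tout = datenum(dUTC);    % serial date number, if older code still needs one
```

The same assignment works on a whole datetime array, so no loop should be needed for the parsed strings either.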
Jan
2014-11-16
I prefer Guillaume's version, but it is not "magnitudes" faster than a loop approach:
function DOut = ConvertCellDate(DIn)
% Convert strings like 'Mon Apr 01 20:00:00 BST 2013' to serial date numbers.
DOut = zeros(size(DIn));
for k = 1:numel(DIn)
    Dx = double(DIn{k} - '0'); % For faster conversion of numbers
    % The position of the month name in the lookup string gives the month number
    month = (strfind('JanFebMarAprMayJunJulAugSepOctNovDec', DIn{k}(5:7)) + 2) / 3;
    year = Dx(25) * 1000 + Dx(26) * 100 + Dx(27) * 10 + Dx(28);
    DOut(k) = datenummx(year, month, Dx(9) * 10 + Dx(10), ...
        Dx(12) * 10 + Dx(13), Dx(15) * 10 + Dx(16), ...
        Dx(18) * 10 + Dx(19));
end
Phew, this looks ugly and is hard to debug. But it takes 2.3 sec on my MATLAB 2011b/Win7/32 system, while Guillaume's method takes 1.7 sec.