How to avoid double counted hours for a given day

3 次查看(过去 30 天)
For a given month (say November), I have hourly data for each day. The problem is, sometimes they have put the same hour twice on the hourly column. For an example 0 1 2 2 4 5 6 7 7 9...etc for a given day. That I have to correct to 0 1 2 3 4 5 6 7 8 9... Also there may be some missing hours during the month. I need to identify those as well. Hours are given in UTC. I want to convert in to local time, and if possible pay attention to change of day light savings. We are in UTC-6 till day light savings end and UTC-7 afterwards.
Appreciate if someone could help me.
Thanks.
  3 个评论
Thishan Dharshana Karandana Gamalathge
see the attached Excel file. 1st column - year 2nd - month 3rd - day 4th - hour 5th - minute
dpb
dpb 2017-7-21
编辑:dpb 2017-7-21
So have you tried any of the suggestions posited? What has been the result(s)?
And, I'll note the "missing/duplicate" hours is exactly what I suspected--
>> data(1:20,:)
ans =
2010 11 4 0 0
2010 11 4 1 0
2010 11 4 1 59
2010 11 4 3 0
...
There's 1:00 and 1:59 then 3:00 so the 2 o'clock hour happened just before the hour occurred. So, what do you want--the actual time or round to nearest hour and trash the minute values?

请先登录,再进行评论。

回答(1 个)

dpb
dpb 2017-7-21
编辑:dpb 2017-7-21
Convert your time data columns to datetime values with datetime. It has ability to use a given time zone and output UTC.
Once you've done that(*), you can use relational tests to find any duplicated values and eliminate those rows as well as any locations with differences other than one hour and fill those in.
As Andrei indicates, exact code would be simpler to write for your particular case if had the full form of the data, but those are the basics and datetime should handle it in a pretty straightforward manner.
() Excepting for the days which shift to/from DST the duplicate values could be eliminated by |*unique| for the hours; it's possible for those days to have duplicate values at the witching hour.
OTOH, if the duplicates are from the fact it's a case where the data collection process is on a polling cycle and running at "just approximately" an hour, perhaps you've got data at 2:01 and then again at 2:58 and then not until 4:02 which explains both the "duplicate" values in hour 2 and the missing hour 3 but there's really no missing data at all. In that scenario you wouldn't want to add or subtract anything but instead you'd want to round the observed actual time to the nearest hour.
ADDENDUM
As noted above, there are no missing values, simply not sampled exactly at the hour every hour. datetime is your friend...
>> data=xlsread('workbook1-test.xls'); % read the data
>> da=datetime([data zeros(length(data),1)]); % convert to datetime
>> da=da(1:10) % show first 10 for brevity
da =
04-Nov-2010 00:00:00
04-Nov-2010 01:00:00
04-Nov-2010 01:59:00
04-Nov-2010 03:00:00
04-Nov-2010 04:00:00
04-Nov-2010 04:59:00
04-Nov-2010 06:00:00
04-Nov-2010 07:00:00
04-Nov-2010 07:59:00
04-Nov-2010 09:00:00
>> da=dateshift(da,'start','hour','nearest') % fixup by rounding nearest hour
da =
04-Nov-2010 00:00:00
04-Nov-2010 01:00:00
04-Nov-2010 02:00:00
04-Nov-2010 03:00:00
04-Nov-2010 04:00:00
04-Nov-2010 05:00:00
04-Nov-2010 06:00:00
04-Nov-2010 07:00:00
04-Nov-2010 08:00:00
04-Nov-2010 09:00:00
>>
Done.
  4 个评论
dpb
dpb 2017-7-22
编辑:dpb 2017-7-22
If you're still stressed over the above non-issue of missing/duplicate hours and trying to mess with the data as integers instead of datetime arrays as shown, then carrying on from the stopping point above
>> hour(da(1:10))
ans =
0
1
2
3
4
5
6
7
8
9
>>
the hour associated with the given date/time value for the first 10 entries...the rest will roll over with the day to show the hour of day.
If you want accumulative, then you want to convert to a duration array by subtracting the first element.
Read the doc on datetime and the "How to" section on handling time in Matlab starting at <<Matlab date-and-time operations>
dpb
dpb 2017-7-23
On checking after the other ?, I see I didn't actually post the duration--I thought I had done:
>> du=da-da(1)
du =
00:00:00
01:00:00
...
08:00:00
09:00:00
10:00:00
11:00:00
12:00:00
13:00:00
...
22:00:00
23:00:00
24:00:00
25:00:00
26:00:00
27:00:00
...
40:00:00
41:00:00
>>
where I've elided bunches in middle for brevity...there's only so much interest in a sequence of numbers. :) The point being that duration counts accumlative time while datetime values keep everything from happening at once and keep "when" as well.

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Dates and Time 的更多信息

产品

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by