How do I obtain 8760 mean hourly data points from 10.6 years data?
6 次查看(过去 30 天)
显示 更早的评论
I have data from 01-Jan-2013 to 23- Aug-2023. I have attached the XWinddata.xlsx file. The data comprise hourly measured wind speed, direction, temperature and pressure. I want to find hourly average values for the period so that I have 8760 hourly data corresponding to hours in a year. I have cleaned the data to remove NANs. I failed to obtain the required values. I have followed two similar questions that I came accross on this site but was getting errors. Please may you assist.
Following the codes given I created my code
XWinddata.DoY = day(XWinddata.Timestamp,'dayofyear'); % I follwed the answer in the above link, it worked
XWinddata.HoD = hour(XWinddata.Timestamp); % This also worked
head(XWinddata) % Also worked and the DoY and HoD columns were added to my timetable
t = varfun(@mean,XWinddata,'GroupingVariables',{'DoY' 'HoD'},'OutputFormat','table'); % I got an error and got stuck
Error obtained while running the code
Error in tabular/varfun>@(s,varargin)dfltErrHandler(grouped,funName,s,varargin{:}) (line 201)
errHandler = @(s,varargin) dfltErrHandler(grouped,funName,s,varargin{:});
Error in tabular/varfun (line 280)
outVals{igrp} = errHandler(s,inArg);
I also used groupsummary following answeras given on https://www.mathworks.com/matlabcentral/answers/566793-how-to-obtain-hourly-average-from-several-years-of-data?s_tid=sug_su
I also obtained errors.
Thank you in anticipation.
0 个评论
采纳的回答
dpb
2023-10-2
编辑:dpb
2023-10-2
tt=readtimetable('XWinddata.xlsx');
head(tt,1)
tt=removevars(tt,{'yearhour','DoY'}); % clean so start over
tt=addvars(tt,day(tt.Timestamp,'dayofyear'),hour(tt.Timestamp),'NewVariableNames',{'DoY','HoD'},'After',{'PressurePa'});
ttHr=varfun(@mean,tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');
height(ttHr)
[head(ttHr);tail(ttHr)]
But, you are missing most of the odd hours...
[numel(unique(tt.DoY)) numel(unique(tt.HoD)) numel(unique([tt.DoY tt.HoD],'rows'))/2 366*24]
As the above confirms, there are only 4,486 unique DoY, HoD combinations, not the 8784 that exist in a 366-day year. That's because you have data for most days only on two-hour increments.
This also leads back to the problem that the poster in one of the earlier links had in that there are leap years and they will show up with years that do have 366 days in them and the DOY algorithm will always count them in its conversion, even if you were to remove Feb 29 data from the timetable before computing DoY.
You didn't show us the top level user code error, only the internal error trap so we can't tell for absolute certain what went wrong; I'd guess the problem was you didn't remove the 'yearhour' variable (or use 'InputVariables' named argument pair) to not try to average it as well -- being nonnumeric, you can't take the mean of it.
NOTA BENE: However, it appears as you have only a very limited number of observations for the odd hours; whether to try to interpolate to fill those first or not I'll leave to you to decide...
Says there almost two entries for each time stamp, however, and interpolation only works for unique x values so will have to average first and then fill in..
tt=removevars(tt,{'DoY','HoD'}); % clean to start over
tt=rmmissing(retime(tt,'hourly','mean')); % average remove missing hours
tt=retime(tt,'hourly','linear'); % now fill in odd hours
tt=addvars(tt,day(tt.Timestamp,'dayofyear'),hour(tt.Timestamp),'NewVariableNames',{'DoY','HoD'},'After',{'PressurePa'});
[numel(unique(tt.DoY)) numel(unique(tt.HoD))]
isleapyr=@(t)(eomday(year(t),2)==29); % leap year logical function
isleapday=@(t)((month(t)==2)&(day(t)==29)); % leap day logical function
tt(isleapday(tt.Properties.RowTimes),:)=[]; % now remove Feb29 globally
[numel(unique(tt.DoY)) numel(unique(tt.HoD))] % still 1:366 in DoY
ix=isleapyr(tt.Properties.RowTimes)&(month(tt.Properties.RowTimes)>2); % dates needing fixup
tt.DoY(ix)=tt.DoY(ix)-1; % and adjust to 365 days
[numel(unique(tt.DoY)) numel(unique(tt.HoD))] % now we're back to 365 day year w/ DoY to match
ttHr=varfun(@mean,tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');
height(ttHr)
[head(ttHr);tail(ttHr)]
The above does it by just throwing away Feb 29, you could alternatively perhaps collapse Feb 28-29 to Feb 28 or Feb 29/Mar 1 to Mar 1 to somehow keep the values observed in the mix, but the above is the simplest approach, certainly.
8 个评论
dpb
2023-10-4
Moral: Have to be VERY careful with Excel and dates...<for many reasons>. See @Walter Roberson's comment to the Q? there for another nasty trait.
更多回答(0 个)
另请参阅
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!