Calculate avergae values per hour, day, month and year

5 次查看（过去 30 天）

显示更早的评论

Robert 2016-12-20

1
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/317655-calculate-avergae-values-per-hour-day-month-and-year

编辑： Lucas Guimaraes 2021-4-1

采纳的回答： Andrei Bobrov

在 MATLAB Online 中打开

I have ~23 years of hourly data in a large matrix (5 columns and over 5 millions rows), like this:

 YEAR / MONTH / DAY / HOUR / DATA
3 7 4 25.786
3 7 4 25.686
3 7 5 25.746
3 7 6 25.686
3 7 6 24.786
3 7 6 25.686
3 7 7 26.746
3 7 8 22.686
 ....
10 24 0 27.686
10 24 0 28.746
10 24 1 25.686

Where...

YEAR= 1994:1:2016 (with leap and regular years)

MONTH= 1:12 (during leap and regular years)

DAY= 1:31 (with 28-31 days depending on leap and regular years)

HOUR= 0-23 (0=time between midnight and 1am)

Unfortunately series doesn't start at MONTH 1, DAY 1, HOUR 0, thinking in loop here. Also HOUR values do not have the same time step (some days can have 3 values other days can have 48 values, etc).

Any suggestions on how to obtain the data average at: 1) each hour (per day per month per year), 2) each day (per month per year), and 3) each month (per year).

I am also interested on how to calculate the data average per: 1) year (23 years), 2) month (12 months), 3) day (366 days), and 4) hour (24 hours).

Thank you for your suggestions.

采纳的回答

Andrei Bobrov 2016-12-20

5
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/317655-calculate-avergae-values-per-hour-day-month-and-year#answer_247988

编辑：Andrei Bobrov 2016-12-21

在 MATLAB Online 中打开

Let data - your data.

%avergae values per hour
[ah,~,ch] = unique(data(:,1:4),'rows');
out_hour = [ah,accumarray(ch,data(:,5),[],@nanmean)];
%avergae values per day
[ad,~,cd] = unique(data(:,1:3),'rows');
out_day = [ad,accumarray(cd,data(:,5),[],@nanmean)];
%avergae values per month
[am,~,cm] = unique(data(:,1:2),'rows');
out_month = [am,accumarray(cm,data(:,5),[],@nanmean)];
%avergae values per year
[ay,~,cy] = unique(data(:,1:2),'rows');
out_year = [ay,accumarray(cy,data(:,5),[],@nanmean)];

If you have your data in a table or a timetable I recommend using Sean de Wolski's approach below. If you have a table you'll need to convert it into a timetable first using the table2timetable function as retime is only defined for timetable arrays.

Lucas Guimaraes 2021-4-1

编辑：Lucas Guimaraes 2021-4-1

Hello,

thank you for that. Helped me too in my case.

But help me again, please haha. If I have to calculate the standard deviations of these data. How do I do?

thank you!

Lucas

请先登录，再进行评论。

更多回答（1 个）

Sean de Wolski 2016-12-20

2
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/317655-calculate-avergae-values-per-hour-day-month-and-year#answer_247990

编辑：Sean de Wolski 2016-12-20

在 MATLAB Online 中打开

This is the exact purpose of retime with the timetable class in R2016b.

% Your data 
D = ...
 [1994 3 7 4 25.786
 1994 3 7 4 25.686
 1994 3 7 5 25.746
 1994 3 7 6 25.686
 1994 3 7 6 24.786
 1994 3 7 6 25.686
 1994 3 7 7 26.746
 1994 3 7 8 22.686
 2016 10 24 0 27.686
 2016 10 24 0 28.746
 2016 10 24 1 25.686];
% Make Datetime
dt = datetime(D(:,1),D(:,2),D(:,3),D(:,4),0,0);
% Make timetable
tt = timetable(dt,D(:,end),'VariableNames',{'Data'})
%%Retiming
% Monthly
rmmissing(retime(tt,'monthly',@mean))
% Yearly
rmmissing(retime(tt,'yearly',@mean))

You can pass whatever function you want in instead of @mean.

4 个评论
显示 2更早的评论隐藏 2更早的评论

Robert 2017-3-8

Thank you Sean for your help!

Wondering if it is possible to calculate the mean of multiple columns using retime (assuming D has multiple columns with data)? Similar question that I asked Kelly above.

Very appreciated!

Robert 2017-3-21

在 MATLAB Online 中打开

Hi Sean,

Following your last suggestion, I would like to ask you another question related to timetable if that is OK with you.

I am running this…

 % Make Datetime and timetable
 T1 = timetable(datetime(DATA(:,1), 'ConvertFrom',  'datenum'),DATA(:,2:4),… 'VariableNames',{'Data'});
 % Then calculating daily mean
 daily_mean=rmmissing(retime(T1,'daily',@(x)nanmean(x(:))));

The above works great!

However, this only helps me when I need to calculate the mean of few DATA columns. To better understand my problem, DATA is just a matrix with over 200,000 rows, 1st column (serial date number) and column 2 to end (numerical double values), for variables (and sub-variables), like this example…

 Variable 1 = DATA Column 2-3 (Var1_01 Var1_02)
 Variable 2 = DATA Column 4:8 (Var2_01 Var2_02 Var2_03 Var2_04)
 Variable 3 = DATA Column 9-11 (Var3_01 Var3_02 Var3_03)
 Variable 4 = DATA Column 12-18 (Var4_01 Var4_02 Var4_03 Var4_04 Var4_05 Var4_06 Var4_07)

…As I do have over 100 variables with different # of sub-variables (over 1000 columns), I created (from T1) a 1×52 cell array that contains ONLY the names of UNIQUE variables (like this … Columns 1 through 5 … 'Var1' 'Var2' 'Var3' 'Var4 'Var5' ….) associated to columns (2 to 6 in T1, as column 1 is Time), which could help to sort in the loop the search for specific Variable 1, and then run the mean, save it in a matrix/table, and then repeat the search for the following columns with Variable 2, and so on...

My question is how I can generate that loop to calculate the mean of each variable (using the above daily mean timetable solution).

Thank you for your help and comments!

请先登录，再进行评论。

请先登录，再回答此问题。