60 years of daily data; need to group separately by month and year
10 次查看(过去 30 天)
显示 更早的评论
I am completely new to MATLAB and have about 60 years of daily data (22246 rows) and four columns of interest.
I've imported the data into MATLAB as a table from a .csv.
I want to start by grouping this into months so that I can then analyse averages, minimums, maximums, general trends, plot graphs etc. but I don't want to add up all monthly data into a single value for each month for each column. Instead, I want to see a single value for each month of every year (so condensing this into about 720 rows), not 12 rows where all January data is aggregated and then all February etc.
I.e.
J F M A M J J A S O N D
2000
2001
2002
...
I know that this cannot go into the original table and that a new one will need to be created.
[Later on, I want to do this by year but once I know how to do it by month, I should be able to amend the code accordingly.]
Another problem is that the data starts mid-year and ends mid-year (i.e. start date is not 1st Jan and end date is not 31st Dec), although the data does run consecutively without a break once it starts.
The original dates are in the form DD-MMM-YYYY.
I already know how to do the analysis (e.g. averages) on the data but my problem is getting to a starting point in terms of the values that I want to analyse.
I've tried searching forums already and can't find the answer or at least one that my limited knowledge of MATLAB can understand. Please help as I'm tearing my hair out!
Thank you!
3 个评论
Stephen23
2023-2-26
编辑:Stephen23
2023-2-26
Do NOT solve this with loops, lots of UNIQUE calls, padding arrays with ZEROS and the like.
Learn to use MATLAB instead of fighting it. The most important thing is to get the data design right, which does take some practice. But in many cases, a good place to start is to import data as a table:
You can solve this task in just a few table commands: the more you browse the documentation, the more you will find and know what is available to help you solve your problems:
采纳的回答
Star Strider
2023-2-26
One approach for the mean values —
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1307475/Flows.csv')
TT1 = table2timetable(T1)
TT1m = retime(TT1, 'monthly', 'mean') % Aggregate On 'mean' Values
VN = T1.Properties.VariableNames;
T1maDate1 = datetime('01-Jan-1958') + calmonths(0:8).'; % Pad First Nine Months
T1ma1 = zeros(size(T1maDate1,1),4);
T1ma1 = [table(T1maDate1) array2table(T1ma1)];
T1ma1.Properties.VariableNames = VN;
T1maDate2 = TT1m.Date(end) + calmonths(1:4).'; % PAd Last Four Months
T1ma2 = zeros(size(T1maDate2,1),4);
T1ma2 = [table(T1maDate2) array2table(T1ma2)];
T1ma2.Properties.VariableNames = VN;
TT1m = [table2timetable(T1ma1); TT1m; table2timetable(T1ma2)] % Pad Array To Fill Out Months
Monthsc = unique(month(TT1m.Date, 'shortname'),'stable');
Yearsc = unique(year(TT1m.Date),'stable');
for k = 1:size(TT1m,2)
T1var{k,1} = array2table(reshape(TT1m{:,k}, 12, []).', 'RowNames',string(Yearsc), 'VariableNames',Monthsc);
T1var{k,2} = VN{k+1};
end
T1var{1,:}
Repeat this for the other statistics.
.
2 个评论
Star Strider
2023-2-26
As always, my pleasure!
I’ve had some experience with this sort of problem recently with another thread, so I know that there do not appear to be existing functions that can do this sort of operation. (It would be nice if there were!) There might be other ways of preallocating the table to avoid padding both ends of it, however that was not immediately obvious either. The matrix approach seems to work best here.
.
更多回答(2 个)
Image Analyst
2023-2-26
First see if you can do it yourself using one of these functions: splitapply, grpstats, groupsummary
If you still can't figure it out, let us know if you have the stats toolbox, and someone will do it for you.
Stephen23
2023-2-26
编辑:Stephen23
2023-2-27
The simple MATLAB approach using GROUPSUMMARY and UNSTACK:
M = categorical(datetime(1,1,1,"Format","MMM"):calmonths(1):datetime(1,12,31));
T = readtable('Flows.csv');
T.Year = T.Date.Year;
T.Month = M(T.Date.Month).';
S = groupsummary(T,["Year","Month"],"mean","TotalDailyFlowRate_m3s");
U = unstack(S,"mean_TotalDailyFlowRate_m3s","Month", "GroupingVariables","Year")
4 个评论
Stephen23
2023-2-27
编辑:Stephen23
2023-2-27
@Star Strider: thank you! For some reason I didn't think of a non-scalar input to CALMONTHS.
M = categorical(datetime(1,1,1,"Format","MMM")+calmonths(0:11))
sort(M) % sorts into category order (not alphabetic order)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Logical 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!