Sort data by day of year
4 次查看(过去 30 天)
显示 更早的评论
Hello,
I have a set of data with 12,000+ rows (and a number of columns, but I don't think is related to my question). Each row represents a day of data for some variable (starting on January 1 of some year, say 1960, the second row would be January 2nd, and so forth; it does this for over 30 years of data). So, the data might look something like:
- 25 (this would correspond to January 1 of the first year)
- 32 (this would correspond to January 2)
- 44 (this would correspond to January 3)
- 11
- .
- .
- .
- 29 (this would be the last day of data in the final year of the whole set)
What I need to do is sort this data so that I have all the January 1 data grouped together, all the Jan 2 data, etc (for all 365 days of the year). My goal is to take the average of all Jan 1 data, all Jan 2 data, and so forth, again, doing this for 365 days of the year for the whole dataset (30+ years). I should account for leap years, although it doesn't particularly matter which year was a leap year.
Any suggestion?
3 个评论
回答(3 个)
Walter Roberson
2012-1-25
"the cyclist"'s questions are good ones.
Once you have a vector that corresponds the day number of the year for each line, with the day number being an integer from 1 onward, then you can use
accumarray(DayNumber(:), YourData(:,TheAppropriateColumn))
The default operation of accumarray is to find the mean.
You will, as you say, have to account for leap years, but I will presume here that you fold all necessary logic for that in to the calculation of the day number. All entries with matching day number will be averaged in the above call.
0 个评论
Andrei Bobrov
2012-1-25
EDIT 2 [14:16(UTC+4) 26.01.2012]
eg:
a = randi([11 44],1600,1);
k =datevec(datenum('01-Jan_1960')+(0:numel(a)-1).');
id = strfind(k(:,3)',[29 1]);
c = 366*ones(numel(a),1);
idx = setdiff(1:numel(a),id);
c(idx) = rem((1:numel(idx))'-1,365)+1;
out = accumarray(c,a,[],@mean);
0 个评论
the cyclist
2012-1-25
An extremely pedantic, non-clever solution, assuming you have to infer the dates, but know the starting date:
MONTHSPERYEAR = 12;
DAYSPERMONTH = 31;
START_YEAR = 1960;
START_MONTH = 1;
START_DAY = 1;
NUMBER_ROWS = 12000;
randomData = rand(NUMBER_ROWS,1);
startDateNumber = datenum(START_YEAR,START_MONTH,START_DAY);
dateNumbers = startDateNumber + (0:NUMBER_ROWS-1);
dateVectors = datevec(dateNumbers);
indexToYear = 1;
indexToMonth = 2;
indexToDay = 3;
thisDayOfTheYearData = cell(MONTHSPERYEAR,DAYSPERMONTH);
for nm = 1:MONTHSPERYEAR
for nd = 1:DAYSPERMONTH
indexToThisDate = (dateVectors(:,indexToMonth)==nm) & (dateVectors(:,indexToDay)==nd);
thisDayOfTheYearData{nm,nd} = randomData(indexToThisDate);
end
end
2 个评论
the cyclist
2012-1-25
This pulls all the daily data out into its own element of a cell array. You can then do any operation (e.g. the mean) on those data.
Jake_K
2018-2-22
Hi Cyclist.
I have a question since I'm not really looking through. What do I have to add to your code if I want to calculate for every month the difference of the first and last value of the month.
E.g. for January 2015 I measured 5°C on the 1/1/2015 and 13°C on the 31/1/2015. So I get back 8°C. So I can finally write into a matrix the differences. Something like this:
1/1/2015 8
1/2/2015 3
1/3/2015 5
And so on. I would be glad if you could help me. Thanks a lot!
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Dates and Time 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!