Group mean for repeated samples
显示 更早的评论
Have a looping sample with index 1-5 through time, I need averages for each period and index (not simple group-wise mean of id 1-5). Complicated by occasional non-even sample numbers, i.e. usually there are 2 data points for each id, but occasionally 3 and also by the hh and mm time stamp not changing in step with id, i.e. the same minute could exist for several id's. Data like so (actual dataset larger and more complicated):
yyyy mm dd hh mm rec id data1 data2
2015 5 16 12 54 65433 1 402.2262 3.598117
2015 5 16 12 54 65434 1 401.9153 3.596588
2015 5 16 12 54 65435 2 401.9349 3.597093
2015 5 16 12 54 65436 2 402.1344 3.598488
2015 5 16 12 54 65437 3 401.6603 3.599219
2015 5 16 12 55 65438 3 402.3899 3.596839
2015 5 16 13 3 65439 4 401.6629 3.574918
2015 5 16 13 3 65440 4 401.9409 3.575383
2015 5 16 13 3 65441 5 402.3803 3.574625
2015 5 16 13 3 65442 5 402.174 3.571742
2015 5 16 13 24 65443 1 401.9677 3.568859
2015 5 16 3 24 65444 1 401.7614 3.565976
Could use a moving window-type approach:
for i = 1:floor(n/(period)
means1(i) = mean(T1((i*period-(period-1)):(i*period)));
means2(i) = mean(T2((i*period-(period-1)):(i*period)));
...but occasionally there are less than expected numbers in the period.
This is nice solution if I wanted all ID 1 etc means:
[ii,jj] = ndgrid(id,1:size(data,2));
bb = accumarray([ii(:),jj(:)], data(:), [], @mean);
meanout = data - bb(id,:);
So what I want as result is:
id n mean1 mean2
1 2 402.07075 3.5973525
2 2 402.03465 3.5977905
3 2 402.0251 3.598029
4 2 401.8019 3.5751505
5 2 402.27715 3.5731835
1 2 401.86455 3.5674175
Any advice appreciated, could not find info for this specific issue. Some clever use of accumarray will probably work, but I could not conquer it.
2 个评论
dpb
2015-11-3
Think it might help to post a sufficient section of the file to illustrate the problem(s) tried to describe. I, at least, got lost trying to figure out what is in the data and isn't and where the time identifiers referred to came from...oh, now I see essentially the same comment in Cyclist's Answer.
Colin Edgar
2015-11-3
采纳的回答
更多回答(1 个)
the cyclist
2015-11-3
0 个投票
I don't fully understand how to identify the rows that should be grouped, but it seems that if you can figure that out, then you can use accumarray.
One possibility that comes to mind is to use the unique function (with the "rows" argument) on your original data. You identify the ids, times, or whatever that are unique, and should be aggregated.
The third argument of unique is an index from the unique values back to the original, and can be used as the first argument of accumarray, telling it exactly which rows to aggregate.
Make sense? My answer here is an example of the technique. In that example I only needed to look at one column to find the unique values, but the concept would extend to your case where multiple values are required to identify the "key".
类别
在 帮助中心 和 File Exchange 中查找有关 Logical 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!