Find and reduce a numeric array with identical columns
Dear Sir/Madam,
I would like to ask you the following question:
I have a data file like this:
tmp = [...
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438]
I would like to regroup (or reduce) it under the following condition:
If column 1 AND column 3 of a row are identical to column 1 AND column 3 of any other row, merge those rows into one new row. Column 2 of the new row is the sum of the original column 2 values. Columns 1 and 3 are kept the same; column 4 is not important.
So, for above data, I expect to have the answer:
119 5 6832 0.20313 % 5+4-4=5
122 -48 6853 0.29688
125 -8 6853 0.4375 % -12+4=-8
121 12 6914 0.5625
119 48 6914 0.17188
Which MATLAB command should I use? I would greatly appreciate it if you included your code and its output.
I am using MATLAB R2014a.
Thank you very much
3 Comments
Image Analyst
2018-12-30
I was wondering the same thing. Hopefully the order doesn't matter. I'm sure you could write the code afterwards in such a way that it didn't matter.
Accepted Answer
Stephen23
2018-12-30
>> [~,X,Y] = unique(tmp(:,[1,3]),'rows');
>> out = tmp(X,:);
>> out(:,2) = accumarray(Y,tmp(:,2),[],@sum)
out =
119.000000 5.000000 6832.000000 0.023438
119.000000 48.000000 6914.000000 0.171880
121.000000 12.000000 6914.000000 0.562500
122.000000 -48.000000 6853.000000 0.296880
125.000000 -8.000000 6853.000000 0.437500
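If the order of the rows matters, one possible variation (a sketch, not part of the original answer) is to pass the 'stable' option to unique so that the groups come out in order of first appearance rather than sorted. This assumes a MATLAB release where unique supports 'stable' together with 'rows'; with this option each kept row is the first row of its group, so column 4 is taken from that first row.

```matlab
tmp = [...
    121  12 6914 0.5625
    122 -48 6853 0.29688
    119  48 6914 0.17188
    125 -12 6853 0.078125
    125   4 6853 0.4375
    119   5 6832 0.20313
    119   4 6832 0.039063
    119  -4 6832 0.023438];

% Group rows by columns 1 and 3, keeping groups in order of first appearance.
[~, X, Y] = unique(tmp(:, [1, 3]), 'rows', 'stable');

% Keep one representative row per group (the first row of each group).
out = tmp(X, :);

% Replace column 2 with the per-group sum (accumarray sums by default).
out(:, 2) = accumarray(Y(:), tmp(:, 2));
```

The rows of out are still not in the order shown in the question, but they follow the order in which each (column 1, column 3) pair first occurs in tmp.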
More Answers (1)
Image Analyst
2018-12-30
Edited: Image Analyst
2018-12-30
What about using grpstats(), if you have the Statistics and Machine Learning Toolbox?
tmp = [...
121 12 6914 0.5625
122 -48 6853 0.29688
119 48 6914 0.17188
125 -12 6853 0.078125
125 4 6853 0.4375
119 5 6832 0.20313
119 4 6832 0.039063
119 -4 6832 0.023438]
col5 = 10000*tmp(:, 1) + tmp(:, 3)
tmp = [tmp, col5];
% No sum in grpstats, so have to do it twice.
% Once to get the mean and once to get the count.
outputMean = grpstats(tmp, tmp(:, 5), 'mean')
outputNumel = grpstats(tmp, tmp(:, 5), 'numel')
% Crop off temporary 5th column
output = outputMean(:, 1:4) % Initialize
% Column 2 is the sum = mean * count
output(:, 2) = outputMean(:, 2) .* outputNumel(:, 2)
The output seems to be sorted by the first column though:
output =
119 5 6832 0.088544
119 48 6914 0.17188
121 12 6914 0.5625
122 -48 6853 0.29688
125 -8 6853 0.25781
That might be a problem for you. I'm not sure. Of course column 4 can be cropped off or ignored since you say it's not important.
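As a possible shortcut (a sketch, assuming grpstats accepts a function handle as its statistic, as its documentation describes), the sum of column 2 could be computed in one call instead of multiplying mean by count. The variable names key and sumCol2 are made up for this illustration, and the composite key is the same as col5 above, so it assumes column 3 is always below 10000.

```matlab
tmp = [...
    121  12 6914 0.5625
    122 -48 6853 0.29688
    119  48 6914 0.17188
    125 -12 6853 0.078125
    125   4 6853 0.4375
    119   5 6832 0.20313
    119   4 6832 0.039063
    119  -4 6832 0.023438];

% Composite grouping key built from columns 1 and 3
% (assumption: column 3 < 10000, so distinct pairs give distinct keys).
key = 10000*tmp(:, 1) + tmp(:, 3);

% Sum column 2 within each group. sum(x, 1) forces a column-wise sum,
% so a group with a single row is handled the same way as larger groups.
sumCol2 = grpstats(tmp(:, 2), key, @(x) sum(x, 1));
```

This still needs the Statistics and Machine Learning Toolbox, and it only returns the summed column, one value per group.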