Find duplicate entries and sum up their associated values then put everything back in a table
34 次查看(过去 30 天)
显示 更早的评论
Hello everyone,
I have an excel sheet with two columns. The first column is a list of names, and the second column is the score of each person. Some names have multiple score and I want to find duplicate entries and sum up their associated values then put everything back in a table. Basically clean up the duplicate but want the sum of the scores. The excel file is attached as well
I would really appreciate if you have a script or a function can achieve the results.
Here where I’m at
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/678503/image.png)
Data = readtable("Data.xlsx")
unique(Data.Names)
[C,ia,idx] = unique(Data(:,1),'stable');
val = accumarray(idx,Data(:,2),[],@sum);
mat = [C val]
0 个评论
采纳的回答
dpb
2021-7-8
tData = readtable("Data.xlsx");
tData.Properties.VariableNames={'Name','Score'}; % set a known set of variable names
tSum=rowfun(@sum,tData,'InputVariables','Score','GrouptingVariables','Name');
writetable(tSum,'PickAnOutputFileName.xls')
更多回答(1 个)
Cris LaPierre
2021-7-8
It gets a bit trickly since you don't want to change the order of the names, but I think you could do this with groupsummary.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
Data.Order = (1:height(Data))'
% Use groupsummary to find sum and min of scores and order for each unique name
sumTbl = groupsummary(Data,"Names",["sum","min"],["Scores" "Order"])
% sort the results, which are alphabetical, so they are back in the original order
sumTbl = sortrows(sumTbl,"min_Order");
% create a new table with just Names and total scores
finalData = sumTbl(:,["Names","sum_Scores"]);
% rename 'sum_Scores' back to 'Scores'
finalData.Properties.VariableNames(2) = "Scores"
3 个评论
dpb
2021-7-8
I didn't think of groupsummary, Chris. Good thinking. Was keeping the order an input requirement? I missed it if it was. I've done same as you before several times; have one humongous spreadsheet that has to get updated piecemeal -- it has a (hidden) column that contains =ROW() precisely for the purpose.
Cris LaPierre
2021-7-8
I had interpreted it as having to be in the same order, but on a re-read, that is not stated. If not, it gets much simpler.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
% Use groupsummary to find sum scores for each unique name
finalData = groupsummary(Data,"Names","sum")
This keeps the GroupCount information, but that's not bad info to have either.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Logical 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!