Using "unique" to identify unique values AND number of occurrences of each unique value


Below are the first rows of my table:
head(hits)
ID res1 score
_____________ ____ _______
AGAP001076-RD 282 0.67229
AGAP001076-RD 285 0.75292
AGAP001076-RD 286 0.66957
AGAP001076-RD 296 0.51694
AGAP001076-RD 298 0.51655
AGAP001076-RD 310 0.54564
AGAP001076-RD 314 0.74495
AGAP010077-RA 349 0.52136
Using "unique" I can obtain the unique IDs. I would also like to obtain the number of occurrences of each unique ID, e.g. AGAP001076-RD 6.
Thank you for your attention.

Accepted Answer

Steven Lord, 2024-9-19, 16:35
Use the groupcounts function.
A = {'AGAP001076-RD' 282 0.67229
'AGAP001076-RD' 285 0.75292
'AGAP001076-RD' 286 0.66957
'AGAP001076-RD' 296 0.51694
'AGAP001076-RD' 298 0.51655
'AGAP001076-RD' 310 0.54564
'AGAP001076-RD' 314 0.74495
'AGAP010077-RA' 349 0.52136};
[counts, groupID] = groupcounts(A(:, 1))
counts = 2×1
     7
     1
groupID = 2x1 cell array
    {'AGAP001076-RD'}
    {'AGAP010077-RA'}
3 Comments
Steven Lord, 2024-9-19, 19:32
A = {'AGAP001076-RD' 282 0.67229
'AGAP001076-RD' 285 0.75292
'AGAP001076-RD' 286 0.66957
'AGAP001076-RD' 296 0.51694
'AGAP001076-RD' 298 0.51655
'AGAP001076-RD' 310 0.54564
'AGAP001076-RD' 314 0.74495
'AGAP010077-RA' 349 0.52136};
T = cell2table(A)
T = 8x3 table
           A1            A2       A3
    _________________    ___    _______
    {'AGAP001076-RD'}    282    0.67229
    {'AGAP001076-RD'}    285    0.75292
    {'AGAP001076-RD'}    286    0.66957
    {'AGAP001076-RD'}    296    0.51694
    {'AGAP001076-RD'}    298    0.51655
    {'AGAP001076-RD'}    310    0.54564
    {'AGAP001076-RD'}    314    0.74495
    {'AGAP010077-RA'}    349    0.52136
If your data is in a table array like the one I created above, you just have to tell groupcounts which variable(s) in the table is/are the grouping variable(s).
countsAndID = groupcounts(T, 'A1')
countsAndID = 2x3 table
           A1            GroupCount    Percent
    _________________    __________    _______
    {'AGAP001076-RD'}         7          87.5
    {'AGAP010077-RA'}         1          12.5
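For the table in the original question, the same call might look like the sketch below. It rebuilds only the eight rows shown in the question, and it assumes the table is named hits, that its variables are named ID, res1 and score as in the displayed header, and that ID is stored as a cell array of character vectors. The name idCounts is made up for illustration.

```matlab
% Rebuild the eight rows shown in the question (assumed layout:
% a table named hits with variables ID, res1 and score).
ID = [repmat({'AGAP001076-RD'}, 7, 1); {'AGAP010077-RA'}];
res1 = [282; 285; 286; 296; 298; 310; 314; 349];
score = [0.67229; 0.75292; 0.66957; 0.51694; 0.51655; 0.54564; 0.74495; 0.52136];
hits = table(ID, res1, score);

% Count the rows for each unique ID. The result is a table with the
% variables ID, GroupCount and Percent.
idCounts = groupcounts(hits, "ID");

% Print one "ID count" line per unique ID.
for k = 1:height(idCounts)
    fprintf('%s %d\n', idCounts.ID{k}, idCounts.GroupCount(k));
end
```

If ID were stored as a string or categorical array instead, the brace indexing in the fprintf call would need adjusting.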
You can use multiple grouping variables as well. Let's make some data with duplicate rows and replace the values in A2 with ones more likely to cause a collision in the combination of the grouping variables A1 and A2.
T2 = T(randi(height(T), 20, 1), :);
T2.A2 = randi(5, 20, 1)
T2 = 20x3 table
           A1            A2      A3
    _________________    __    _______
    {'AGAP001076-RD'}    2     0.51655
    {'AGAP001076-RD'}    4     0.74495
    {'AGAP001076-RD'}    1     0.75292
    {'AGAP001076-RD'}    4     0.51655
    {'AGAP001076-RD'}    5     0.54564
    {'AGAP001076-RD'}    5     0.66957
    {'AGAP001076-RD'}    5     0.51694
    {'AGAP010077-RA'}    2     0.52136
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    3     0.75292
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    4     0.74495
    {'AGAP001076-RD'}    4     0.51655
    {'AGAP001076-RD'}    4     0.51694
    {'AGAP001076-RD'}    4     0.51694
    {'AGAP001076-RD'}    2     0.67229
countsAndID = groupcounts(T2, ["A1", "A2"])
countsAndID = 6x4 table
           A1            A2    GroupCount    Percent
    _________________    __    __________    _______
    {'AGAP001076-RD'}    1          4           20
    {'AGAP001076-RD'}    2          2           10
    {'AGAP001076-RD'}    3          1            5
    {'AGAP001076-RD'}    4          8           40
    {'AGAP001076-RD'}    5          4           20
    {'AGAP010077-RA'}    2          1            5
Let's check. How many rows of T2 have the same A1 and A2 values as the first row of the countsAndID table?
matchesForFirstRowA1 = matches(T2.A1, countsAndID{1, "A1"});
matchesForFirstRowA2 = T2.A2 == countsAndID{1, "A2"};
result = T2(matchesForFirstRowA1 & matchesForFirstRowA2, :)
result = 4x3 table
           A1            A2      A3
    _________________    __    _______
    {'AGAP001076-RD'}    1     0.75292
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    1     0.67229
    {'AGAP001076-RD'}    1     0.51655
Does that match the count that groupcounts returned in the first row of countsAndID?
isequal(height(result), countsAndID{1, "GroupCount"})
ans = logical
1


More Answers (1)

Animesh, 2024-9-19, 16:14
In MATLAB, you can use the "unique" function along with the "histcounts" function to find the number of occurrences of each unique ID in your table. Here's how you can do it:
% Assume 'hits' is your table
% Extract the 'ID' column from the table
ids = hits.ID;
% Find unique IDs and their indices
[uniqueIDs, ~, idx] = unique(ids);
% Count the occurrences of each unique ID
occurrences = histcounts(idx, 1:max(idx)+1);
% Display the results
for i = 1:length(uniqueIDs)
    fprintf('%s %d\n', uniqueIDs{i}, occurrences(i));
end
You can refer to the MathWorks documentation for more information on the "histcounts" function.
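To try the unique/histcounts approach without the full table, the sketch below runs it on the eight IDs shown in the question, assuming they are stored as a cell array of character vectors. The accumarray call is an alternative way to count the same indices; it is not part of the answer above, and altCounts is a name made up for illustration.

```matlab
% The eight IDs from the question, as a cell array of character vectors.
ids = [repmat({'AGAP001076-RD'}, 7, 1); {'AGAP010077-RA'}];

% idx maps every row to the position of its ID in uniqueIDs.
[uniqueIDs, ~, idx] = unique(ids);

% One bin per unique ID, with bin edges at 1, 2, ..., max(idx)+1.
occurrences = histcounts(idx, 1:max(idx)+1);

% Alternative: add 1 into the slot given by each row's index.
altCounts = accumarray(idx, 1);

% Print one "ID count" line per unique ID.
for i = 1:numel(uniqueIDs)
    fprintf('%s %d\n', uniqueIDs{i}, occurrences(i));
end
```

Note that histcounts returns a row vector here, while accumarray returns a column vector.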


Release

R2024a
