How do I assign rows of a variable to categories?

Hello,
I have a table ("data") with 4 variables and 688 rows. The first 6 rows look like this:
Pseudonym    Indication  Study-name  Sequence
Patient_001  1           1           1
Patient_002  2           2           2
Patient_003  3           3           1
Patient_004  3           1           1
Patient_005  4           2           2
Patient_006  4           5           2
I want to find all groups defined by the combination of "Indication", "Study-name" and "Sequence".
I created a new table, data1 = data(:,{'indication' 'study_name' 'sequence'}), and then used
[p,v] = findgroups(data1) to find all existing groups.
Now I want to assign each row in "Pseudonym" to one of these groups.
My goal is to create a new variable for every group, containing all Pseudonyms that belong to that group.
In the next step, I want to randomly pick pseudonyms from each group.
Furthermore, I would like to take the group size (i.e., the number of pseudonyms in a group) into account.
That is, if I want to randomly pick 20 patients in total and one group contains 50% of the data, then 10 patients should be picked from that group.
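The rule described here is proportional allocation. Writing n for the total number of pseudonyms to pick, N_g for the size of group g and N for the total number of rows (688 here), the number to pick from group g is:

```latex
k_g = n \cdot \frac{N_g}{N}, \qquad \text{e.g.}\quad k_g = 20 \cdot \frac{344}{688} = 10
```

In general k_g is not a whole number, so some rounding rule is needed, and the rounded values do not always add up to n.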
Could you please help me set up the code?
Thank you so much!
Max

Answers (1)

Vatsal 2023-9-29
I understand that you have a table “data” with four columns, and you want to find the groups based on the columns "Indication", "Study-name" and "Sequence". After finding the groups, you want to assign each row in “Pseudonym” to one of these groups.
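Assigning rows to groups only needs the group numbers that findgroups already returns. Below is a minimal sketch on hypothetical toy data (the eight patients, the column name StudyName standing in for "Study-name", and the names Group, members, Members and Count are illustrative, not from the question). Instead of creating one workspace variable per group, it keeps one cell per group next to the group definitions in v:

```matlab
% Hypothetical toy data: 8 patients, 4 distinct group combinations.
% "StudyName" stands in for the question's "Study-name" column.
Pseudonym  = ["P01"; "P02"; "P03"; "P04"; "P05"; "P06"; "P07"; "P08"];
Indication = [1; 1; 2; 2; 1; 2; 1; 3];
StudyName  = [1; 1; 2; 2; 1; 2; 2; 1];
Sequence   = [1; 1; 2; 2; 1; 2; 1; 1];
data = table(Pseudonym, Indication, StudyName, Sequence);

% p: group number of every row; v: one row per distinct combination
[p, v] = findgroups(data(:, {'Indication', 'StudyName', 'Sequence'}));
data.Group = p;   % hypothetical new column: the group each patient belongs to

% One cell per group instead of one workspace variable per group
members = splitapply(@(x) {x}, data.Pseudonym, p);
v.Members = members;                    % pseudonyms in each group
v.Count = cellfun(@numel, members);     % group sizes
disp(v)
```

Here v has one row per combination, and v.Count holds the group sizes needed for the proportional sampling step.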
After this, you want to randomly pick a total of x pseudonyms across all groups, taking each group's size into account.
The code below randomly picks pseudonyms from all groups in proportion to each group's size:
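data1 = data(:, {'Indication', 'Study-name', 'Sequence'});
[p, v] = findgroups(data1);                        % p: group number of each row, v: one row per group
groups = splitapply(@(x) {x}, data.Pseudonym, p);   % groups{k}: pseudonyms in group k
numPicks = 20; % Number of pseudonyms to pick in total
pickedPseudonyms = [];
totalPseudonyms = sum(cellfun(@numel, groups));   % equals height(data)
scalingFactor = numPicks / totalPseudonyms;        % fraction of each group to pick
% Visit the largest groups first
[~, sortedIndices] = sort(cellfun(@numel, groups), 'descend');
sortedGroups = groups(sortedIndices);
for i = 1:numel(sortedGroups)
    groupSize = numel(sortedGroups{i});
    picksFromGroup = round(groupSize * scalingFactor); % Adjust picks based on group size
    if picksFromGroup > 0
        randomIndices = randperm(groupSize, min(groupSize, picksFromGroup));
        % Table variables are columns, so concatenate vertically (;)
        pickedPseudonyms = [pickedPseudonyms; sortedGroups{i}(randomIndices)];
    end
    % Stop once numPicks pseudonyms have been selected
    if numel(pickedPseudonyms) >= numPicks
        break;
    end
end
% Rounding can overshoot numPicks, so keep at most numPicks
% (it can also fall short, in which case fewer are returned)
pickedPseudonyms = pickedPseudonyms(1:min(numPicks, end));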
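Rounding each group's quota separately does not guarantee that the quotas add up to numPicks. For example, with hypothetical group sizes 40, 25, 25 and 10 and numPicks = 10, the exact quotas are 4, 2.5, 2.5 and 1, and round gives 4, 3, 3 and 1 (11 in total); the loop above then stops after the third group, so the smallest group is never sampled. A minimal sketch of one alternative, the largest remainder (Hamilton) apportionment method, which is not part of the original answer: give every group the floor of its exact quota, then hand the remaining picks to the groups with the largest fractional parts. The toy groups below are hypothetical numeric IDs standing in for the groups cell array:

```matlab
% Hypothetical stand-in for the "groups" cell array: 100 numeric IDs
% split into groups of 40, 25, 25 and 10
groups = mat2cell((1:100)', [40 25 25 10]);
numPicks = 10;

groupSizes = cellfun(@numel, groups);
assert(numPicks <= sum(groupSizes), 'Cannot pick more than the total');

% Largest remainder (Hamilton) method
exactQuota = numPicks * groupSizes / sum(groupSizes);  % proportional, may be fractional
quota = floor(exactQuota);                             % whole part of each quota
shortfall = numPicks - sum(quota);                     % picks still to assign
[~, order] = sort(exactQuota - quota, 'descend');      % largest fractional parts first
quota(order(1:shortfall)) = quota(order(1:shortfall)) + 1;

% Draw quota(g) distinct members at random from each group
picked = cell(numel(groups), 1);
for g = 1:numel(groups)
    idx = randperm(groupSizes(g), quota(g));
    picked{g} = groups{g}(idx);
end
pickedPseudonyms = vertcat(picked{:});
```

Every group receives either the floor or the ceiling of its exact quota, and the total is always exactly numPicks. To use it on the real data, replace the toy groups with the groups cell array from the answer.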
For more information on the usage and syntax of "randperm", see its page in the MATLAB documentation.
I hope this helps!
