Compare groups of items for overlaps

Short background: I have a number of texts that are grouped by their value on each of several variables (each variable takes about 5 different values), so each text appears in exactly one value group per variable. (For example, group A might contain text1, text7, text23, text38, etc.)
Goal: I want to compare each of these primary groups for overlap in the items they contain, using one group as the basis. That is, I take group A and check which of its texts also appear in any group of another variable. (Of course, I do not compare groups that belong to the same variable, since there would obviously be no overlap.) In the end, I'd like to be able to say, for example, that texts 1, 7, 23, and 38 all appear in groups A, F, J, K, and so forth.
In other words, I do not want to compare means or any other values of the groups; I want to know which groups share which items.
Since I am not very experienced yet, I can't find the right code to start with. Any ideas about how to tackle this task?
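A logical membership matrix is one way to get both views described above: which of a base group's texts fall into each group of another variable, and which groups each text belongs to. The following is a minimal sketch, not a definitive implementation; the names `texts`, `groups`, `groupVar`, and `base` and the toy data are hypothetical.

```matlab
% Hypothetical toy data: 6 texts, two variables with 2 groups each
texts    = {'text1','text2','text3','text4','text5','text6'};
groups   = {{'text1','text2','text3'}, ...          % group 1 (variable 1)
            {'text4','text5','text6'}, ...          % group 2 (variable 1)
            {'text1','text4'}, ...                  % group 3 (variable 2)
            {'text2','text3','text5','text6'}};     % group 4 (variable 2)
groupVar = [1 1 2 2];   % variable that each group belongs to

% M(t,g) is true when text t is a member of group g
M = false(numel(texts), numel(groups));
for g = 1:numel(groups)
    M(:, g) = ismember(texts, groups{g}).';
end

base = 1;   % index of the base group ("group A")

% Texts of the base group shared with each group of a different variable
for g = find(groupVar ~= groupVar(base))
    shared = texts(M(:, base) & M(:, g));
    fprintf('Group %d shares with group %d: %s\n', base, g, strjoin(shared, ', '));
end

% For each text of the base group, list every group that contains it
for t = find(M(:, base)).'
    fprintf('%s appears in groups %s\n', texts{t}, mat2str(find(M(t, :))));
end
```

For the toy data, this reports that group 1 shares text1 with group 3 and text2 and text3 with group 4. Because each column of `M` is one group, the overlap between any two groups is just an element-wise AND of two columns.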
3 Comments
Image Analyst 2021-6-23
What do you mean by overlapping texts? What kind of data do you have? String arrays? Character arrays? Images? Tables? Cell arrays? Structure arrays? Can you attach your data (the groups) in a .mat file using the paperclip icon?
save('answers.mat', 'group1', 'group2', 'group3');
Use your actual variable names, of course.
In the meantime, see functions like setdiff(), intersect(), contains(), ismember(), strcmpi(), etc.
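As a quick illustration, here are three of the suggested functions applied to two small hypothetical groups stored as cell arrays of character vectors (the group names and contents are made up):

```matlab
% Two hypothetical groups stored as cell arrays of character vectors
groupA = {'text1','text7','text23','text38'};
groupF = {'text7','text38','text40'};

common = intersect(groupA, groupF)   % texts in both groups: {'text38','text7'}
onlyA  = setdiff(groupA, groupF)     % texts only in groupA: {'text1','text23'}
inF    = ismember(groupA, groupF)    % logical mask over groupA: [0 1 0 1]
```

Note that intersect and setdiff return sorted results by default, and the sort is character by character, so 'text38' comes before 'text7'. Passing 'stable' as an extra argument keeps the order of the first input instead.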
Ulrike Lohner 2021-6-24
Unfortunately, I am not allowed to post any original data for data security reasons (and the code I have so far only imports the data, so it wouldn't help). I can try to be more specific about my data, though:
Basically, I have a large number of groups of strings organized in a table (each column is one group, and each string is in its own cell). There are about 150 distinct strings in total, and each string appears in several groups. However, no two groups contain the same combination of strings, and the groups differ in size.
I will probably need a loop that takes each column (i.e. each group) as the starting point once and checks which strings of this group are also contained in the other groups. The output would be a new set of string clusters that contain only strings from the starting group.
Anyway, thank you for the suggestions so far; I will dig into the functions you mentioned and check whether one of them serves my purpose.
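The loop described in the comment above could look roughly like the following sketch. It assumes (hypothetically) a table `T` whose columns are cell arrays of character vectors, with shorter groups padded with empty cells, since all table columns must have the same height. The example table and the names `groups`, `overlap`, and `counts` are made up for illustration.

```matlab
% Hypothetical example table: each column is one group; shorter groups
% are assumed to be padded with empty character vectors ('')
T = table({'s1';'s2';'s3'}, {'s2';'s3';''}, {'s1';'s4';''}, ...
          'VariableNames', {'G1','G2','G3'});

nGroups = width(T);
groups  = cell(1, nGroups);
for k = 1:nGroups
    col = T{:, k};                              % k-th column as a cell array
    groups{k} = col(~cellfun(@isempty, col));   % drop the padding cells
end

% overlap{i,j}: strings of group i that also occur in group j,
% kept in the order they appear in group i (diagonal left empty)
overlap = cell(nGroups);
for i = 1:nGroups
    for j = 1:nGroups
        if i ~= j
            overlap{i, j} = intersect(groups{i}, groups{j}, 'stable');
        end
    end
end

% Number of shared strings for every pair of groups
counts = cellfun(@numel, overlap);
```

Because the table itself does not record which variable a group belongs to, this sketch compares every pair of groups; if that information is available, pairs from the same variable can be skipped in the inner loop. `counts` gives a quick overview of how many strings each pair of groups shares.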

Accepted Answer

SALAH ALRABEEI 2021-6-23
Use
[val,ndxA,ndxB] = intersect(A,B)
It returns the overlapping values val and their indices in both groups A and B.
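A small hypothetical example of what the three outputs contain (the groups A and B are made up):

```matlab
% Hypothetical groups A and B stored as cell arrays of character vectors
A = {'text1','text7','text23','text38'};
B = {'text40','text7','text1'};

[val, ndxA, ndxB] = intersect(A, B);
% val  = {'text1','text7'}   shared texts, sorted
% ndxA = [1; 2]              positions of val in A
% ndxB = [3; 2]              positions of val in B
```

ndxA and ndxB are returned as column vectors, and A(ndxA) and B(ndxB) both reproduce val, which is handy when you need to look up further information stored alongside each text.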
1 Comment
Ulrike Lohner 2021-6-24
Thank you for this suggestion! I will have a closer look at that function and check whether it serves the right purpose.

Category: Startup and Shutdown

Release: R2020a
