Grouping multi-variable data points

Question

Gabriel Stanley 2022-8-10

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1777380-grouping-multi-variable-data-points

Answered: Divyam 2024-9-18

I have three different data sources, of increasing generality. Group1 is a bunch of 2-value data points, Group2 is a good estimation of how the data in Group1 should be distributed (e.g. Group2 tells me there should be N datapoints with (x=7, y=2)), and Group3 is a collection of vague ranges into which I need to group the entries in Group2 and Group1 (e.g. Group3(1) = [5,8 ; 0,4]; Group3(2) = [7,9 ; 0,4]). I am trying to do two separate things with these data sets, and whether it's from lack of sleep or coffee, I cannot figure out which MATLAB functions I should be looking at to do the heavy lifting. I'm thinking one or more of histcounts2, discretize, and/or maybe findgroups.

The tasks I'm trying to complete are:

1) Check that all the elements in Group1 align with the expected groups in Group2, and get some metadata on any outliers (e.g. to which Group2 element is any given unmatched Group1 element closest?)

2) ?Cluster? the elements in Group2 using the elements in Group3. E.g. if Group2(1) = [N,x=7,y=2], then it falls within both Group3(1) and Group3(2) as described above.

If any of y'all could help direct me to the appropriate functions I should focus on understanding & learning how to use, I would appreciate it.

Answer 1

Divyam 2024-9-18

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1777380-grouping-multi-variable-data-points#answer_1518565

Hi @Gabriel Stanley,

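To check whether the points in Group1 align with the expected points in Group2, you can use the "pdist2" function (Statistics and Machine Learning Toolbox) to compute the pairwise distances between the two sets, take the row-wise minimum to get the closest Group2 element for each Group1 point, and then use the "find" function to identify the outliers whose minimum distance exceeds a threshold that you choose.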

% Example Group1 and Group2 data
Group1 = [7.1, 2.2; 6.9, 2.1; 7.5, 2.5]; 
Group2 = [7, 2; 8, 3]; 
% Calculate pairwise distances
distances = pdist2(Group1, Group2);
% Find the closest Group2 element for each Group1 point
% (when a point is equidistant from several Group2 elements, min returns the first one)
[minDistances, closestIndices] = min(distances, [], 2);
% Determine outliers based on a threshold
outlierThreshold = 0.5;
outliers = find(minDistances > outlierThreshold);
% Display results (the printed output is shown in the comments)
fprintf('Closest Group2 elements for each Group1 point: [%s]\n', join(string(closestIndices), ','));
% Closest Group2 elements for each Group1 point: [1,1,1]
fprintf('Outliers: [%s]\n', join(string(outliers), ','));
% Outliers: [3]
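
The question also mentions that Group2 states how many points (N) are expected at each location. A minimal sketch of that comparison follows, assuming a hypothetical vector "expectedCounts" that holds N for each Group2 row (it is not part of the original data). To keep the sketch runnable in base MATLAB (R2016b or later), the distances are computed with implicit expansion instead of "pdist2"; the result should be the same Euclidean distance matrix.

```matlab
% Same example data as in the answer
Group1 = [7.1, 2.2; 6.9, 2.1; 7.5, 2.5];
Group2 = [7, 2; 8, 3];
% ASSUMPTION: hypothetical expected counts (the "N" for each Group2 row)
expectedCounts = [2; 1];
outlierThreshold = 0.5;

% Euclidean distances via implicit expansion (swapped in for pdist2)
dx = Group1(:, 1) - Group2(:, 1).';
dy = Group1(:, 2) - Group2(:, 2).';
distances = sqrt(dx.^2 + dy.^2);

% Closest Group2 element per Group1 point; points beyond the threshold are unmatched
[minDistances, closestIndices] = min(distances, [], 2);
isMatched = minDistances <= outlierThreshold;

% Count the matched Group1 points assigned to each Group2 element
observedCounts = accumarray(closestIndices(isMatched), 1, [size(Group2, 1), 1]);

% Positive values mean fewer points were observed than expected
shortfall = expectedCounts - observedCounts;
disp(table(expectedCounts, observedCounts, shortfall));
```

With these example values, two Group1 points match the first Group2 element and none match the second, so the second row shows a shortfall of 1. Whether unmatched points should count toward their nearest element is a design choice; this sketch excludes them.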

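To determine which elements of Group2 fall within the ranges specified in Group3, you can use either the "discretize" function or logical indexing. Because the Group3 ranges can overlap, so that one Group2 element may belong to several ranges, the example below uses logical indexing; "discretize" assigns each value to at most one bin.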

% Example Group3 ranges
Group3 = {[5, 8; 0, 4], [7, 9; 0, 4]}; 
% Initialize clusters
clusters = cell(size(Group3));
% Check membership for each Group2 element using logical indexing
for i = 1:numel(Group3)
    limits = Group3{i};  % row 1: [xMin, xMax], row 2: [yMin, yMax]
    inCluster = Group2(:, 1) >= limits(1, 1) & Group2(:, 1) <= limits(1, 2) & ...
                Group2(:, 2) >= limits(2, 1) & Group2(:, 2) <= limits(2, 2);
    clusters{i} = find(inCluster);
end
% Display cluster assignments (the printed output is shown in the comments)
for i = 1:numel(clusters)
    fprintf('Group3(%d) contains Group2 elements: [%s]\n', i, join(string(clusters{i}), ','));
end
% Group3(1) contains Group2 elements: [1,2]
% Group3(2) contains Group2 elements: [1,2]
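
If there are many ranges, the same membership test can be sketched without a loop. The version below is only an illustration under the assumption that every Group3 entry is a 2-by-2 matrix laid out as [xMin, xMax; yMin, yMax]; the variable names "isMember", "inSeveral", and "unassigned" are made up for this example. It relies on implicit expansion, so it needs R2016b or later.

```matlab
Group2 = [7, 2; 8, 3];
Group3 = {[5, 8; 0, 4], [7, 9; 0, 4]};

% Stack the ranges into a 2-by-2-by-K array: (dimension, [min max], range index)
limits = cat(3, Group3{:});
xMin = reshape(limits(1, 1, :), 1, []);
xMax = reshape(limits(1, 2, :), 1, []);
yMin = reshape(limits(2, 1, :), 1, []);
yMax = reshape(limits(2, 2, :), 1, []);

% isMember(j, k) is true when Group2(j, :) lies inside Group3{k}
isMember = Group2(:, 1) >= xMin & Group2(:, 1) <= xMax & ...
           Group2(:, 2) >= yMin & Group2(:, 2) <= yMax;

numRanges  = sum(isMember, 2);      % how many ranges contain each element
inSeveral  = find(numRanges > 1);   % elements that fall in overlapping ranges
unassigned = find(numRanges == 0);  % elements outside every range
```

The logical matrix keeps the overlap information in one place: each row shows all the ranges a Group2 element belongs to, and each column lists the elements of one range.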

For more information, refer to the MATLAB documentation for the "pdist2" and "find" functions.
