Subsets of uncorrelated features
Given an N-by-N correlation matrix of N features, how can I find ALL subsets of pairwise uncorrelated features, assuming two features are uncorrelated if their correlation score is less than a threshold Alpha? There is no restriction on the number of features in a subset. All features in a subset need to be pairwise uncorrelated.
Accepted Answer
Jeff Miller
2021-7-12
Edited: Jeff Miller
2021-7-12
N = 5;
R = rand(N); % Random stand-in for a correlation matrix; the lower triangular part is ignored
rCutoff = 0.4;
% Make a cell array that holds all possible combinations of 2, 3, 4, ... features
combos = cell(0,0);
for i=2:N
    iCombos = nchoosek(1:N,i);
    for j=1:size(iCombos,1)
        combos{end+1} = iCombos(j,:);
    end
end
ncells = numel(combos);
% Check each cell to make sure that all of the pairwise correlations are
% less than the cutoff
qualifies = true(1,ncells);
for icell=1:ncells
    features = combos{icell};
    nfeatures = numel(features);
    for ifeature=1:nfeatures-1
        for jfeature=ifeature+1:nfeatures
            % Features are sorted, so this always reads the upper triangle of R
            iifeature = features(ifeature);
            jjfeature = features(jfeature);
            if abs(R(iifeature,jjfeature)) > rCutoff
                qualifies(icell) = false;
            end
        end
    end
end
% The subsets that passed every pairwise check
goodCombos = combos(qualifies);
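The pairwise check in the inner loops can also be written against the submatrix of R that belongs to each candidate subset. The following is a minimal sketch, not part of the original answer: it assumes a small symmetric matrix with made-up values, applies the strict "less than" rule from the question, and uses the hypothetical names isUncorrelated and goodCombos.

```matlab
% Illustrative symmetric correlation matrix (made-up values)
R = [1.0 0.1 0.3 0.2;
     0.1 1.0 0.3 0.6;
     0.3 0.3 1.0 0.1;
     0.2 0.6 0.1 1.0];
rCutoff = 0.4;
N = size(R, 1);

% True when every off-diagonal entry of the subset's submatrix is below
% the cutoff; the diagonal (self-correlation) is masked out.
isUncorrelated = @(s) all(all(abs(R(s, s)) < rCutoff | logical(eye(numel(s)))));

goodCombos = {};
for k = 2:N
    kCombos = nchoosek(1:N, k);           % every subset of size k, one per row
    for j = 1:size(kCombos, 1)
        if isUncorrelated(kCombos(j, :))
            goodCombos{end+1} = kCombos(j, :); %#ok<AGROW>
        end
    end
end

% Print each qualifying subset on its own line
for c = 1:numel(goodCombos)
    disp(mat2str(goodCombos{c}))
end
```

Like the loops above, this still enumerates every combination, so the cost grows exponentially with N; it only shortens the per-subset test.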
5 Comments
Jeff Miller
2021-7-13
You may well be right, but that "if sum" line is cognitively impenetrable to me. :)
Thanks for accepting my answer.
More Answers (2)
Ive J
2021-7-11
Edited: Ive J
2021-7-12
Let R be the pairwise correlation matrix:
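N = 10;
R = rand(N); % random stand-in for a real correlation matrix
R(logical(eye(N))) = 1; % each feature is perfectly correlated with itself
% Mirror the upper triangle so that R is symmetric
for i = 1:size(R, 1) - 1
    for j = i+1:size(R, 1)
        R(j, i) = R(i, j);
    end
end
disp(R)
cutoff = 0.4; % pairs below this value are treated as independent features
idx = R < cutoff; % use abs(R) here if strong negative correlations should also count
idx = triu(idx, 1); % R(i, j) == R(j, i) in a pairwise correlation matrix, so keep each pair once
features = "feature" + (1:N); % feature names
% Build the list of independent pairs; there may be a simpler way to do this
indepFeatures = [];
for i = 1:N
    indepFeatures = [indepFeatures, arrayfun(@(x)[x, features(i)], features(idx(i, :)), 'uni', false)];
end
indepFeatures = vertcat(indepFeatures{:});
% Find all maximal cliques of the graph whose edges are the independent pairs.
% maximalCliques is not a built-in function; it has to be on the path (for example from the File Exchange).
nodes = zeros(size(indepFeatures, 1), 2); % one row per pair, two node indices per row
[~, nodes(:, 1)] = ismember(indepFeatures(:, 1), features);
[~, nodes(:, 2)] = ismember(indepFeatures(:, 2), features);
G = graph(nodes(:, 1), nodes(:, 2));
M = maximalCliques(adjacency(G));
indepSets = cell(size(M, 2), 1);
for i = 1:numel(indepSets)
    indepSets{i} = features(M(:, i) ~= 0);
end
indepSets(cellfun(@numel, indepSets) < 2) = []; % this can be further unified with indepFeatures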
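The clique search above returns maximal sets only; any subset of a maximal set with at least two members also meets the pairwise condition. If every qualifying subset is wanted explicitly, and without the external maximalCliques dependency, one option is to grow the sets level by level. This is a sketch under assumptions, not the method used in the answer above: the matrix values are made up, and the names A, current and allSets are hypothetical.

```matlab
% Illustrative symmetric correlation matrix (made-up values)
R = [1.0 0.1 0.3 0.2;
     0.1 1.0 0.3 0.6;
     0.3 0.3 1.0 0.1;
     0.2 0.6 0.1 1.0];
cutoff = 0.4;
N = size(R, 1);

% A(i, j) is true when features i and j count as uncorrelated
A = abs(R) < cutoff;
A(logical(eye(N))) = false;

current = num2cell((1:N)');   % level 1: every single feature
allSets = {};                 % collects every qualifying set of size >= 2
while ~isempty(current)
    next = {};
    for c = 1:numel(current)
        s = current{c};
        % Only try features with a larger index than the last member,
        % so each set is generated exactly once (in increasing order).
        for f = s(end)+1:N
            if all(A(s, f))   % f must be uncorrelated with every member of s
                next{end+1, 1} = [s, f]; %#ok<AGROW>
            end
        end
    end
    allSets = [allSets; next]; %#ok<AGROW>
    current = next;
end

for c = 1:numel(allSets)
    disp(mat2str(allSets{c}))
end
```

Because a set is extended only when it already qualifies, this never visits a combination that contains a correlated pair, which can save work when few pairs fall below the cutoff. The number of qualifying subsets can still be exponential in N.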
Image Analyst
2021-7-11
Would stepwise regression be of any help?
Otherwise, just make an N-by-N table of correlation coefficients by correlating every feature with every other feature.
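As a rough illustration of that table, corrcoef returns the N-by-N matrix directly when each column of the data holds one feature. The data below is synthetic and purely an assumption for the sketch (feature 4 is constructed to track feature 1); alpha and uncorrelatedPairs are hypothetical names.

```matlab
rng(0);                                    % fixed seed so the run is repeatable
X = randn(200, 4);                         % 200 observations of 4 synthetic features
X(:, 4) = X(:, 1) + 0.1 * randn(200, 1);   % make feature 4 closely follow feature 1

R = corrcoef(X);                           % columns are treated as variables
disp(R)

alpha = 0.4;
% Row/column indices of the pairs (upper triangle only) below the threshold
[iIdx, jIdx] = find(triu(abs(R) < alpha, 1));
uncorrelatedPairs = [iIdx, jIdx];
disp(uncorrelatedPairs)
```

The resulting R can then be passed to any of the subset searches discussed in this thread.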