Comparing each vector in a cell array to another vector
8 次查看(过去 30 天)
显示 更早的评论
Hello. I am wondering if there is an easier way to check vectors in my cell array are subsets of another given vector. I want to keep track of how many times the vectors in C are found in T, for many different T's. For example:
T = [1 2 3 4 5 6]
C = { [1 2], [3 4], [5,6], [7 8], [9, 10] }
% loop through each candidate
for j = 1:numCands
c = Ck{j};
% Candidate Counts. Check if candidate is a subset of the
% current transaction
if ismember(c,t)
candCounts(j) = candCounts(j) + 1;
end
% Candidate Tail Counts. Check if the tail of the candidate is
% a subset of the current transaction
if ismember(c(2:end),t)
candTailCounts(j) = candTailCounts(j) + 1;
end
end
What I have done is loop through each member of C and check using ismember() for each element. But this gets really long when C becomes very large. I was wondering if there was any clever way of vectorization which would result in a faster runtime. I would also like to add that T can be varying sized vectors. Not sure if that information is relevant or not.
Thanks in advance!
Edit: For future references here is what I ended up doing, which is similar to what the answers suggested:
% Make all candidates into matrix for faster calculation
allCands = vertcat(Ck{:});
% Get number of columns to get where there are subsets.
[~,col] = size(allCands);
% Find locations where the candidates are subsets of the
% transactions for both the candidates and tail counts.
memb1 = ismember(allCands,t);
memb2 = ismember(allCands(:,2:end),t);
% Subset check
subCheck1 = find(sum(memb1,2) == col);
subCheck2 = find(sum(memb2,2) == col - 1);
% Increment counts
candCounts(subCheck1) = candCounts(subCheck1) + 1;
candTailCounts(subCheck2) = candTailCounts(subCheck2) + 1;
1 个评论
Walter Roberson
2017-2-5
编辑:Walter Roberson
2017-2-5
Does it happen to be the case that C and T only have integer elements in the range 0 to 65535 ?
Are all of the entries in C the same length? Are they always row vectors?
采纳的回答
Stephen23
2017-2-5
编辑:Stephen23
2017-2-5
Method one: one ismember call: the function ismember is going to to be slowest operation, so one simple speedup is to reduce how many times you call it: Note that a vector c will only be a member if the tail c(2:end) is a member, so you can test for the tail first, and only if the tail is a member do you need to check if the first element is also a member:
T = [1,2,3,4,5,6];
C = {[1,2],[3,4],[5,6],[7,8],[9,10]};
cntC = 0;
cntT = 0;
for k = 1:numel(C)
if ismember(C{k}(2:end),T)
cntT = cntT+1;
cntC = cntC+any(C{k}(1)==T);
end
end
Method two: Convert to a matrix: if all members of C are the same size vectors, then convert C to a matrix and perform one simple comparison using bsxfun:
M = vertcat(C{:});
U = reshape(T,1,1,[]);
X = any(bsxfun(@eq,M,U),3);
xT = all(X(:,2:end),2);
xC = xT&X(:,1);
cntT = sum(xT)
cntC = sum(xC)
5 个评论
Stephen23
2017-2-5
"The problem with that is if the elements occur out of order, you would still get a match."
That is what a subset is, and the original question specifically asks for "check vectors in my cell array are subsets of another given vector".
For a subset the order is irrelevant.
Stephen23
2017-2-5
编辑:Stephen23
2017-2-5
@Christopher Tran: I am happy to take a look. Please clarify one point: do you want to match the values in the vectors in the same order, or is the order irrelevant (as per set theory and the definition of subset). For example, is [2,1] a subset of [1,2,3] or not?
更多回答(1 个)
Walter Roberson
2017-2-5
If all members of C are the same length then you can use buffer() with overlap to create an array containing all the subsequences of that length, and transpose that and then convert C to a row matrix and then ismember() with rows option
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Operators and Elementary Operations 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!