How do I create a term - frequency matrix that runs fast
3 次查看(过去 30 天)
显示 更早的评论
Hello everyone! I am trying to create a term frequency matrix for a TF-IDF program. I have the code written but it runs extremely slow. My code works by finding the unique words in all of the documents, say for example
A = {'dog','cat','mouse'}
and for example two documents,
D = {'dog','cat','cat'; 'cat','mouse','mouse'}.
The code I have is:
for k = 1:n
for j = 1:m
seq_sum(k,j) = sum(ismember(D{k,:},A{j}));
end
end
The output of the above example would be a matrix that looks like
seq_sum = [1 2 0 ; 0 1 2];
where k is the row size of the cell array D and j is the column size of A. I also have this written in parallel but I don't want to have to rely on parallel computing. Any help would be greatly appreciated! Oh and I guess my question is how can I improve this to run faster?
2 个评论
Muthu Annamalai
2013-9-5
In MATLAB 13b, the new datatype 'categorical' is designed to solve this problem.
采纳的回答
Azzi Abdelmalek
2013-9-5
编辑:Azzi Abdelmalek
2013-9-5
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
out=zeros(size(D));
for k=1:numel(A)
idx=ismember(D,A(k));
out(:,k)=sum(idx,2);
end
disp(out)
%or
out=cell2mat(arrayfun(@(x) sum(ismember(D,A(x)),2),1:numel(A),'un',0))
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Logical 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!