Reduce the Size of Matrix
5 次查看(过去 30 天)
显示 更早的评论
I need to Reduce my Matrices(Xbool_last and Xfreq_last) , because in the 1000th step of loop(it means docx=1000) , Matlab said : out of Memory!(loop is from 1 to 5549 !! )
please look at the part of code with information in it:
%%The loop for exploring ALL the documents to create the tf-idf weight matrix
for docx = 1 : length(DBlast)
docx
for word = 1 : length(DBlast{docx})
% In docx , we search all words in docx
word_xi = DBlast{docx}{word,1} ;
for docy = 1 : length(DBlast)
% While the source words are from docx search for them in
% the rest of documents
% if word_1i found in document i(=doc) vote 1
if sum(strcmpi(DBlast{docy},word_xi)) ~= 0
ind = find(strcmpi(DBlast{docy},word_xi) ~= 0) ;
Xbool(word,docy) = 1 ;
Xfreq(word,docy) = Freqlast{docy}(ind) ;
else
% else vote 0
Xbool(word,docy) = 0 ;
Xfreq(word,docy) = 0 ;
end
end
end
Xbool_last = [Xbool_last;uint8(Xbool)];
Xfreq_last = [Xfreq_last;uint8(Xfreq)];
Xbool = [] ;
Xfreq = [] ;
end
===============================================================================
So, questions: 1- how can i Reduce the size of Xbool_last and Xfreq_last? if i need to export Matrices TO .txt file (or something else) for Using it , How can I save them? or load them?
can you say the recommended code?
2. How can I use, the output of above code in tf-idf algorithm?(if you konw),
the tf-idf code is attached
0 个评论
采纳的回答
Guillaume
2014-11-22
编辑:Guillaume
2014-11-22
You're already using uint8 to store your values. There isn't a smaller type unless you start packing booleans into bits which I assume is not possible for Xfreq_last anwyay. Using bits to store boolean is bound to be slow in matlab and awkward in matlab. There's no built-in function for that.
However, your storage looks incredibly inefficient to me. Say you're processing the first word of the first document. You find it in documents 2, 10, 150, 2048, 4125 for example. For a start, instead of storing those values (which woudln't take much memory, ~20 bytes as uint32), you store a boolean array of size 1x5549 (~5549 bytes) with only a few ones. But more importantly, in document 2, you're going to be looking for the exact same word, which you'll find in the exact same documents and store that again. Why?
Why not do the storage per word, instead of document, and for each word, just store which document it's found in?
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 NaNs 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!