problem in this code
Hi,
I have been running this code for more than 4 hours and it has still not finished. Where is the problem?
I read 1000 files, but the running time is unreasonable:
%%%%%%%%%%%%%%%%%%5
arr1=sparse(1000,232944);
targetdir = 'd:\social net\dataset\netflix\netflix_2\training_set';
%%nofusers=480189
targetfiles = '*.txt';
fileinfo = dir(fullfile(targetdir, targetfiles));
for i = 1:1000
thisfilename = fullfile(targetdir, fileinfo(i).name);
f = fopen(thisfilename,'r');
c = textscan(f, '%f %f %s', 'Delimiter', ',', 'headerLines', 1);
fclose(f);
c1=sparse(length(c));c2=sparse(length(c1));c3=sparse(length(c));
c1 = c{1};
c3=c{3};
L(i)=length(c1);
format long
dat=round(datenum(c3,'yyyy-mm-dd'));
arr=[c1 dat];
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
end
Accepted Answer
Daniel Shub
2011-11-23
On every iteration you create 3 sparse matrices:
c1=sparse(length(c));c2=sparse(length(c1));c3=sparse(length(c));
You then overwrite 2 of them and never use the third:
c1 = c{1};
c3=c{3};
The variable L is growing in the loop. This probably doesn't matter since it is not that long ...
L(i)=length(c1);
I believe the datenum function is slow when it has to parse date strings (search for Jan Simon and datenum for answers with faster alternatives):
dat=round(datenum(c3,'yyyy-mm-dd'));
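One possible faster route, offered only as a sketch: if every date string has the fixed form yyyy-mm-dd, the digits can be converted arithmetically and passed to the numeric form datenum(Y,M,D), which avoids string parsing. The sample dates and the variable names s, y, m, d are made up for illustration, and no timing claim is made here; measure with tic/toc on the real data.

```matlab
% Made-up sample input; in the question this would be c3 = c{3}.
dates = {'2005-09-06'; '2005-05-13'; '2004-01-02'};

% Assumption: every string is exactly 10 characters, 'yyyy-mm-dd'.
s = char(dates);                          % N-by-10 character matrix
y = (s(:, 1:4)  - '0') * [1000; 100; 10; 1];
m = (s(:, 6:7)  - '0') * [10; 1];
d = (s(:, 9:10) - '0') * [10; 1];

% Numeric datenum call: no format string to interpret.
dat = datenum(y, m, d);
```

Because the inputs are whole days, the result is already an integer and the round call is no longer needed.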
This bit of code looks crazy to me:
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
First, I have no idea how it doesn't crash since I think arr should have length L(i)+1, which only equals L(i)*2 if L(i) is equal to 1. You initialized arr1 to be a sparse matrix with a huge number of columns (but it seems like you only use 1). Also, it is unclear why you want arr1 to be sparse. A cell array might be better.
3 Comments
Daniel Shub
2011-11-23
Thank you, I missed the reshape part. The reason I ask about the sparse matrix is that the non-zero elements are not distributed throughout the matrix. For each row i, only the first N_i elements can be non-zero. With a sparse matrix, the memory required changes on each iteration (and MATLAB needs to allocate and copy the entire sparse matrix). If you use a cell array, MATLAB only has to allocate space for the new 2L elements and doesn't have to copy anything. In the end you will have the same number of non-zero elements and will use essentially the same amount of memory. The cell array will probably use less memory than the sparse matrix.
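To make the reshape point concrete, here is a small sketch with made-up numbers. arr is L-by-2, so arr.' is 2-by-L and the reshape yields a 1-by-2L row with id and date interleaved. The names row, rows and pairs are hypothetical and not from the original code.

```matlab
% Made-up data: three (user id, date number) pairs for one file.
arr = [101 732561; 102 732445; 103 732000];

% Transpose first so that reshape walks along each pair.
row = reshape(arr.', 1, []);     % 1-by-6: id, date, id, date, ...

% A cell array can hold rows of different lengths without padding.
rows = cell(2, 1);
rows{1} = row;
rows{2} = [201 731948];          % a shorter row from another file

% The two columns can be recovered from a stored row.
pairs = reshape(rows{1}, 2, []).';
```

Assigning into rows{i} only touches that one cell, whereas assigning into a row of a sparse matrix changes the storage of the whole matrix.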
More Answers (1)
Daniel Shub
2011-11-23
I would try replacing
arr1=sparse(1000,232944);
with
arr1 = cell(1000, 1);
and
arr1(i,1:L(i)*2)=reshape(arr.',1,[]);
with
arr1{i} = reshape(arr.',1,[]);
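Putting the two replacements together with the earlier remarks (no throwaway sparse allocations, L preallocated), the loop might look roughly like the sketch below. So that it runs on its own, it first writes two tiny files into a temporary folder; those file names and contents are invented stand-ins for the real training set, and targetdir would point at the real folder instead.

```matlab
% Hypothetical stand-in data: two tiny files in a temporary folder.
targetdir = tempname;
mkdir(targetdir);
sample = {sprintf('1:\n101,3,2005-09-06\n102,5,2005-05-13\n'), ...
          sprintf('2:\n201,4,2004-01-02\n')};
for k = 1:numel(sample)
    fid = fopen(fullfile(targetdir, sprintf('mv_%07d.txt', k)), 'w');
    fprintf(fid, '%s', sample{k});
    fclose(fid);
end

fileinfo = dir(fullfile(targetdir, '*.txt'));
nfiles = numel(fileinfo);        % 1000 in the original question
arr1 = cell(nfiles, 1);          % one cell per file, no sparse matrix
L = zeros(nfiles, 1);            % preallocated instead of growing

for i = 1:nfiles
    f = fopen(fullfile(targetdir, fileinfo(i).name), 'r');
    c = textscan(f, '%f %f %s', 'Delimiter', ',', 'HeaderLines', 1);
    fclose(f);
    c1 = c{1};                   % user ids
    L(i) = length(c1);
    dat = round(datenum(c{3}, 'yyyy-mm-dd'));
    arr = [c1 dat];
    arr1{i} = reshape(arr.', 1, []);   % 1-by-2*L(i) row for this file
end

rmdir(targetdir, 's');           % remove the temporary stand-in files
```

If datenum turns out to dominate the run time, that line is the one to swap for a faster date conversion.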