Convert files into a matrix
Hi,
I have 177000 files, and I have to create a matrix containing all the values in these files.
Each file is split using textscan to get
c{1}, c{2}, ...
which I then convert into a matrix.
Then I combine these matrices into one matrix.
The problem is that these files contain some identical values, so I have to find the identical values and collect all the other values attached to them (the rest of the row) together with these values.
I tried running with 100 files to measure the running time, and found that it is very long even for just 100 files.
I think that if I could find a function that compares c{1} across all files, c{2} across all files, etc., it would save time. I'm facing a problem with this code:
targetdir = 'd:\social net\dataset\netflix\training_set';
targetfiles = '*.txt';
fileinfo = dir(fullfile(targetdir, targetfiles));
k=0;arr(:,:)=0; inc=0;k=0;y=1;
for i = 1: length(fileinfo)
    thisfilename = fullfile(targetdir, fileinfo(i).name);
    f=fopen(thisfilename,'r'); f1=fscanf(f,'%c'); f1(1:2)=[];
    f2=fopen(thisfilename,'w'); fprintf(f2,'%c',f1);
    f3=fopen(thisfilename,'r');
    c = textscan(f,'%f %f %s','Delimiter',',','headerLines',1);
    c1=c{1};c2=c{2}; c3=c{3};z=1;z1=1;z2=1;z3=0;
    for k=1+k:length(c1)+inc
        no=c1(z); arr1=arr(:,1); p=find(arr1==no);
        if isempty(p)
            j=1;
            arr(y,j)=c1(z); arr(y,j+1)=i; arr(y,j+2)=c2(z);j=j+3;y=y+1;
        else
            ind(i,z1)=p;
            L=arr(p,:);len=0;
            for h=1:length(L)
                if L(h)~=0
                    len=len+1;
                end
            end
            len;
            arr(p,len+1)=i;
            arr(p,len+2)=c2(z);
            z1=z1+1;
        end
        z=z+1;
    end
    inc=inc+length(c1);
    [u,u1] =size(arr);
end
f4=fopen('netfile.txt','w');
for i=1:u
    for j=1:u1
        fprintf(f4,'%d ',arr(i,j));
    end
    fprintf(f4,'\n');
end
fclose all;
Thanks.
More answers (1)
Jan
2011-11-9
Some general advice for improving the speed:
- One command per line only, otherwise the JIT acceleration loses its power.
- Avoid no-op commands such as "len;", which only waste time.
- Deleting the first two bytes from the file takes a lot of time. It is better to open the file, read two bytes, and call TEXTSCAN afterwards.
- Close every file properly with fclose(fid) as soon as possible. Do not leave all files open until the final fclose('all'). Open files consume resources.
- Use the vectorization of fprintf. Instead of for j=1:u1, fprintf(f4,'%d ',arr(i,j)); end prefer fprintf(f4, '%d ', arr(i, :)).
- Counting the number of non-zero elements in L does not need a loop. Faster: len = sum(L ~= 0);.
- arr(:, :) = 0 is not useful, because it is equivalent to arr = 0. k is defined twice.
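As a small illustration of the two vectorization hints in the list, here is a minimal sketch. The values of L and demo are made up for the example and are not taken from the Netflix files; the output goes to the command window instead of a file.

```matlab
% Hypothetical row of arr, padded with zeros on the right
L = [12 3 5 0 0];

% Count the non-zero entries without a loop
len = sum(L ~= 0);

% Hypothetical result matrix, printed one row per line.
% fprintf reuses the format '%d ' for every element of the row.
demo = [1 2 3; 4 5 6];
for i = 1:size(demo, 1)
    fprintf('%d ', demo(i, :));
    fprintf('\n');
end
```

Note that sum(L ~= 0) counts the same thing as the original loop, so a row that legitimately contains a zero value would be miscounted in both versions.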
I cannot insert a pre-allocation, because I do not know the maximal possible size of "arr". But this should be faster already:
function wwq
targetdir = 'd:\social net\dataset\netflix\training_set';
targetfiles = '*.txt';
fileinfo = dir(fullfile(targetdir, targetfiles));
arr = 0; % Better pre-allocate
inc = 0;
kk = 0;
y = 1;
for i = 1:length(fileinfo)
    thisfilename = fullfile(targetdir, fileinfo(i).name);
    f = fopen(thisfilename,'r');
    fread(f, 2, 'uint8'); % Skip two bytes
    c = textscan(f, '%f %f %s', 'Delimiter', ',', 'headerLines', 1);
    fclose(f);
    c1 = c{1};
    c2 = c{2};
    % c3=c{3}; % Not used
    z = 1;
    % z1 = 1; % Not used
    % z2 = 1; % Not used
    % z3 = 0; % Not used
    kknew = length(c1) + inc;
    for k = (1 + kk):kknew % Avoid k as counter *and* in loop index
        no = c1(z);
        p = find(arr(:, 1) == no);
        if isempty(p)
            arr(y, 1) = c1(z);
            arr(y, 2) = i;
            arr(y, 3) = c2(z);
            % j = j+3; % Not used
            y = y + 1;
        else
            % ind(i,z1) = p; % Not used
            L = arr(p, :);
            len = sum(L ~= 0);
            arr(p, len + 1) = i;
            arr(p, len + 2) = c2(z);
            % z1 = z1 + 1; % Not used
        end
        z = z + 1;
    end
    kk = kknew;
    inc = inc + length(c1);
    u = size(arr, 1);
end
f = fopen('netfile.txt','w');
for i = 1:u
    fprintf(f, '%d ', arr(i, :));
    fprintf(f,'\n');
end
fclose(f);
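The repeated find(arr(:, 1) == no) inside the loop is probably still the most expensive part. One possible alternative, shown here only as a sketch and not as part of the code above, is to collect the values from all files first and then group them by ID in a single step with unique and accumarray. The vectors ids, fileIdx and vals are hypothetical stand-ins for the concatenated c{1} values, the index of the file each row came from, and the c{2} values.

```matlab
% Hypothetical data, as if concatenated from three files
ids     = [7; 3; 7; 5; 3; 7];   % c{1} values of all files
fileIdx = [1; 1; 2; 2; 3; 3];   % file each row was read from
vals    = [4; 2; 5; 1; 3; 2];   % c{2} values of all files

% group(k) is the position of ids(k) inside uniqueIds
[uniqueIds, ~, group] = unique(ids);

% For each ID, build one row vector: file1 val1 file2 val2 ...
% sort(rows) restores the original row order, because accumarray
% does not guarantee the order in which it passes the elements.
pairs = accumarray(group, (1:numel(ids)).', [], ...
    @(rows) {reshape([fileIdx(sort(rows)), vals(sort(rows))].', 1, [])});

% Print one line per ID: the ID followed by its (file, value) pairs
for n = 1:numel(uniqueIds)
    fprintf('%d ', uniqueIds(n), pairs{n});
    fprintf('\n');
end
```

Unlike the loop version, unique returns the IDs in sorted order rather than in order of first appearance, and the rows are kept in a cell array instead of a zero-padded matrix. Whether all values fit into memory at once for 177000 files is an assumption that would have to be checked.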