Import files based on file name

I have a folder containing around 300,000 files, and I don't need to import all of them.
Problem: How can I import files based on a specific file name?
Problem example:
In the picture below I have a section of the files, which are all in the same folder. I only want to import the .data files. However, I don't need to import all of the .data files, only the last one of every series.
So far I have the following code:
files = dir(fullfile(pwd,'*.data*'));
expData = cell(length(files),1);
for i = 1:length(files)
fid = fopen(fullfile(files(i).folder,files(i).name),'r');
%% Reading the data
% Read all the data from the file
dataRead = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f','HeaderLines',1);
end
This code imports all of the data files. How can I use (I think) a for loop to import only the last .data file of every series?

Accepted Answer

fnm = {...
'LedgeTest_muSP_0.10_muRP_0.10.1.data.0',...
'LedgeTest_muSP_0.10_muRP_0.10.1.data.1',...
'LedgeTest_muSP_0.10_muRP_0.10.1.data.2',...
'LedgeTest_muSP_0.10_muRP_0.10.1.data.21',...
'LedgeTest_muSP_0.10_muRP_0.20.1.data.0',...
'LedgeTest_muSP_0.10_muRP_0.20.1.data.1',...
'LedgeTest_muSP_0.10_muRP_0.20.1.data.2',...
'LedgeTest_muSP_0.10_muRP_0.20.1.data.11'}
spl = regexp(fnm,'\.data\.','split','once');
spl = vertcat(spl{:});
vec = str2double(spl(:,2));
[~,idx] = sort(vec);
[~,idy,idz] = unique(spl(idx,1),'last');
out = fnm(idx(idy))
Giving:
out =
'LedgeTest_muSP_0.10_muRP_0.10.1.data.21'
'LedgeTest_muSP_0.10_muRP_0.20.1.data.11'
Use it like this:
D = pwd;
tmp = dir(fullfile(D,'*.data.*'));
fnm = {tmp.name};
...
for k = 1:numel(out)
fid = fopen(fullfile(D,out{k}),'r');
...
end

9 Comments

I implemented your code into my code:
files = dir(fullfile(pwd,'*.data*'));
spl = regexp({files.name},'\.data\.','split','once');
spl = vertcat(spl{:});
vec = str2double(spl(:,2));
[~,idx] = sort(vec);
[~,idy,idz] = unique(spl(idx,1),'last');
out = struct2cell(files(idx(idy)))
for k = 1:numel(out)
fid = fopen(fullfile(pwd,out{k}),'r');
...
end
But I got the following error:
Error using fullfile (line 103)
All inputs must be strings, character vectors, or cell arrays of character vectors.
Error in untitled2 (line 12)
fid = fopen(fullfile(pwd,out{k}),'r');
Although innovative, this line:
out = struct2cell(files(idx(idy)))
and the corresponding indexing:
out{k}
need some more thought.
Note that struct2cell creates a cell array where the first dimension encodes the fields and the other dimensions correspond to the dimensions of the input structure arrays shifted/permuted by one.
So with your adaptation, the code will iterate (using linear indexing) down the first column, which contains the data from the first element of the structure array returned by dir. Not all of this data is character data, e.g. the fields bytes, isdir, and datenum. So as soon as your code refers to one of the cells holding those data, it will throw an error. Even without throwing an error the code would still be incorrect, because only one of the cells in each column is actually the filename.
One fix would be to change the code to use subscript indexing instead of linear indexing. Or use a simple comma-separated list to create a cell array with only the filenames:
out = {files(idx(idy)).name};
Or if you want a sub-structure of that returned by dir then just index into it:
sub = tmp(idx(idy));
The approach I showed you in my answer works without error because it does not include all of the other fields in the cell array, only the filenames, thus trivial linear indexing is all that is required.
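To make the shape of the struct2cell output concrete, here is a minimal sketch. The structure s is a hypothetical two-element stand-in for what dir returns, reduced to two fields; the variable names are only for illustration.

```matlab
% Hypothetical 2-by-1 structure array standing in for the output of dir
s = struct('name', {'a.data.0'; 'b.data.1'}, 'bytes', {10; 20});

c  = struct2cell(s);   % rows correspond to fields, columns to elements of s
sz = size(c);          % 2 fields by 2 elements

% Linear indexing walks down the first column, i.e. through the fields of s(1)
first  = c{1};         % the name of s(1)
second = c{2};         % the bytes of s(1), which is numeric, not a filename

% Two ways to get only the filenames
byRow  = c(1,:);       % subscript indexing into the row of the name field
byList = {s.name};     % comma-separated list collected into a cell array
```

Both byRow and byList hold only the filenames, so a loop over either of them can safely pass each element to fullfile.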
Thank you so much for all the help!
Is it also possible to extract a range of files, for example the last 5 .data files of every series?
I also encountered an error when adjusting the previous code:
Error using textscan
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in LedgeTest2D_results (line 30)
dataRead = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f','HeaderLines',1);
% Select only .data file from the last time step of each simulation
files = dir(fullfile(uigetdir,'*.data*'));
spl = regexp({files.name},'\.data\.','split','once');
spl = vertcat(spl{:});
vec = str2double(spl(:,2));
[~,idx] = sort(vec);
[~,idy,idz] = unique(spl(idx,1),'last');
out = {files(idx(idy)).name};
expData = cell(length(out),1);
for i = 1:length(out)
fid = fopen(fullfile(files.folder,out{i}),'r');
%% Reading the data
% Read all the data from the file
dataRead = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f','HeaderLines',1)
end
The matlab file is in a different folder from the .data files.
"The matlab file is in a different folder from the .data files."
Then you need to tell fopen where to look. Your innovative approach of using
fid = fopen(fullfile(files.folder,out{i}),'r');
will not work for two main reasons:
  • files.folder creates a comma-separated list containing the folder data from all elements of files, which you then supply as inputs to fullfile. Basically your code does this: fullfile(files(1).folder, files(2).folder, .... , files(end).folder, out{i}), which is very unlikely to be the path of an actual folder.
  • There is no attempt to use any indexing to provide only the relevant files data. Not all filenames from files occur in out (that was the whole point of your question), but you make no attempt to get only the elements of files that correspond to the filenames in out.
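The comma-separated list behaviour described in the first point can be reproduced with a small sketch. The structure below is a hypothetical stand-in for the output of dir, with a made-up folder name 'data' and only the two fields that matter here.

```matlab
% Hypothetical listing: two files that live in the same folder 'data'
files = struct('folder', {'data'; 'data'}, 'name', {'x.data.0'; 'x.data.1'});

% files.folder expands to one input argument per element of files,
% so the folder name is repeated once for every file in the listing
wrong = fullfile(files.folder, 'x.data.1');

% Indexing a single element keeps its folder and name paired
right = fullfile(files(2).folder, files(2).name);
```

With 300,000 files the first call would chain 300,000 folder names together, which is why fopen returns an invalid file identifier.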
Probably the easiest approach would be to just use the code which I gave at the end of my original answer, but replace pwd with uigetdir:
D = uigetdir(...);
"That is not ideal."
Personally I would avoid uigetdir, I only referred to it because that was what you showed in your previous comment. Putting UIs into code makes them difficult to generalize, to call in loops or from other functions, or to automatically test.
I would just supply that folder name once as a parameter/function input.
Thank you for your explanation!
Is it also possible to use the same approach when the files are in different folders? I tried that with the following piece of code:
%% Loading the data
rhoPart = 2540;
% Select the main folder
Folder = uigetdir;
% Find all .data files in the sub folders
files = dir(fullfile(Folder,'\**\*.data*'));
% Select only .data file from the last time step of each simulation run
spl = regexp({files.name},'\.data\.','split','once');
spl = vertcat(spl{:});
vec = str2double(spl(:,2));
[~,idx] = sort(vec);
[~,idy,idz] = unique(spl(idx,1),'last');
out = {files(idx(idy)).name};
k = 1;
for i = 1:length(out)
fid = fopen(fullfile(files(i).folder,out{i}),'r');
%% Reading the data
% Read all the data from the file
dataRead = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f','HeaderLines',1);
% frewind(fid);
% Write headerline N, time, xmin, ymin, zmin, xmax, ymax, zmax
% runData{k} = strsplit(fgetl(fid), ' ');
% Write only the x, y, and z components of the particles, particle radius,
% z component+ particle radius and volume of the particle
expData{k} = [dataRead{1}(:,1) dataRead{2}(:,1) dataRead{3}(:,1) dataRead{7}(:,1) dataRead{3}(:,1)+dataRead{7}(:,1) rhoPart*(4/3)*pi*(dataRead{7}(:,1).^3)];
% Write only the vx,vy,vz of the particles and magnitude
velData{k} = [dataRead{4}(:,1) dataRead{5}(:,1) dataRead{6}(:,1) sqrt(dataRead{4}(:,1).^2 + dataRead{5}(:,1).^2 + dataRead{6}(:,1).^2)];
fclose(fid);
k = k + 1;
end
But this obviously doesn't work, because files(i).folder doesn't match out{i}.
The main folder where all the .data files are stored is called 2Dtest4.
Then, I have 4 subfolders called:
2Dtest_all-0.45
2Dtest_all-0.67
2Dtest_all-0.89
2Dtest_all-0.123
The .data files are stored in those subfolders.
files(idx(idy)).folder
Did not work either. I got an error saying:
Error using textscan
Invalid file identifier. Use fopen to generate a valid file identifier.
Error in LedgeTest2D_results (line 33)
dataRead = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %f %f %f %f','HeaderLines',1);
It would probably be easier to get rid of out altogether and just sort the structure itself, e.g.:
files = files(idx(idy));
and then inside the loop you can simply access the folder and name, e.g.:
for k = 1:numel(files)
fnm = fullfile(files(k).folder,files(k).name);
fid = fopen(fnm,'rt');
...
end
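
The idea of indexing the structure directly can be sketched end to end as follows. This is only a sketch under stated assumptions: the listing is a hypothetical stand-in for the output of dir on the subfolders, with made-up folder names runA and runB. It also assumes that the same series name may occur in more than one subfolder, so the grouping key combines folder and name prefix. That key is an addition of this sketch; the code earlier in the thread applies unique to the name prefix only.

```matlab
% Hypothetical listing standing in for dir(fullfile(Folder,'**','*.data.*'))
files = struct( ...
    'folder', {'runA'; 'runA'; 'runA'; 'runB'; 'runB'}, ...
    'name',   {'T.data.0'; 'T.data.2'; 'T.data.10'; 'T.data.1'; 'T.data.7'});

% Split each name into the series prefix and the numeric suffix
spl = regexp({files.name}, '\.data\.', 'split', 'once');
spl = vertcat(spl{:});
vec = str2double(spl(:,2));
[~, idx] = sort(vec);                 % order by numeric suffix, not as text

% Assumption: group by folder and prefix together, so that equally named
% series in different subfolders are not merged into one group
fld = {files(idx).folder};
key = strcat(fld(:), filesep, spl(idx,1));
[~, idy] = unique(key, 'last');       % last, i.e. largest, suffix per group

files = files(idx(idy));              % folder and name stay paired
paths = fullfile({files.folder}, {files.name});
```

Each element of paths can then be passed to fopen inside the loop. If every series name is unique across the subfolders, the key reduces to the plain prefix and the result is the same as before.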

Release

R2020a
