Read multiple text files and extract part of the data by name
Hi,
I have used the code below to read and extract selected data from text files. I used textscan (to read) and
find(~cellfun(@isempty,regexpi(allText,'RainFallID')))
to identify the required data by name. It works well, but when I run it on 10,000 text files it becomes extremely slow, taking more than three hours. Could someone please suggest a faster way?
Sincerely,
clc;
clear all;
clc
tic
FileList=dir('D:\Mekala_Backupdata\Matlab2010\Filesfolder\PartofTextFilesData/');
j=1;
for i=3:1:(size(FileList)) %% collect the names of all files in the folder (starting at 3 skips . and ..)
    FileName{j}=FileList(i).name;
    j=j+1;
end
for j=1:size(FileName,2)
    fid=fopen(['D:\Mekala_Backupdata\Matlab2010\Filesfolder\PartofTextFilesData/',FileName{j}],'r'); %% open each file and read it line by line
    allText = textscan(fid,'%s','delimiter','\n');
    numberOfLines = length(allText{1});
    allText=allText{:};
    for k=1:size(allText,1)
        idx_RainFallID=find(~cellfun(@isempty,regexpi(allText,'RainFallID')));
        idx_LMDName=find(~cellfun(@isempty,regexpi(allText,'LMD Name')));
        idx_10Under1=find(~cellfun(@isempty,regexpi(allText,'10 Under Pipe 1.Response Value')));
        idx_RainFallIDtemp=allText(idx_RainFallID);
        idx_RainFallIDtemp2=regexp(idx_RainFallIDtemp,' +','split');
        b(j,1)=str2double(idx_RainFallIDtemp2{1}{1,3});
        Variable{1,1}=char(idx_RainFallIDtemp2{1}{1,1});
    end
    fclose(fid)
end
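One likely contributor to the slowness, beyond the number of files: the `for k` loop repeats the same three `regexpi` searches over every line of the file once per line, so the work per file grows with the square of the line count, and `k` itself is never used inside the loop. A minimal sketch of the same RainFallID extraction with the search run once per file, assuming (hypothetically) that the relevant line looks like `RainFallID : 42`:

```matlab
% Hypothetical lines standing in for one file read with textscan
allText = {'RainFallID : 42'; 'LMD Name : Alpha'; 'Some other line'};

% Search all lines once, with no surrounding "for k" loop
idx_RainFallID = find(~cellfun(@isempty, regexpi(allText, 'RainFallID')));

% Same parsing as the original code: split on spaces, take the third piece
parts = regexp(allText{idx_RainFallID(1)}, ' +', 'split');
rainFallID = str2double(parts{3});
disp(rainFallID)
```

The other two searches (`LMD Name`, `10 Under Pipe 1.Response Value`) can be hoisted out of the loop in the same way.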
2 Comments
Walter Roberson
2016-1-23
You pull out idx_LMDName and idx_10Under1 but you do not do anything with them?
You always write over Variable{1,1} instead of storing for each file?
Is it correct that your desired output is the list of file names and a vector of the numeric values of the corresponding RainFallID?
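A minimal sketch of storing one result per file (preallocated and indexed by the file counter) instead of overwriting a fixed slot such as `Variable{1,1}`, then pairing the values with the file names. The file names and ID strings below are hypothetical placeholders for values extracted as in the question's code; `table` is just one convenient container:

```matlab
% Hypothetical file names and already-extracted RainFallID strings
FileName = {'site1.txt'; 'site2.txt'; 'site3.txt'};
idText   = {'101'; '202'; '303'};

numfile = numel(FileName);
RainFallID = zeros(numfile, 1);   % preallocate: one slot per file
for j = 1:numfile
    % write into slot j rather than a fixed index
    RainFallID(j) = str2double(idText{j});
end

result = table(FileName, RainFallID);   % file names alongside their IDs
disp(result)
```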
Accepted Answer
Walter Roberson
2016-1-23
The code below should be faster. It uses some of the more advanced features of regexp (lookbehind assertions and named tokens). It is easy to get the pattern wrong :(
project_dir = 'D:\Mekala_Backupdata\Matlab2010\Filesfolder\PartofTextFilesData';
FileList = dir(project_dir);
FileName = {FileList.name};
FileName([FileList.isdir]) = []; %get rid of . and .. and other directories
pattern = '(?<=RainFallID\s+:\s+)(?<RainFallID>\d+)|(?<=LMD\s+Name\s+:\s+)(?<LMD_Name>\S+)|(?<=10\s+Under\s+Pipe\s+1\.Response\s+Value\s*\S+\s+[a-zA-Z]+\s+)(?<TenUnder1>\S+)';
numfile = length(FileName);
RainFallID = zeros(numfile,1);
LMD_Name = cell(numfile,1);
Value = zeros(numfile,1);
for K = 1 : length(FileName)
    thisfile = fullfile(project_dir, FileName{K});
    filecontent = fileread(thisfile);
    tokens = regexp(filecontent, pattern, 'names'); %1 x 3 struct with mostly empty entries
    RainFallID(K) = str2double( vertcat(tokens.RainFallID) );
    LMD_Name{K} = vertcat(tokens.LMD_Name);
    Value(K) = str2double( vertcat(tokens.TenUnder1) );
end
I allowed for some variability in the file formatting, but if the format differs too much, you will run into problems.
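A small, self-contained illustration of how the `'names'` output of `regexp` behaves with an alternation of named tokens, run on hypothetical file contents (the real layout of the files is an assumption). This variant matches the label text directly instead of inside a lookbehind; because `'names'` returns only the text of the named tokens, the result should be the same for input like this. It joins each field with square brackets, which skips the empty entries:

```matlab
% Hypothetical file contents; the real file layout is an assumption
filecontent = sprintf('RainFallID : 1234\nLMD Name : SiteA\nOther : 9\n');

% Label matched directly; only the named tokens end up in the output
pattern = ['RainFallID\s+:\s+(?<RainFallID>\d+)', ...
           '|LMD\s+Name\s+:\s+(?<LMD_Name>\S+)'];

% One struct element per match; the field from the other branch is empty
tokens = regexp(filecontent, pattern, 'names');

% Concatenate each field across matches, dropping the empty entries
rainFallID = str2double([tokens.RainFallID]);
lmdName = [tokens.LMD_Name];
fprintf('RainFallID = %g, LMD Name = %s\n', rainFallID, lmdName);
```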