Import only certain columns (with numeric values) from multiple text files with complex name
Hi everyone!
I would like to import only certain columns (with numeric values) from several text files (approximately 118) into the MATLAB workspace.
The names of the text files have a date structure (year/month_rest of name) like this:
200705_nam12_Yucatan.bin_output.txt;
200706_nam12_Yucatan.bin_output.txt;
200707_nam12_Yucatan.bin_output.txt;
200708_nam12_Yucatan.bin_output.txt;
200709_nam12_Yucatan.bin_output.txt;
200710_nam12_Yucatan.bin_output.txt;
200711_nam12_Yucatan.bin_output.txt;
200712_nam12_Yucatan.bin_output.txt... until
201705_nam12_Yucatan.bin_output.txt.
As can be inferred, the text files are monthly files from 2007 to 2017, which contain meteorological measurements every three hours.
I have been working on this routine:
%ROUTINE TO EXTRACT NAM12 TXT TIME SERIES
%RANGE YEARS TO IMPORT
YEAR1=2007;
YEAR2=2017;
%RANGE MONTHS TO IMPORT
MONTH1=1;
MONTH2=12;
name=['nam12_Yucatan.bin_output'];
datafiles = dir('*.txt');
ext=['.txt'];
cols=[1:5,12:13];
DATA=[];
for i=1:length(datafiles)
for j=YEAR1:YEAR2
for k=MONTH1:MONTH2
Fname=[num2str(j),num2str(k),'_',name,ext];
A=dlmread(Fname);
DATA=[DATA ; (A(:,cols))];
end
end
end
clear A Fname MONTH1 MONTH2 YEAR1 YEAR2 cols datafiles ext i j k name
The problems that I have with this routine are the following:
1.- Name of files. For the months from January to September (1-9), my script can only build file names with this structure: "20075_nam12_Yucatan.bin_output.txt", instead of this: "200705_nam12_Yucatan.bin_output.txt". Unfortunately, for those months, the software which generated the text files, add a "0" at the left of the month field. Given the number of files, I find it impractical to edit the names of the text files manually, and I would like the script to be able to import such files.
2.- Incomplete years. In some years, not all twelve months are available (for example, for 2007 I have May to December, 2011 has no October, etc.). When I run the script, it stops as soon as it cannot find one of the monthly files.
3.- Amount of data imported. I only want to import the columns of interest. However, the loop that I have developed so far first imports all of the data into the MATLAB workspace (line: A=dlmread(Fname);) and only then selects the columns of interest (line: DATA=[DATA ; (A(:,cols))];). I have found that this increases the computation time.
Phew! That is a lot of questions, but I would really appreciate your help.
I am attaching to this message some examples of the files that I am trying to import into MATLAB.
Best,
Miguel
1 Comment
Stephen23
2017-5-30
"Unfortunately, for those months, the software which generated the text files, add a "0" at the left of the month field"
Actually you should be very happy that that software uses a leading zero, because this means that the filenames will sort into the chronological order using a simple character sort. If there was no leading zero it would be a very complex task keeping them in the correct order. You really are very lucky!
In any case, you could simply specify the format with num2str to ensure that your names also use leading zeros:
>> num2str(2,'%02d')
ans = 02
However, you have much bigger problems than this. Your basic concept is very confused: you get a list of filenames using dir, but then you do not use those names except to define how many loop iterations to make. Instead of using those names (and parsing them to extract the date info), you awkwardly generate new filenames, some of which do not even exist.
Stop mixing your code up like this. Use dir, preallocate some output arrays (numeric, cell, struct, table, etc) as required, and add the data when you read the files. In this way it would be easy to ignore missing months, and you would solve all three points that you list in your question.
How to read multiple files is explained extensively in the documentation, on this forum, and in the wiki:
etc. You can also find some examples with my FEX submission natsortfiles:
I would recommend that you do something like this, which I tested on your sample files:
S = dir('*.txt');
N = sort({S.name});
% Read file data:
C = cell(size(S));
for k = 1:numel(N)
C{k} = dlmread(N{k});
end
% get dates:
R = regexp(N,'^\d+','once','match');
... etc
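One way the date-parsing step hinted at above might continue is sketched below. It assumes every name starts with a six-digit yyyymm prefix, as in the question; the hard-coded list N and the variable names yr and mo are illustrative only, not part of the original comment.

```matlab
% Sample names copied from the question (in practice, N = sort({S.name})).
N = {'200705_nam12_Yucatan.bin_output.txt', ...
     '200712_nam12_Yucatan.bin_output.txt', ...
     '201705_nam12_Yucatan.bin_output.txt'};

% Leading run of digits in each name, e.g. '200705'.
R = regexp(N, '^\d+', 'once', 'match');

% Split the yyyymm prefix into numeric year and month vectors.
yr = cellfun(@(s) str2double(s(1:4)), R);
mo = cellfun(@(s) str2double(s(5:6)), R);

% Once the per-file matrices C{k} have been read, they could be stacked with
% DATA = vertcat(C{:});   % requires the same number of columns in every file
```

Because the names are sorted before reading, the rows of the stacked data would follow chronological order, and yr and mo identify which file each block came from.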
Accepted Answer
Guillaume
2017-5-29
1. "Unfortunately, for those months, the software which generated the text files, add a "0" at the left of the month field". That's not unfortunate, that's good practice. That way you are guaranteed that the month is always two characters and all strings are always the same length. It's trivially solved by using the appropriate format spec in num2str, or even better in sprintf:
Fname = sprintf('%04d%02d_%s%s', j, k, name, ext); % %0xd means use x digits and pad with 0 if necessary
2. You appear to have taken two approaches to solving your problem, mixed them up, and ended up with something very confusing that does not even work:
- Your first approach is to hardcode the start and end year and month and loop over them (your j and k loops). Indeed, if a file is missing your loop is going to fail. There are two easy ways to work around that:
% Option 1: only read the file if it exists
if exist(Fname, 'file') == 2
    A=dlmread(Fname);
    DATA=[DATA ; (A(:,cols))];
end
% Option 2: attempt the read and warn if it fails
try
    A=dlmread(Fname);
    DATA=[DATA ; (A(:,cols))];
catch
    warning('Failed to open %s', Fname);
end
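Putting the sprintf name and the existence check together, the year/month loop might look like the sketch below. To be runnable on its own it writes two small sample files into a temporary folder; the folder, the magic(3) data, the reduced cols, and the parts cell array are all assumptions made for illustration, not taken from the question's files.

```matlab
% Create two small sample files in a temporary folder (hypothetical data).
folder = tempname;
mkdir(folder);
name = 'nam12_Yucatan.bin_output';
ext  = '.txt';
dlmwrite(fullfile(folder, sprintf('%04d%02d_%s%s', 2007, 5, name, ext)), magic(3));
dlmwrite(fullfile(folder, sprintf('%04d%02d_%s%s', 2007, 7, name, ext)), magic(3));

cols  = [1, 3];          % columns to keep (the question uses [1:5,12:13])
parts = {};              % one cell per file that was actually found
for yr = 2007:2007
    for mo = 1:12
        Fname = fullfile(folder, sprintf('%04d%02d_%s%s', yr, mo, name, ext));
        if exist(Fname, 'file') == 2      % skip months that have no file
            A = dlmread(Fname);
            parts{end+1} = A(:, cols);    %#ok<SAGROW>
        end
    end
end
DATA = vertcat(parts{:});   % 6-by-2 here: two files of three rows each

rmdir(folder, 's');         % remove the temporary folder and its files
```

Collecting the pieces in a cell array and concatenating once at the end avoids regrowing DATA on every iteration, which can matter with more than a hundred files.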
- Your second approach is to obtain the list of files and loop over these (your i loop, in which for some reason you've got the other two loops). If you'd continued with that you should have ended up with something like:
for filenum = 1 : numel(datafiles) %use meaningful variable names rather than i
    filecontent = dlmread(datafiles(filenum).name);
    DATA = [DATA; filecontent(:, cols)];
end
3. You don't have much of a choice: text files are read line by line, so the whole file has to be parsed whichever columns you keep. Reading everything and then selecting the columns, as you do, is fine.
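A possible refinement, which is not part of the answer above: textscan accepts '%*f' in its format to parse a field without storing it, so only the wanted columns end up in memory. The file is still scanned in full, so any time saving may be small. The sketch below assumes whitespace-delimited numeric files with at least 13 columns; the sample file and its contents are made up for illustration.

```matlab
% Build a small whitespace-delimited sample file (hypothetical data, 14 columns).
fname = [tempname '.txt'];
M = reshape(1:28, 14, 2).';       % 2 rows x 14 columns
dlmwrite(fname, M, 'delimiter', ' ');

% Keep columns 1-5 and 12-13: '%*f' parses a field but does not store it,
% and '%*[^\n]' discards whatever remains on the line.
fmt = [repmat('%f', 1, 5), repmat('%*f', 1, 6), '%f%f%*[^\n]'];

fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'CollectOutput', true);
fclose(fid);
delete(fname);

A = C{1};   % numeric matrix holding only the seven kept columns
```

If the real files use a different delimiter or contain header lines, the 'Delimiter' and 'HeaderLines' options of textscan would need to be set accordingly.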