detect correct startRow in fopen before textscan

Hello,
I have a text file containing 17 columns of data, with a variable-length text header above the data. The header spans several rows, and the number of rows is not fixed, which is why I am posting this question.
The data I want to import start at the row given by startRow, but the value of startRow depends on the number of header rows. That number is unknown after calling fopen, yet it must be known when calling textscan. So in between, I need to detect startRow automatically, whatever the header above the data.
Here is an example of the text file.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
program_20181015.vi
line2
line3
line4
t T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c
6.357780E+2 2.999041E+2 3.500000E-3 0.000000E+0 1.100000E+0 5.000000E-8 1.999990E+101 -5.000000E-12 1.000000E-4 7.140000E-6 -9.620000E-6 2.395640E-1 -4.995750E-2 2.400520E-1 -5.032370E-2 -2.727684E-7 0.000000E+0 0.000000E+0 0.000000E+0 0.000000E+0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In this particular example, line 6 corresponds to the startRow that I want to detect.
The string 't T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c' is always the same, whatever the content above it. So it would be nice to detect this string, for example with the find function, because the data start right after this line.
Of course, I could simply set startRow = 6 and be done. But depending on the user, the number of header rows above the data differs, so I need to detect startRow automatically.
On the forum, I found the interesting try/catch construct, which might suit my purpose. If startRow = 1 (when it should be 6), an error occurs, of course, and the catch block runs.
There, I would like to set startRow = startRow + 1 and try again, repeating until no error occurs.
How can I do that?
startRow = 1;
try
    delimiter = '\t';
    formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';
    fileID = fopen(fichier,'r');
catch me
    dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'TextType', 'string', 'EmptyValue', NaN, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
    fclose(fileID);
end
RAW = importdata(filename,'\t',startRow);
M= RAW.data(:,1:17);

Accepted Answer

Etsuo Maeda 2018-11-28
Edited: Etsuo Maeda 2018-11-28
A while loop will help you.
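k = 0;  % zero-based row offset passed to dlmread (number of lines to skip)
while exist('D') ~= 1
    try
        % dlmread errors on text rows, so D is created only on success
        D = dlmread('yourfile.txt', '', k, 0);
    catch
        k = k + 1;  % skip one more header line and retry
    end
end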
HTH
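
A possible safeguard for the loop above, offered only as a sketch: if the file never yields numeric rows (or the file name is wrong), the while loop never terminates. The version below caps the search at a limit and builds a small demo file so that it can run on its own. The names fname and maxHeader, the limit of 100, and the demo file contents are assumptions for illustration, not part of the original answer.

```matlab
% Demo file (hypothetical contents): 5 header lines, then tab-separated numbers.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'program_20181015.vi\nline2\nline3\nline4\nt\tT\tV\n');
fprintf(fid, '6.357780E+2\t2.999041E+2\t3.500000E-3\n');
fprintf(fid, '6.457780E+2\t2.999041E+2\t3.500000E-3\n');
fclose(fid);

maxHeader = 100;   % assumed upper bound on the number of header lines
found = false;
for k = 0:maxHeader
    try
        D = dlmread(fname, '', k, 0);   % errors while row k+1 is still text
        found = ~isempty(D);
    catch
        found = false;                  % still inside the header: try next k
    end
    if found
        break
    end
end
if ~found
    error('No numeric data found within the first %d lines.', maxHeader + 1);
end
startRow = k + 1;   % 1-based row of the first data line
delete(fname);      % remove the demo file
```

Using a bounded for loop instead of "while exist('D') ~= 1" also avoids depending on whether D happens to exist in the workspace from an earlier run.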

More Answers (6)

laurent jalabert 2018-11-28
Dear Maeda-san,
I tried to adapt your code as shown below, but I had to stop it with Ctrl-C: startRow had reached about 73458, whereas the correct value should be around 22.
startRow = 0 ;
while exist('dataArray') ~= 1
    try
        dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'TextType', 'string', 'EmptyValue', NaN, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
        fclose(fileID);
    catch ME
        startRow = startRow+1;
    end
end

Etsuo Maeda 2018-11-29
Edited: Etsuo Maeda 2018-11-29
Hi laurent jalabert-san,
textscan behaves a little differently from dlmread.
textscan returns an output, possibly an "empty" one, on every call, so its output variable exists after every loop iteration.
dlmread creates its output only when it succeeds in reading numeric data, not text data.
So "exist('dataArray') ~= 1" cannot work well with your original code.
You can confirm how textscan and dlmread differ on unexpected input with the following code.
fid = fopen('yourfile.txt');
fspec = repmat('%f', [1, 16]);
D = textscan(fid, fspec, 'HeaderLines', 0);
fclose(fid);
and
D = dlmread('yourfile.txt', '', 0, 0);
where 'yourfile.txt' contains:
program_20181015.vi
line2
line3
line4
t T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c
6.357780E+2 2.999041E+2 3.500000E-3 0.000000E+0 1.100000E+0 5.000000E-8 1.999990E+101 -5.000000E-12 1.000000E-4 7.140000E-6 -9.620000E-6 2.395640E-1 -4.995750E-2 2.400520E-1 -5.032370E-2 -2.727684E-7 0.000000E+0 0.000000E+0 0.000000E+0 0.000000E+0
I believe the dlmread code I suggested in my previous post works for your data without any modification.
If you need to use the textscan function, I can suggest another way using the "isempty" function.
fid = fopen('yourfile.txt');
fspec = repmat('%f', [1, 16]);
k = 0;
D{1, 1} = [];
while isempty(D{1, 1}) == 1
    D = textscan(fid, fspec, 'HeaderLines', k);
    k = k +1;
end
fclose(fid);
HTH

laurent jalabert 2018-11-30
Dear Maeda-san,
Thank you very much for your help. I tried to implement your code, but it did not work. If it worked, it would give k = 6 for my example.
With your code I get k = 1, and therefore an error.
Basically, I want to detect the value of startRow so that I can import the data from startRow onward and the header up to startRow-1.
This is how I understand your code.
At first, D{1,1} = [], so for k = 0, isempty(D{1, 1}) == 1. Then D = textscan(...) runs and k = k + 1. Then I get an error because textscan(...) does not have the correct value of HeaderLines.
I am sorry if I have misunderstood, but this code might not work.
The data are always located after this line:
t T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c
So how can I simply detect the row number of the line above in my text file?
Yours
Laurent
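
One way to find the row of the fixed column-name line, sketched here under assumptions rather than taken from the thread, is to read the file line by line with fgetl and compare each line with the expected string after collapsing whitespace. The demo file, the shortened three-column header 't T V', and the variable names (fname, expected, headerRow) are hypothetical; in the real file, expected would be the full 't T V off F ... c' line.

```matlab
% Demo file (hypothetical contents): 4 free-text lines, the fixed
% column-name line, then tab-separated numeric data.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'program_20181015.vi\nline2\nline3\nline4\nt\tT\tV\n');
fprintf(fid, '6.357780E+2\t2.999041E+2\t3.500000E-3\n');
fprintf(fid, '6.457780E+2\t2.999041E+2\t3.500000E-3\n');
fclose(fid);

expected = 't T V';   % column-name line, with single spaces between names
headerRow = 0;        % 0 means "not found"
n = 0;                % current line number (1-based)

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)   % fgetl returns -1 (numeric) at end of file
    n = n + 1;
    % Trim the line and collapse tabs/spaces so the comparison is robust
    if strcmp(regexprep(strtrim(tline), '\s+', ' '), expected)
        headerRow = n;
        break
    end
    tline = fgetl(fid);
end
fclose(fid);

if headerRow == 0
    error('Column-name line not found in %s.', fname);
end
startRow = headerRow + 1;                      % first data row (1-based)
data = dlmread(fname, '\t', startRow - 1, 0);  % dlmread offsets are zero-based
delete(fname);                                 % remove the demo file
```

Because this approach matches the known line explicitly, it does not rely on a read function failing on text rows.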

Etsuo Maeda 2018-11-30
Edited: Etsuo Maeda 2018-11-30
Hello Laurent-san,
I analyzed your code and finally found a bug in the "textscan" function when it is used in a "while" loop!
The steps to reproduce it are as follows. Save this as test.txt:
aaaaa
bbbbb
ccccc
dddd
eeee
1 2 3
4 5 6
and
clear; close all; fclose all;
fid = fopen('test.txt', 'r');
fspec = '%f%f%f';
k = 0;
while exist('D') ~= 1
    try
        D = textscan(fid, fspec, 'HeaderLines', k, 'ReturnOnError', false); % k = 3 NO error EMPTY D
    catch
        k = k +1;
        disp(k)
    end
end
D = textscan(fid, fspec, 'HeaderLines', k, 'ReturnOnError', false); % k = 3 NO error EMPTY D
fclose(fid);
fid = fopen('test.txt', 'r');
D = textscan(fid, fspec, 'HeaderLines', k, 'ReturnOnError', false); % k = 3 error!!!
fclose(fid)
"k" should be 5, but the while loop stops at 3.
"D" exists but is empty.
Calling textscan again before fclose also runs without error, but D is still empty.
After fclose and a second fopen, textscan raises an error with k = 3 and D is not created.
With your file, textscan returns strange numbers and stops at k = 4.
This is unexpected behavior of textscan.
I will report your case to the development team in the US.
As a workaround, could you please use my first code to find the row number?
k = 0;
while exist('D') ~= 1
    try
        D = dlmread('yourfile.txt', '', k, 0);
    catch
        k = k + 1;
    end
end
The numerical data start from the 6th line.
"D" will contain numerical data.
"k" will be 5.
Thank you very much for your question and patience.
HTH

laurent jalabert 2018-11-30
Dear Maeda-san,
The code below works well and solved my problem. Thank you very much for your answer and for the time you spent helping me. It could be useful to many people who want to read data files with a variable number of header lines.
k = 0;
while exist('D') ~= 1
    try
        D = dlmread('yourfile.txt', '', k, 0);
    catch
        k = k + 1;
    end
end
Now, to retrieve the headers, I usually use this kind of code:
RAW = importdata(filename,'\t',startRow);
M= RAW.data(:,1:21);
header = (RAW.textdata(1:startRow-1))';
In one data file, I found k = 21 and size(D) = [69735 21], which means there are 21 columns of data and 69735 rows. The headers are located from row 1 to row 20. How can I get the headers?
With importdata it is quite easy, as I show above with RAW.textdata.
With dlmread, how can I do it? I guess I should use textscan from row 1 to row 21, shouldn't I?
Yours
Laurent

Etsuo Maeda 2018-11-30
Dear Laurent-san,
"dlmread" is a function for reading numbers, not characters.
So it is impossible to read your header text using "dlmread".
As you mentioned, calling "textscan" or "importdata" again with the determined k is one workaround.
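
A further option, sketched here as an assumption rather than something proposed in the thread: once k is known, the first k lines can be read back as text with fgetl. The demo file, the value k = 5, and the names fname and header are hypothetical.

```matlab
% Demo file (hypothetical contents): 5 header lines followed by data.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'program_20181015.vi\nline2\nline3\nline4\nt\tT\tV\n');
fprintf(fid, '6.357780E+2\t2.999041E+2\t3.500000E-3\n');
fclose(fid);

k = 5;                   % number of header lines, assumed already determined
header = cell(k, 1);     % one cell per header line

fid = fopen(fname, 'r');
for i = 1:k
    tline = fgetl(fid);  % reads one line without its newline character
    if ~ischar(tline)    % fgetl returns -1 if the file ends early
        fclose(fid);
        error('File has fewer than %d lines.', k);
    end
    header{i} = tline;
end
fclose(fid);
delete(fname);           % remove the demo file

% Split the last header line into column names (whitespace-separated)
colNames = strsplit(strtrim(header{k}));
```

Here header{k} holds the column-name line, and colNames splits it into individual names.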
I think "readtable" is a good tool for you if you know the number of variables.
(The original question was about try-catch, which is why I used try-catch statements in my earlier answers.)
clear; close all;
% R2016b and later
filename = 'yourfile.txt';
numOfVariables = 21;
opts = detectImportOptions(filename, 'Delimiter', '\t', 'NumVariables', numOfVariables);
T = readtable(filename, opts)
T.Properties.VariableNames
If you do not know the number of variables, the following code may work, but I cannot promise anything.
clear; close all;
% R2016b and later
filename = 'yourfile.txt';
opts = detectImportOptions(filename, 'Delimiter', '\t'); % remove NumVariables
T = readtable(filename, opts)
T.Properties.VariableNames
HTH

Release

R2017b
