detect correct startRow in fopen before textscan

Question

laurent jalabert 2018-11-26

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan

Answered: Etsuo Maeda 2018-11-30

Hello,

I have a text file containing 17 columns of data, with a variable string header above the data. The header consists of several rows of strings. The number of header rows is not fixed, which is why I am posting this question.

The data I want to import start at a row defined by startRow, but the value of startRow depends on the number of header rows. That number is unknown after calling fopen, yet it must be known when calling textscan. So in between, I have to detect startRow automatically, whatever header sits above the data.

This is an example of the text file.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

program_20181015.vi																			
line2
line3
line4															
t	T	V	off	F	I1	V1	I2	V2	Li1	Li2	X1	Y2	X3	Y4	V5	c			
6.357780E+2	2.999041E+2	3.500000E-3	0.000000E+0	1.100000E+0	5.000000E-8	1.999990E+101	-5.000000E-12	1.000000E-4	7.140000E-6	-9.620000E-6	2.395640E-1	-4.995750E-2	2.400520E-1	-5.032370E-2	-2.727684E-7	0.000000E+0	0.000000E+0	0.000000E+0	0.000000E+0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

In this particular example, line 6 is the startRow that I want to detect.

The string 't T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c' is always the same, whatever the content above this line. So it could be nice to detect this string, for example with the find function, because the data start right after this line.

Of course I could simply set startRow = 6 and the problem would be solved. But depending on the user, there is a different number of header rows above the data, so I need to detect startRow automatically.

In the forum, I found the interesting try / catch construct, which might suit my purpose. If startRow = 1 (when it should be 6), then textscan raises an error, of course.

In that case, I would like to set startRow = startRow + 1 and try again, repeating until no error occurs.

How can I do that?

startRow = 1;
delimiter = '\t';
formatSpec = '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%[^\n\r]';
fileID = fopen(fichier, 'r');
try
    dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'TextType', 'string', 'EmptyValue', NaN, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
catch ME
    % here I would like to set startRow = startRow + 1 and try again
end
fclose(fileID);

RAW = importdata(fichier, '\t', startRow);
M = RAW.data(:, 1:17);

Answer 1

Etsuo Maeda 2018-11-28

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349332

Edited: Etsuo Maeda 2018-11-28

A while loop will help you.

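k = 0;  % row offset passed to dlmread (zero-based), i.e. number of rows to skip
while exist('D', 'var') ~= 1
    try
        D = dlmread('yourfile.txt', '', k, 0);  % fails while row k is not numeric
    catch
        k = k + 1;  % skip one more header row and retry
    end
end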
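The loop above never terminates if no offset ever succeeds, for example when the file contains no purely numeric block. A minimal sketch of a bounded variant follows; the function name readAfterHeader and the limit maxHeaderRows are hypothetical names introduced here for illustration, not part of the original answer.

```matlab
function [D, k] = readAfterHeader(filename, maxHeaderRows)
% Hypothetical helper: call dlmread with an increasing row offset k until
% it returns numeric data, giving up after maxHeaderRows attempts.
    for k = 0:maxHeaderRows
        try
            D = dlmread(filename, '', k, 0);  % k is a zero-based row offset
            if ~isempty(D)
                return                        % success: k rows were skipped
            end
        catch
            % row k is not purely numeric; try the next offset
        end
    end
    error('readAfterHeader:noData', ...
          'No numeric block found within the first %d rows of %s.', ...
          maxHeaderRows, filename);
end
```

Saved as readAfterHeader.m, it could be called as [D, k] = readAfterHeader('yourfile.txt', 100); the cap of 100 is an arbitrary choice and should exceed the longest header you expect.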

HTH

Answer 2

laurent jalabert 2018-11-28

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349344

Dear Maeda-san,

I tried to adapt your code as shown below, but I had to stop it with Ctrl-C: startRow had reached about 73458, whereas the correct value should be around 22.

startRow = 0;
while exist('dataArray') ~= 1
    try
        dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'TextType', 'string', 'EmptyValue', NaN, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
        fclose(fileID);
    catch ME
        startRow = startRow+1;
    end
end

Answer 3

Etsuo Maeda 2018-11-29

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349494

Edited: Etsuo Maeda 2018-11-29

Hi laurent jalabert - san,

textscan is a little bit different from dlmread.

With textscan, an "empty" output is created on every pass through the loop, even when nothing could be read.

With dlmread, the output exists only when it succeeds in reading numerical data, not text data.

So, "exist('dataArray') ~= 1" cannot work well with your original code.

You can confirm how textscan and dlmread differ on unexpected input with the following code.

fid = fopen('yourfile.txt');
fspec = repmat('%f', [1, 16]);
D = textscan(fid, fspec, 'HeaderLines', 0);
fclose(fid);

and

D = dlmread('yourfile.txt', '', 0, 0);

and 'yourfile.txt'

program_20181015.vi	
line2
line3
line4	
t	T	V	off	F	I1	V1	I2	V2	Li1	Li2	X1	Y2	X3	Y4	V5	c	
6.357780E+2	2.999041E+2	3.500000E-3	0.000000E+0	1.100000E+0	5.000000E-8	1.999990E+101	-5.000000E-12	1.000000E-4	7.140000E-6	-9.620000E-6	2.395640E-1	-4.995750E-2	2.400520E-1	-5.032370E-2	-2.727684E-7	0.000000E+0	0.000000E+0	0.000000E+0 0.000000E+0

I believe the dlmread code I suggested in my previous post works for your data without any modification.

If you need to use the textscan function, I can suggest another way, using the "isempty" function.

fid = fopen('yourfile.txt');
fspec = repmat('%f', [1, 16]);
k = 0;
D{1, 1} = [];
while isempty(D{1, 1}) == 1
    D = textscan(fid, fspec, 'HeaderLines', k);
    k = k +1;
end
fclose(fid);

HTH

Answer 4

laurent jalabert 2018-11-30

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349737

Dear Maeda-san,

Thank you very much for your help. I tried to implement your code, but it did not work. If it worked, then in my example I would get k = 6.

Using your code, I get k = 1, and therefore an error.

Basically, I want to detect the value of startRow in order to import the data from startRow onward and the header up to startRow-1.

I understand your code like this.

At first, D{1,1} = [], so for k = 0, isempty(D{1, 1}) == 1. Then D = textscan(...) and k = k+1. Then I get an error because textscan(...) does not have the correct value of HeaderLines.

So I am sorry if I misunderstood, but this code might not work.

The data are always located after this line:

t T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c

So how can I simply detect the row number of this line in my text file?

Yours

Laurent

Answer 5

Etsuo Maeda 2018-11-30

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349749

Edited: Etsuo Maeda 2018-11-30

yourfile.txt

Hello Laurent - san,

I analyzed your code and finally found a bug in the "textscan" function when it is used in a "while" loop!

The reproduction steps are as follows. Save this as test.txt:

aaaaa
bbbbb
ccccc
dddd
eeee
1	2	3
4	5	6

and

clear; close all; fclose all;
fid = fopen('test.txt', 'r');
fspec = '%f%f%f';
k = 0;
while exist('D') ~= 1
    try
        D = textscan(fid, fspec, 'HeaderLines', k, 'ReturnOnError', false); % k = 3 NO error EMPTY D
    catch
        k = k +1;
        disp(k)
    end
end
D = textscan(fid, fspec, 'HeaderLines', k, 'ReturnOnError', false); % k = 3 NO error EMPTY D
fclose(fid);
fid = fopen('test.txt', 'r');
D = textscan(fid, fspec, 'HeaderLines', k, 'ReturnOnError', false); % k = 3 error!!!
fclose(fid)

"k" should be 5 but the while loop stops at 3.

"D" exists but it is empty.

When textscan is called again before fclose, it also runs without error, but D is empty.

After fclose and a second fopen, textscan raises an error with k = 3 and D is not created.

With your file, textscan returns strange numbers and stops at k = 4.

This is unexpected behavior of textscan.

I will report your case to the development team in the US.

As a workaround, could you please use my first code to find the row number?

k = 0;
while exist('D') ~= 1
    try
        D = dlmread('yourfile.txt', '', k, 0);
    catch
        k = k + 1;
    end
end

The numerical data start from 6th line.

"D" will contain numerical data.

"k" will be 5.

Thank you very much for your question and patience.
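Since the column-name line is reported to be identical in every file, startRow can also be found without relying on read errors, by scanning the file line by line until that line appears. The following is a minimal sketch under that assumption; findStartRow and headerTokens are hypothetical names introduced here, and the comparison ignores differences in whitespace (tabs versus spaces, trailing tabs).

```matlab
function startRow = findStartRow(filename, headerTokens)
% Hypothetical helper: return the 1-based line number of the first data
% row, taken to be the line right after the column-name line.
    fid = fopen(filename, 'r');
    if fid == -1
        error('findStartRow:open', 'Cannot open %s.', filename);
    end
    cleaner = onCleanup(@() fclose(fid));   % close the file even on error
    lineNo = 0;
    startRow = [];
    while true
        tline = fgetl(fid);
        if ~ischar(tline)                   % fgetl returns -1 at end of file
            break
        end
        lineNo = lineNo + 1;
        tokens = strsplit(strtrim(tline));  % split on runs of whitespace
        if isequal(tokens, headerTokens)
            startRow = lineNo + 1;          % data begin on the next line
            break
        end
    end
    if isempty(startRow)
        error('findStartRow:notFound', ...
              'Column-name line not found in %s.', filename);
    end
end
```

A possible call is startRow = findStartRow('yourfile.txt', strsplit('t T V off F I1 V1 I2 V2 Li1 Li2 X1 Y2 X3 Y4 V5 c')). The result is a 1-based line number, so it corresponds to k + 1 in the dlmread loop, where k counts skipped rows.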

HTH

Answer 6

laurent jalabert 2018-11-30

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349754

Dear Maeda-san

The code below works well and solved my question. I deeply thank you for your answer and for the time you spent helping me. It could be useful to many people who want to read a data file with a variable number of header lines.

k = 0;
while exist('D') ~= 1
    try
        D = dlmread('yourfile.txt', '', k, 0);
    catch
        k = k + 1;
    end
end

Now, to retrieve the headers, I usually use this kind of code:

RAW = importdata(filename,'\t',startRow);
M= RAW.data(:,1:21);
header = (RAW.textdata(1:startRow-1))';

In one data file, I found k = 21 and size(D) = [69735 21], which means there are 21 columns of data and 69735 lines. The headers are located from row 1 to row 20. How can I get the headers?

With importdata, it is quite easy, as I show above with the RAW.textdata field.

With dlmread, how is it done? I guess I should use textscan from row 1 to row 21, shouldn't I?

Yours

Laurent

Answer 7

Etsuo Maeda 2018-11-30

https://ww2.mathworks.cn/matlabcentral/answers/431901-detect-correct-startrow-in-fopen-before-textscan#answer_349769

Dear Laurent - san,

"dlmread" is a function to read numbers, not for characters.

So, it is impossible to read your header characters using "dlmread".

As you mentioned, "textscan or importdata again with determined k" is one of the workarounds.
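Once k is known, the header rows can also be read as plain text with fgetl, without a format specification. This is only a sketch of that idea; readHeaderLines is a hypothetical helper name, and it assumes k counts the header rows that precede the numeric data.

```matlab
function header = readHeaderLines(filename, k)
% Hypothetical helper: return the first k lines of a text file as a
% k-by-1 cell array of character vectors (line terminators removed).
    fid = fopen(filename, 'r');
    if fid == -1
        error('readHeaderLines:open', 'Cannot open %s.', filename);
    end
    cleaner = onCleanup(@() fclose(fid));  % close the file even on error
    header = cell(k, 1);
    for i = 1:k
        tline = fgetl(fid);
        if ~ischar(tline)                  % file has fewer than k lines
            header = header(1:i-1);
            break
        end
        header{i} = tline;
    end
end
```

For example, header = readHeaderLines('yourfile.txt', k) followed by colNames = strsplit(strtrim(header{end})) would give the column names, assuming the last header row is the column-name line.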

I think "readtable" is a good tool for you if you know the number of variables.

(The first question was a try-catch problem, so I used try-catch statements in my earlier answers.)

clear; close all;
% R2016b and later
filename = 'yourfile.txt';
numOfVariables = 21;
opts = detectImportOptions(filename, 'Delimiter', '\t', 'NumVariables', numOfVariables);
T = readtable(filename, opts)
T.Properties.VariableNames

If you do not know the number of the variables, the following code may work but I cannot make any promise.

clear; close all;
% R2016b and later
filename = 'yourfile.txt';
opts = detectImportOptions(filename, 'Delimiter', '\t'); % remove NumVariables
T = readtable(filename, opts)
T.Properties.VariableNames
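The options object returned by detectImportOptions also records where it believes the data begin, which relates directly to the original startRow question. The sketch below reads that value back; readWithDetectedStart is a hypothetical helper name, and the detected line is only the importer's guess, so it is worth checking against a file whose layout you know.

```matlab
function [T, startRow] = readWithDetectedStart(filename)
% Hypothetical helper: let detectImportOptions guess the file layout,
% report the first data line it detected, and read the table.
    opts = detectImportOptions(filename, 'Delimiter', '\t');
    startRow = opts.DataLines(1);  % 1-based line where data are assumed to start
    T = readtable(filename, opts);
end
```

The same options object exposes opts.VariableNamesLine, the line taken as column names. If the guess is wrong, opts.DataLines can be set by hand before calling readtable.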

HTH
