Ignore format errors using textscan

Question

Marcel 2015-1-12


I have been using textscan to read large text files containing data. Here are a few lines to give an idea of what the data looks like:

2015,1,1,23,1,23,100,9034
2015,1,1,23,1,23,203,8940
2015,1,1,23,1,23,313,8807

There are several million lines in the .txt file, and every now and then there is a small error: some strange symbols in a line, or missing data (a line that ends early). The errors are very inconsistent. Whenever textscan comes across one of these lines, it gives me the following error:

Error using horzcat
Dimensions of matrices being concatenated are not consistent.

I would like it to ignore these lines and continue reading all the data. Can anybody give me advice on how best to do this?

Thanks!

1 comment

Stephen23 2015-1-12

The error message that you posted does not refer to textscan, but rather to horzcat...


Answer 1

Star Strider 2015-1-12


You don’t supply a lot of specific information, so I can’t provide a specific answer.

You can deal with the missing data fields by specifying a value for the 'EmptyValue' parameter. See the section of textscan ‘Name-Value Pair Arguments’ under 'EmptyValue'.
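As a minimal sketch of the 'EmptyValue' idea, the snippet below reads a small piece of made-up text (not the poster's file) in which one row has an empty field. The sample values, the three-column format, and the variable names are assumptions for illustration only. Empty numeric fields are filled with NaN, and rows containing NaN are then dropped, which is one possible way to ignore incomplete lines.

```matlab
% Hypothetical sample text: the second row has an empty middle field.
txt = sprintf('1,2,3\n4,,6\n7,8,9\n');

% Read three numeric columns; empty fields are returned as NaN.
C = textscan(txt, '%f %f %f', 'Delimiter', ',', ...
    'EmptyValue', NaN, 'CollectOutput', 1);
M = C{1};                            % 3-by-3 matrix containing one NaN

% Keep only the rows in which every field was present.
valid = M(~any(isnan(M), 2), :);
```

This only covers fields that are empty. A field that contains stray symbols is a conversion failure rather than an empty value, so it is not handled by 'EmptyValue'.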

Stopping at a string character may be more difficult, because you do not specify what the non-numeric values are. An example of one way to deal with that is (with ‘fidi’ being the input file ID):

D1 = textscan(fidi, '%f %f', 'HeaderLines',2, 'Delimiter','\n', 'CollectOutput',1);
fseek(fidi,0,0);                % Position Start Of Second Part Of File
D2 = textscan(fidi, '%f %f', 'HeaderLines',2, 'Delimiter','\n', 'CollectOutput',1);

This instructs textscan to read up to the first interruption, then start again and read through the rest of the file. (This example is from my archived code. In this instance, the file had only one header in the middle of the file, with the same number of header lines as at the beginning of the file. You can combine the ‘D1’ and ‘D2’ variables here, or keep them separate, depending on the nature of your data.)


Answer 2

Marcel 2015-1-12


Edited: Marcel 2015-1-12


Thanks. Apparently the error didn't occur in textscan, but in a function called afterwards. Thanks to your info I was able to figure out a way to ignore invalid lines and continue reading the .txt file.

For anybody curious:

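a1 = [];                      % accumulated numeric data
while feof(fidi) == 0         % repeat until the end of the file is reached
    % textscan stops at the first field it cannot convert. On the next pass,
    % 'HeaderLines',1 skips the remainder of that line before reading resumes.
    % Note that the first pass also skips the first line of the file.
    % 'format' is the textscan format specifier, defined elsewhere.
    a2 = textscan(fidi,format,'delimiter',',','HeaderLines',1,'CollectOutput',1);
    a1 = [a1;a2{1}];          % append the block that was just read
end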
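For comparison, here is a sketch of a different technique from the loop above: validating each line before parsing it, so that malformed lines are skipped explicitly. It is not the method used in this thread. The sample lines, the field count of eight, and the assumption that every field is an integer are all taken from the example data in the question or made up for illustration.

```matlab
% Hypothetical sample lines; the second and third are deliberately malformed.
rawLines = { ...
    '2015,1,1,23,1,23,100,9034', ...
    '2015,1,1,23,1,23,2#3,8940', ...
    '2015,1,1,23', ...
    '2015,1,1,23,1,23,313,8807'};

nCols = 8;                           % assumed number of fields per line
% Assumed pattern: exactly nCols comma-separated, optionally signed integers.
pattern = sprintf('^\\s*-?\\d+(,-?\\d+){%d}\\s*$', nCols - 1);

data = zeros(0, nCols);              % one row per valid line
nSkipped = 0;                        % number of malformed lines ignored
for k = 1:numel(rawLines)
    if isempty(regexp(rawLines{k}, pattern, 'once'))
        nSkipped = nSkipped + 1;     % line does not match: ignore it
        continue
    end
    data(end+1, :) = sscanf(rawLines{k}, '%f,').';   % parse the valid line
end
```

For a real file, the lines could be read one at a time with fgetl instead of coming from a cell array. Growing 'data' row by row is slow for several million lines, so preallocating or processing the file in blocks would probably be needed at that scale.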

1 comment

Star Strider 2015-1-12

My pleasure!

Yours is a new approach, or at least not one I’ve seen before, so +1 for your Answer and +1 for your Question.
