Using textscan to skip specific rows

Question

Darren Kenney 2017-6-6

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/343581-using-textscan-to-skip-specific-rows

评论： Darren Kenney 2017-6-8

Right now I'm interested in using textscan to read specific rows of a .csv file which looks something like this:

 type    value    latency
 92      trigger  19593
 type    value    latency
 20      trigger  20000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000

I want to have textscan read the rows which have the values "92" and "19593", but to ignore the rows that are blank or the rows that have the words "type" "value" "latency". In addition, the documents I am scanning are of varying lengths, and I want textscan to automatically figure out which rows have the values of interest "92, 20, 14" and "19593" (latency values). How would I have textscan automatically scan the numerical values and ignore all other rows? In addition, I am not interested in scanning the column "value" at all. I only need to scan the two columns "type" and "latency".

Thank you in advance.

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Stephen23 2017-6-8

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/343581-using-textscan-to-skip-specific-rows#answer_269989

编辑：Stephen23 2017-6-8

在 MATLAB Online 中打开

events file.csv

Second attempt: this time we will first convert the awful file format into a standard CSV format with one header line, no empty rows, no quoted numbers, and no repeated headers. I used regexprep and simply defined appropriate regular expressions:

fnm = 'events file.csv';
str = fileread(fnm);
rgx = {'\s*\n\s*"type"[a-z",]+','"(\d+)\s*"'};
rep = {'','$1'};
str = regexprep(str,rgx,rep);

You could also save str if you want an improved CSV file. Now we can easily extract the data from this improved CSV string:

fmt = '%f%*q%f%f%f';
opt = {'HeaderLines',1, 'CollectOutput',true, 'Delimiter',','};
C = textscan(str,fmt,opt{:});
M = C{1}

giving:

M =
        92       14287           0           1
        20       16459           0           2
        14       19147           0           3
        14       20646           0           4
        14       22047           0           5
        14       24431           0           6
        14       26844           0           7
        14       28819           0           8
        14       30071           0           9
        14       31843           0          10
        14       34788           0          11
        14       36883           0          12
        14       38439           0          13
        14       39571           0          14
        14       40691           0          15
        14       41368           0          16
        14       42443           0          17
        20       42708           0          18
        14       44955           0          19
        14       46092           0          20
        14       47771           0          21
        14       48632           0          22
        14       52780           0          23
        14       54312           0          24
        14       56616           0          25
        14       58724           0          26
        14       59831           0          27
        14       61911           0          28
        14       62859           0          29
        14       64884           0          30
        14       67534           0          31
        14       68367           0          32
        14       70052           0          33
        20       70311           0          34
        14       72450           0          35
        14       73476           0          36
        14       75143           0          37
        14       76615           0          38
        20       76622           0          39

Tested on this file:

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Darren Kenney 2017-6-8

This worked. Thank you very much!

请先登录，再进行评论。

Answer 2

Stephen23 2017-6-6

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/343581-using-textscan-to-skip-specific-rows#answer_269785

编辑：Stephen23 2017-6-7

在 MATLAB Online 中打开

temp2.txt

One approach is to play with the EndOfLine character (a little hacky, I know):

opt = {'MultipleDelimsAsOne',true, 'EndOfLine','p',...
    'CollectOutput',true, 'HeaderLines',1};
fmt = '%*s%*s%*s%f%*s%f';
fid = fopen('temp2.txt','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1}(1:2:end,:);

giving:

>> M
M =
        92       19593
        20       20000
        14       21000
        14       21000
        14       21000
        14       21000
        14       21000
        14       21000
        14       21000

As long as the file is not huge then this will be a reasonably effective method, although it does create a larger output matrix than required.

Note that your comment that the file "looks something like this" and the misleading "a .csv file" (whereas there is not a single comma anywhere in your example data) means that you are not providing an accurate enough specification. Reading files requires a very precise specification: every newline, every separator, non-printable characters, trailing spaces, trailing delimiters, repeated space characters, etc, etc, makes a huge difference to how a file is read. Because you did not provide us with a sample file I created my own one to test my code on, available here:

9 个评论
显示 7更早的评论隐藏 7更早的评论

Stephen23 2017-6-7

编辑：Stephen23 2017-6-7

在 MATLAB Online 中打开

So, having looked at your file, it turns out that your description was incorrect after all. In particular:

All of the strings and some of the numeric values are double quoted, which in the world of CSV files indicates strings (and particular strings that might contain spaces or commas).
It has a different number of columns.
It has trailing commas (a trailing empty field).
Oh, and the "blank" lines do not contain any commas at all (no field delimiters).

Well, one I had the real file to work with it was easy to make my code work. Keep in mind that this is a vary hacky solution, and what you really should do is improve whatever it is that writes that awful file. The best solution is to fix the problem at the source (bad file formatting), rather than trying to mess around and fix it later with hack code.

PS: do not put spaces in the format string.

opt = {'MultipleDelimsAsOne',true, 'CollectOutput',true,...
    'EndOfLine','p', 'HeaderLines',1, 'Delimiter','\n\r\t ,"'};
fmt = '%*s%*s%*s%*s%*s%s%*s%f%f%f';
fid = fopen('events file.csv','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{2}(1:2:end,:);
S = C{1}(1:2:end,:);
Z = [str2double(S),M]

giving:

Z =
          92       14287           0           1
          20       16459           0           2
          14       19147           0           3
          14       20646           0           4
          14       22047           0           5
          14       24431           0           6
          14       26844           0           7
          14       28819           0           8
          14       30071           0           9
          14       31843           0          10
          14       34788           0          11
          14       36883           0          12
          14       38439           0          13
          14       39571           0          14
          14       40691           0          15
          14       41368           0          16
          14       42443           0          17
          20       42708           0          18
          14       44955           0          19
          14       46092           0          20
          14       47771           0          21
          14       48632           0          22
          14       52780           0          23
          14       54312           0          24
          14       56616           0          25
          14       58724           0          26
          14       59831           0          27
          14       61911           0          28
          14       62859           0          29
          14       64884           0          30
          14       67534           0          31
          14       68367           0          32
          14       70052           0          33
          20       70311           0          34
          14       72450           0          35
          14       73476           0          36
          14       75143           0          37
          14       76615           0          38
          20       76622           0          39

Darren Kenney 2017-6-7

Not large enough to crash the program (40 lines for each file), and 2012. I fixed the problem by changing the 'delimiter' from '\n\r\t ,"' to ','

Darren Kenney 2017-6-7

Although, now the leftmost column displays all numbers as NaN which is what I assume you were trying to work around with '\n\r\t ,"'?

请先登录，再进行评论。

Using textscan to skip specific rows

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

更多回答（1 个）

9 个评论
显示 7更早的评论隐藏 7更早的评论

另请参阅

类别

标签

Community Treasure Hunt

Using textscan to skip specific rows

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

更多回答（1 个）

9 个评论 显示 7更早的评论隐藏 7更早的评论

另请参阅

类别

标签

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

9 个评论
显示 7更早的评论隐藏 7更早的评论