How to extract column data from a csv file in vectors

Question

Katerina F 2014-9-18

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/155299-how-to-extract-column-data-from-a-csv-file-in-vectors

编辑： dpb 2014-9-19

EXTRAC1985.csv

Hello, I have csv files, here the data are similar to the extract of a file attached as an example. I want to extract the data of some of the columns in vectors and if there are empty values to be completed by NaN. I use the following code, but it needs modification to do what I need. How should I modify the code? thanks-k.

      fileID = fopen('EXTRAC1985.csv')
      C=textscan(fileID,'%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s','delimiter',',','EmptyValue',NaN);
     fclose(fileID)
     dat1= C{1}(3:end)
     dat2= C{10}(3:end)

the output is like: dat2 =

    '9'
    ''
    '9'
    ''
    '9'
    ''
    '9'
    ''
    '10' etc

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Guillaume 2014-9-18

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/155299-how-to-extract-column-data-from-a-csv-file-in-vectors#answer_152126

在 MATLAB Online 中打开

There are two problems:

There are 25 columns in your file and you've only specified 18 in your textscan. You need to read all 25 columns otherwise textscan gets confused.
EmptyValue does not apply to string fields which is what you've specified for all columns with %s. Note that you don't need 'EmptyValue', NaN since it's the default anyway.

You can either:

skip the file header and modify your textscan to read the data with the proper format:

fileID = fopen('EXTRAC1985.csv');
fgetl(fileID);   %skip first line
C=textscan(fileID,'%s %s %f %f %s %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f','delimiter',','); %use %f to specify numbers. Note that there are 25 %, not 18.
fclose(fid);
dat1 = C{:, 2}; dat2 = c{:, 10};

keep as is and use str2double to change the read data from string to number. str2double will replace empty strings with NaNs:

fileID = fopen('EXTRAC1985.csv')
C=textscan(fileID,'%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s','delimiter',','); % 25 %s
fclose(fid);
dat2 = str2double(c{10}(2:end));

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

Answer 2

dpb 2014-9-18

1
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/155299-how-to-extract-column-data-from-a-csv-file-in-vectors#answer_152128

编辑：dpb 2014-9-18

在 MATLAB Online 中打开

Don't say which columns are of interest, but you'll have much more success if you use format string for the types of data by column and match the number of fields to the number of columns in the file...there are a number of empty columns at the right it appears, but ignoring them in the format string causes the extra blank lines you've gotten.

>> fmt=['%s %s %f %f %s' repmat('%f',1,18)];
>> fid=fopen('extrac1985.csv');
>> c=textscan(fid,fmt,'delimiter',',','headerlines',1,'collectoutput',1);
>> whos c
Name      Size            Bytes  Class    Attributes
c         1x4             18192  cell

>> fid=fclose(fid);

textscan puts the sets of text and numeric together so the first column cell array is the first two text columns then the numeric id, the text and finally the remaining numeric array.

To select columns in the end, either--

a) read whole as above and then just set unwanted columns to [], or

b) modify the format string and skip unwanted columns by use of the '%*s' or '%*f' as appropriate in the format string to not return unwanted columns.

doc textscan

for more detail; look at the link to 'format options' for the gory details.

ADDENDUM

OK, to skip some columns I had a few minutes...get rid of the last number of empty columns and a few that are identical thruout--

>> fmt=['%s %*s %f %f %*s' ...
        repmat('%*f',1,3) repmat('%f',1,10) repmat('%*f',1,5)];
>> c=textscan(fid,fmt,'delimiter',',','headerlines',1,'collectoutput',1);
>> whos c
Name      Size            Bytes  Class    Attributes
c         1x2              8600  cell               
>> c{1}(1:10)
ans = 
  '01/01/1985 0:00'
  '01/01/1985 1:00'
  '01/01/1985 2:00'
  '01/01/1985 3:00'
  '01/01/1985 4:00'
  '01/01/1985 5:00'
  '01/01/1985 6:00'
  '01/01/1985 7:00'
  '01/01/1985 8:00'
  '01/01/1985 9:00'
>> c{2}(1:10,:)
ans =
Columns 1 through 9
    892702     1   360    9   NaN    17    NaN    9     9
    892702     1   360    9   NaN    16    NaN    9     9
    892702     1   360    9   NaN    14    NaN    9     9
    892702     1   350    9   NaN    15    NaN    9     9
    892702     1   360   10   NaN    15    NaN    9     9
    892702     1   360   10   NaN    16    NaN    9     9
    892702     1    10    9   NaN    16    NaN    9     9
    892702     1   360   10   NaN    15    NaN    9     9
    892702     1   360    9   NaN    15    NaN    9     9
    892702     1   360    9   NaN    16    NaN    9     9
Columns 10 through 12
         0           9           0
         0           9           0
         0           9           0
         0           9           0
         0           9           0
         0           9           0
         0           9           0
         0           9           0
         0           9           0
         0           9           0
>>

2 个评论
显示无隐藏无

Katerina F 2014-9-19

Hi, I am interested in keeping the columns A D F H J N. (I do not understand how the selection with the repmat works in this case, that is what you are doing with the line with the repmat, i.e. for example what the 1 and the 5 do in repmat('%*f',1,5)) etc. thanks,k

Note that I do not know the lenght of the data in the files (this file is just an example).

dpb 2014-9-19

编辑：dpb 2014-9-19

在 MATLAB Online 中打开

The repmat is simply replacing the explicit typing of N copies of the string with a counted number..."1,5" in this case is 1-row, 5 columns. See

doc repmat % for details

and/or simply try it at the command line and see what the result is.

...I do not know the length of the data in the files...

You mean rows or columns by "length" in this context? Number of rows is immaterial, the number of columns is more significant as the format string used controls the arrangement in the return cell array--having that mismatch caused all the extra blanks in your sample case you posted.

Mohammad's suggestion with the new table data structure may be convenient if you're using a recent-enough release (R2013+ I think; I am at R2012b here so don't have the full facility to see what it does with all the trailing empty columns).

It will, as far as I know, read the full array but it would be relatively easy as noted to select the columns wanted from that. OTOH, if you know the specific columns wanted, using the '%*' form of the format string for the unwanted columns will remove them from consideration at the git-go. You do need to know the actual number of columns total, however. If worst comes to worst, you can always read in the first line and count commas to ascertain that value.

请先登录，再进行评论。

How to extract column data from a csv file in vectors

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

更多回答（1 个）

2 个评论
显示无隐藏无

另请参阅

类别

标签

Community Treasure Hunt

How to extract column data from a csv file in vectors

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

采纳的回答

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

更多回答（1 个）

2 个评论 显示 无隐藏 无

另请参阅

类别

标签

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

2 个评论
显示无隐藏无