Read all the columns in a .csv file

Question

Damith 2014-10-29


https://ww2.mathworks.cn/matlabcentral/answers/160596-read-all-the-columns-in-a-csv-file

Commented: Damith 2014-11-19

Hi,

I have a .csv file with the following columns, and I need to read all of them. What function do I need to use? I tried csvread, but it did not work.

KH  110427  PH  M  1951-01-01T07:00:00+07:00  0  mm  O
KH  110427  PH  M  1951-01-02T07:00:00+07:00  0  mm  O
KH  110427  PH  M  1951-01-03T07:00:00+07:00  0  mm  O
 .
 .
 .

Any ideas?

Thanks.


Answer 1

Star Strider 2014-10-29


https://ww2.mathworks.cn/matlabcentral/answers/160596-read-all-the-columns-in-a-csv-file#answer_157050

I would experiment with textscan or perhaps fscanf. Not having the file I can’t be more specific.
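
A minimal sketch of the textscan route, assuming the file is comma-delimited with eight fields per row (text, number, text, text, text, number, text, text), as the sample rows elsewhere in this thread suggest. The format string encodes that assumption and has not been checked against the real file. To keep the example self-contained, textscan is applied to a character vector; with a real file you would pass the identifier returned by fopen instead. The variable names are illustrative.

```matlab
% Three sample rows, written comma-delimited (an assumption about the file).
txt = sprintf('%s\n', ...
    'KH,110427,PH,M,1951-01-01T07:00:00+07:00,0,mm,O', ...
    'KH,110427,PH,M,1951-01-02T07:00:00+07:00,0,mm,O', ...
    'KH,110427,PH,M,1951-01-03T07:00:00+07:00,0,mm,O');

% One conversion specifier per column: %s for text, %f for numbers.
C = textscan(txt, '%s%f%s%s%s%f%s%s', 'Delimiter', ',');

stationId = C{2};    % numeric column vector
timestamp = C{5};    % cell array of timestamp strings
value     = C{6};    % numeric column vector
```

Each element of C holds one column, so numeric columns can be used directly and text columns are indexed with braces, e.g. timestamp{1}.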


Star Strider 2014-10-29


My pleasure.

Let’s concentrate on this file first. There’s a separate — and often-cited — thread on that topic in FAQ: How can I process a sequence of files?

What constitutes a ‘complete year’? For instance, the first rows that meet the col-6-non-negative criterion are:

      'KH'    [110.4410e+003]    'PH'    'M'    '1998-01-01T07:00...'    [0.0000e+000]    'mm'    'O'
      'KH'    [110.4410e+003]    'PH'    'M'    '1998-01-02T07:00...'    [0.0000e+000]    'mm'    'O'
      'KH'    [110.4410e+003]    'PH'    'M'    '1998-01-03T07:00...'    [0.0000e+000]    'mm'    'O'
      'KH'    [110.4410e+003]    'PH'    'M'    '1998-01-04T07:00...'    [0.0000e+000]    'mm'    'O'
      'KH'    [110.4410e+003]    'PH'    'M'    '1998-01-05T07:00...'    [0.0000e+000]    'mm'    'O'

So it probably shouldn’t be difficult to convert the dates and times to date vectors if necessary.

What constitutes a ‘complete year’? Do we just search for the first and last dates of a given year, or does it have to start and end at specific dates and times?

I’m converting the table to a cell array, since I have more experience with cell arrays and functions than tables. I’ll post my code later.

Star Strider 2014-10-30


I still have no idea what you want to do with your year data, so I opted to filter out the invalid years and write each valid year to a separate cell in the output array.

I ended up creating a relatively efficient (for me) routine that detects the ‘valid’ years (beginning on 01-Jan and ending on 31-Dec, regardless of the number of days between them), and writes those complete years’ data to the cell array ‘yrout’. (It works for this file, but I can’t determine how robust it is.) You can determine the data in ‘yrout’ you want to write to a separate array if you don’t want to keep all of it. I did my best to comment-document it, so understanding its function should be straightforward.

To process several files, you need to refer to the FAQ both Image Analyst and I have linked to. You might want to wrap my file in a function file that takes the file data (or file names if you want to use my readtable call) as input, and produces the edited data you want as output, and call it for each file you read.

My code:

tr = readtable('Damith_test.csv','ReadVariableNames',0);    % Load Data
% td = tr(1:5,:)                           % Diagnostic Write
isnneg = @(x) x>=0;                                         % Function
tc = table2cell(tr);
valrow = cellfun(isnneg,tc(:,6));                           % Col #6 >= 0
tcval = tc(valrow,:);                                       % Logical Vector
% tcvq = tcval(1:5,:)                     % Diagnostic Write
tcdn = datenum(tcval(:,5), 'yyyy-mm-ddTHH:MM:SS');          % Create Date Numbers
tcdv = datevec(tcdn);                                       % Create Date Vectors
% tcdq = tcdv(1:5,:);                     % Diagnostic Write
[uy,days,~] = unique(tcdv(:,1));                            % Years In File
dend = diff([days; length(tcdn)+1]);                        % Lengths Of Years In File (+1 so the last year keeps its final row)
yrbgn = tcdv(days,:);                                       % First Days Of Years
yrend = tcdv([days(2:end)-1; length(tcdn)],:);              % Last Days Of Years
yrvld1 = find((yrbgn(:,2) ==  1) & (yrbgn(:,3) ==  1));     % Valid Year Starts
yrvld2 = find((yrend(:,2) == 12) & (yrend(:,3) == 31));     % Valid Year Ends
yrvldix = yrvld1(ismember(yrvld1, yrvld2));                 % Valid Years
yrvldds = days(yrvldix);                                    % #Days In Valid Years
for k1 = 1:length(yrvldix)                                  % Create Output Year Data
    yrout{k1} = tcval(days(yrvldix(k1)):days(yrvldix(k1))+dend(yrvldix(k1))-1, :);
end

I kept in my commented-out % Diagnostic Write statements in case you want to see those data.
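
The date handling in the code above can be tried in isolation. This is a sketch, assuming every timestamp has the fixed layout yyyy-mm-ddTHH:MM:SS followed by a UTC offset; the offset is cut off before parsing so that datenum only sees the 19 characters its format describes. The variable names are illustrative and the three timestamps are sample values.

```matlab
stamps = {'1998-01-01T07:00:00+07:00'
          '1998-12-31T07:00:00+07:00'
          '1999-01-01T07:00:00+07:00'};

% Keep the first 19 characters (date and time), dropping the '+07:00' offset.
localPart = cellfun(@(s) s(1:19), stamps, 'UniformOutput', false);

dn = datenum(localPart, 'yyyy-mm-ddTHH:MM:SS');   % serial date numbers
dv = datevec(dn);                                 % [year month day hour minute second]

% Years present in the data and the row where each year first appears.
[yrs, firstRow] = unique(dv(:,1));
```

The firstRow vector plays the same role as 'days' in the code above: it marks where each year starts, which is what the start-of-year and end-of-year checks are built on.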

Damith 2014-11-4

Edited: Damith 2014-11-4

Thanks, Star Strider. That's exactly what I wanted.

I have the following code below to read all the csv files in myFolder.

clear all
tic
cd ('<path1>')
myFolder = '<path2>';
if ~isdir(myFolder)
  errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
  uiwait(warndlg(errorMessage));
  return;
end
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
  fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
  out{k} = textscan(fid(k),'%s%s%f','delimiter','\t');
  fclose(fid(k));
end

But the output "out" of one cell for a csv file looks like this:

'KH,100401,PH,M,1920-01-01T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-02T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-03T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-04T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-05T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-06T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-07T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-08T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-09T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-10T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-11T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-12T07:00:00+07:00,-9999.00,mm,O'

How can I make this output look like the one in the posting above (split into columns, with the commas removed) so that your code works? Please see the attached files.

Can somebody help me?

Star Strider 2014-11-4


My pleasure!

I’m glad it’s what you want.

This works when I run it:

D = {'KH,100401,PH,M,1920-01-01T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-02T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-03T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-04T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-05T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-06T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-07T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-08T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-09T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-10T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-11T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-12T07:00:00+07:00,-9999.00,mm,O'};
celsplt = @(x) strsplit(x, ',' ,'CollapseDelimiters',1);
Ds = cellfun(celsplt,D, 'Uni',0);
Ds{1}                               % Sample Output

produces:

ans = 
      'KH'    '100401'    'PH'    'M'    '1920-01-01T07:00...'    '-9999.00'    'mm'    'O'

The sample output for the first line looks like it provides just what you illustrated.
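
Note that strsplit returns every field as text, so column 6 comes back as '-9999.00' rather than a number, while the year-filtering code earlier in the thread compares column 6 numerically. A possible follow-up step, sketched under the assumption that columns 2 and 6 are the only numeric ones, converts them with str2double. The second sample row carries a made-up value of 12.50 purely for illustration.

```matlab
D = {'KH,100401,PH,M,1920-01-01T07:00:00+07:00,-9999.00,mm,O'
     'KH,100401,PH,M,1920-01-02T07:00:00+07:00,12.50,mm,O'};

% Split each line at the commas, then stack the rows into one 2-by-8 cell array.
parts = cellfun(@(x) strsplit(x, ','), D, 'UniformOutput', false);
Dt = vertcat(parts{:});

% Convert the two numeric columns from text to double, in place.
Dt(:,[2 6]) = num2cell(str2double(Dt(:,[2 6])));
```

After this step a test such as cellfun(@(x) x>=0, Dt(:,6)) operates on numbers, as the filtering code expects.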

Star Strider 2014-11-4

Edited: Star Strider 2014-11-4

My pleasure!

You shouldn’t have any trouble using the vertcat function in the loop using the code I provided. If you only want to save ‘Dt’ then don’t subscript ‘out’ (unless not subscripting it doesn’t work in your application) or ‘Ds’, just use it as in the code I provided and subscript ‘Dt{k}’ instead. At the end of your loop, save ‘Dt’ to a .mat-file. Then you won’t have to read the .csv files each time. You can also put my ‘celsplt’ function before the loop. Once it’s defined in your code, it will be available within the loop for the cellfun call.

This minor revision of your for loop should work:

celsplt = @(x) strsplit(x, ',' ,'CollapseDelimiters',1);
for k = 1:length(csvFiles)
  fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
  out{k} = textscan(fid(k),'%s%s%f','delimiter','\t');
  Ds = cellfun(celsplt,out{1,k}{1,1}, 'Uni',0);
  Dt{k} = vertcat(Ds{:});  
  fclose(fid(k));
end
% ————— The ‘save’ statement for ‘Dt’ goes here —————
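
For comparison, here is a sketch of a different way to fill 'Dt': it reads typed columns directly with a comma delimiter instead of reading whole lines and splitting them afterwards. It assumes every file has the eight-column layout shown in this thread. The temporary folder and the file names a.csv and b.csv are stand-ins created only so the example runs on its own; they are not the real data.

```matlab
% Build a temporary folder with two small comma-delimited files
% (stand-ins for the real data folder).
myFolder = tempname;
mkdir(myFolder);
rows = {'KH,100401,PH,M,1920-01-01T07:00:00+07:00,-9999.00,mm,O'
        'KH,100401,PH,M,1920-01-02T07:00:00+07:00,-9999.00,mm,O'};
names = {'a.csv', 'b.csv'};
for k = 1:numel(names)
    fid = fopen(fullfile(myFolder, names{k}), 'w');
    fprintf(fid, '%s\n', rows{:});
    fclose(fid);
end

csvFiles = dir(fullfile(myFolder, '*.csv'));
Dt = cell(1, numel(csvFiles));              % one cell per file
for k = 1:numel(csvFiles)
    fid = fopen(fullfile(myFolder, csvFiles(k).name), 'r');
    C = textscan(fid, '%s%f%s%s%s%f%s%s', 'Delimiter', ',');
    fclose(fid);
    % Reassemble the typed columns into one N-by-8 cell array per file.
    Dt{k} = [C{1}, num2cell(C{2}), C{3}, C{4}, C{5}, num2cell(C{6}), C{7}, C{8}];
end

save(fullfile(myFolder, 'Dt.mat'), 'Dt');   % reload later with load(...)
```

Because columns 2 and 6 arrive as numbers, each Dt{k} has the same mixed text-and-number layout that table2cell produces in the earlier code.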

Damith 2014-11-4

Edited: Damith 2014-11-4

Thanks it worked.

Now, can you please modify the code you wrote so that it runs inside a for loop? Instead of tr, it should read from Dt{1,k}. I need some help to complete this for loop.

% tr = readtable('test.csv','ReadVariableNames',0);    % Load Data
% td = tr(1:5,:)                           % Diagnostic Write
for k=1:length(k)
isnneg = @(x) x>=0;                                         % Function
tc{1,k} = table2cell(Dt{1,k});
valrow{1,k} = cellfun(isnneg,tc{1,k}(:,6));                 % Col #6 >= 0
tcval = tc(valrow,:);                                       % Logical Vector
% tcvq = tcval(1:5,:)                     % Diagnostic Write
tcdn = datenum(tcval(:,5), 'yyyy-mm-ddTHH:MM:SS');          % Create Date Numbers
tcdv = datevec(tcdn);                                       % Create Date Vectors
% tcdq = tcdv(1:5,:);                     % Diagnostic Write
[uy,days,~] = unique(tcdv(:,1));                            % Years In File
dend = diff([days; length(tcdn)]);                          % Lengths Of Years In File
yrbgn = tcdv(days,:);                                       % First Days Of Years
yrend = tcdv([days(2:end)-1; length(tcdn)],:);              % Last Days Of Years
yrvld1 = find((yrbgn(:,2) ==  1) & (yrbgn(:,3) ==  1));     % Valid Year Starts
yrvld2 = find((yrend(:,2) == 12) & (yrend(:,3) == 31));     % Valid Year Ends
yrvldix = yrvld1(ismember(yrvld1, yrvld2));                 % Valid Years
yrvldds = days(yrvldix);                                    % #Days In Valid Years
for k1 = 1:length(yrvldix)                                  % Create Output Year Data
    yrout{k1} = tcval(days(yrvldix(k1)):days(yrvldix(k1))+dend(yrvldix(k1))-1, :);
end

Thanks.

Star Strider 2014-11-4

You have to do the loop and integrate my code into it. I have no idea what you want to do or in what order you want to do it.

Damith 2014-11-19


test.csv

I want to modify the code below so that a different cell array, "newyr", holds the date as mm/dd/yyyy in its first column and the col 6 values in its second column, but only for years with a complete data set (i.e. the number of days in the year is 365 or 366), not for incomplete years.

Do you have any idea?

Thanks in advance again.

tr = readtable('test.csv','ReadVariableNames',0);    % Load Data
% td = tr(1:5,:)                           % Diagnostic Write
isnneg = @(x) x>=0;                                         % Function
tc = table2cell(tr);
valrow = cellfun(isnneg,tc(:,6));                           % Col #6 >= 0
tcval = tc(valrow,:);                                       % Logical Vector
% tcvq = tcval(1:5,:)                     % Diagnostic Write
tcdn = datenum(tcval(:,5), 'yyyy-mm-ddTHH:MM:SS');          % Create Date Numbers
tcdv = datevec(tcdn);                                       % Create Date Vectors
% tcdq = tcdv(1:5,:);                     % Diagnostic Write
[uy,days,~] = unique(tcdv(:,1));                            % Years In File
dend = diff([days; length(tcdn)+1]);                        % Lengths Of Years In File (+1 so the last year keeps its final row)
yrbgn = tcdv(days,:);                                       % First Days Of Years
yrend = tcdv([days(2:end)-1; length(tcdn)],:);              % Last Days Of Years
yrvld1 = find((yrbgn(:,2) ==  1) & (yrbgn(:,3) ==  1));     % Valid Year Starts
yrvld2 = find((yrend(:,2) == 12) & (yrend(:,3) == 31));     % Valid Year Ends
yrvldix = yrvld1(ismember(yrvld1, yrvld2));                 % Valid Years
yrvldds = days(yrvldix);                                    % #Days In Valid Years
for k1 = 1:length(yrvldix)                                  % Create Output Year Data
    yrout{k1} = tcval(days(yrvldix(k1)):days(yrvldix(k1))+dend(yrvldix(k1))-1, :);
end
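
One possible way to build such a "newyr" array is sketched below. It is not the routine above but a variant that counts rows per year, assuming exactly one valid row per day so that a complete year has 365 rows (366 in a leap year). The date numbers and values are synthetic stand-ins for columns 5 and 6 of the filtered data, and all variable names other than "newyr" are illustrative.

```matlab
% Synthetic daily data: all of 1998 (complete) and part of 1999 (incomplete).
dn  = [(datenum(1998,1,1):datenum(1998,12,31))'; ...
       (datenum(1999,1,1):datenum(1999,4,10))'];
val = zeros(size(dn));                       % stand-in for the col 6 values

dv = datevec(dn);
[yrs, ~, grp] = unique(dv(:,1));             % grp maps each row to its year
ndays = accumarray(grp, 1);                  % number of rows per year

% Gregorian leap-year rule, then the expected day count per year.
isleap   = (mod(yrs,4) == 0 & mod(yrs,100) ~= 0) | mod(yrs,400) == 0;
complete = find(ndays == 365 + isleap);

newyr = cell(numel(complete), 1);
for k = 1:numel(complete)
    rows = grp == complete(k);
    % Column 1: date as mm/dd/yyyy text; column 2: the value for that day.
    newyr{k} = [cellstr(datestr(dn(rows), 'mm/dd/yyyy')), num2cell(val(rows))];
end
```

With real data, dn would come from the datenum call in the code above and val from cell2mat(tcval(:,6)).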


Answer 2

Image Analyst 2014-10-29


https://ww2.mathworks.cn/matlabcentral/answers/160596-read-all-the-columns-in-a-csv-file#answer_157072


If you make the first row a line of tab separated names for the columns....

col1  col2  col3  col4  col5  col6  col7  col8
KH  110427  PH  M  1951-01-01T07:00:00+07:00  0  mm  O
KH  110427  PH  M  1951-01-02T07:00:00+07:00  0  mm  O
KH  110427  PH  M  1951-01-03T07:00:00+07:00  0  mm  O

Then you could use readtable() which gives a table rather than a cell array. I find tables are easier to use than cell arrays where you always have to figure out whether you want to use braces or parentheses.

t=readtable('test2.csv', 'delimiter', '\t')

When I ran it.....

>> test2
t = 
    col1       col2       col3    col4               col5                col6    col7    col8
    ____    __________    ____    ____    ___________________________    ____    ____    ____
    'KH'    1.1043e+05    'PH'    'M'     '1951-01-01T07:00:00+07:00'    0       'mm'    'O' 
    'KH'    1.1043e+05    'PH'    'M'     '1951-01-02T07:00:00+07:00'    0       'mm'    'O' 
    'KH'    1.1043e+05    'PH'    'M'     '1951-01-03T07:00:00+07:00'    0       'mm'    'O'


Damith 2014-10-30

Thanks again, Image Analyst. I received all the csv files from an agency, which created them. All I need is to read the files. I have code to read multiple csv files, but it does not read them in the structure shown in test.csv. The code you provided earlier works the way I want, but your test2.csv file is different from my .csv files. I don't know how you converted test.csv to test2.csv.

Image Analyst 2014-10-30

I just did it manually in the text editor. If you wanted to add a preprocessing step where you open all the files and add that header line, you could easily do that with the code in the FAQ: http://matlab.wikia.com/wiki/FAQ#How_can_I_process_a_sequence_of_files.3F
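
As an alternative to editing each file, the column names can be assigned after reading. The sketch below assumes a comma-delimited file with no header row; it writes a small temporary file so that it runs on its own, and the names col1 to col8 simply mirror the header used in this answer.

```matlab
% Write a small headerless, comma-delimited file to a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'KH,110427,PH,M,1951-01-01T07:00:00+07:00,0,mm,O', ...
    'KH,110427,PH,M,1951-01-02T07:00:00+07:00,0,mm,O', ...
    'KH,110427,PH,M,1951-01-03T07:00:00+07:00,0,mm,O');
fclose(fid);

% Treat the first row as data, then name the columns in the workspace.
t = readtable(fname, 'Delimiter', ',', 'ReadVariableNames', false);
t.Properties.VariableNames = {'col1','col2','col3','col4','col5','col6','col7','col8'};

delete(fname);   % remove the temporary file
```

The source files stay untouched this way, and columns are then available by name, e.g. t.col6.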
