Read all the columns in a .csv file
16 次查看(过去 30 天)
显示 更早的评论
Damith
2014-10-29
Hi,
I have a .csv file with the following columns, I need to read all the columns. What function I need to use. I tried csvread but it did not work.
KH 110427 PH M 1951-01-01T07:00:00+07:00 0 mm O
KH 110427 PH M 1951-01-02T07:00:00+07:00 0 mm O
KH 110427 PH M 1951-01-03T07:00:00+07:00 0 mm O
.
.
.
Any ideas?
Thanks.
采纳的回答
Star Strider
2014-10-29
I would experiment with textscan or perhaps fscanf. Not having the file I can’t be more specific.
18 个评论
Star Strider
2014-10-29
[d,t,r] = xlsread('test.csv');
I would use that, since I was able to read your file using that line of code. It returns three results, here ‘d’ are the data, ‘t’ are the text headers, and ‘r’ are all the raw data as read in by the function (before parsing).
I didn’t suggest xlsread earlier because I didn’t know it was an Excel file.
Damith
2014-10-29
Thanks.
But, how can i read 496 of .csv files using xlsread function.? I also need to do a filtering of col 6 values. If col 6 contains non-negative values for a corresponding complete year, I need to catenate the complete years with the col 6 values.
Star Strider
2014-10-29
My pleasure.
Let’s concentrate on this file first. There’s a separate — and often-cited — thread on that topic in FAQ: How can I process a sequence of files?
What constitutes a ‘complete year’? For instance, the first rows that meet the col-6-non-negative criterion are:
'KH' [110.4410e+003] 'PH' 'M' '1998-01-01T07:00...' [0.0000e+000] 'mm' 'O'
'KH' [110.4410e+003] 'PH' 'M' '1998-01-02T07:00...' [0.0000e+000] 'mm' 'O'
'KH' [110.4410e+003] 'PH' 'M' '1998-01-03T07:00...' [0.0000e+000] 'mm' 'O'
'KH' [110.4410e+003] 'PH' 'M' '1998-01-04T07:00...' [0.0000e+000] 'mm' 'O'
'KH' [110.4410e+003] 'PH' 'M' '1998-01-05T07:00...' [0.0000e+000] 'mm' 'O'
So it probably shouldn’t be difficult to convert the dates and times to date vectors if necessary.
What constitutes a ‘complete year’? Do we just search for the first and last dates of a given year, or does it have to start and end at specific dates and times?
I’m converting the table to a cell array, since I have more experience with cell arrays and functions than tables. I’ll post my code later.
Damith
2014-10-30
Reading all the columns in each csv file and read all the csv files inside folder and store in cell array is the first challenge.
Then, need to check whether data in col 6 is complete for a given year. For example, if data is non-negative in col 6 for year 1998, first column on "data" matrix should show dates from 1/1/1998 to 12/31/1998 and col 6 data should be pasted in col 2 of "data" matrix and so on.
Hope I explained clearly.
Star Strider
2014-10-30
So column #5 — the dates — becomes column #1 in the output and column #6 — some sort of flag it seems — becomes column #2 of the output, but both only for full years, and no other data are put into the output array.
We’ll do this file, then you can loop through your files in a for loop and do the same with them.
Star Strider
2014-10-30
I still have no idea what you want to do with your year data, so I opted to filter out the invalid years and write each valid year to a separate cell in the output array.
I ended up creating a relatively efficient (for me) routine that detects the ‘valid’ years (beginning on 01-Jan and ending on 31-Dec, regardless of the number of days between them), and writes those complete years’ data to the cell array ‘yrout’. (It works for this file, but I can’t determine how robust it is.) You can determine the data in ‘yrout’ you want to write to a separate array if you don’t want to keep all of it. I did my best to comment-document it, so understanding its function should be straightforward.
To process several files, you need to refer to the FAQ both Image Analyst and I have linked to. You might want to wrap my file in a function file that takes the file data (or file names if you want to use my readtable call) as input, and produces the edited data you want as output, and call it for each file you read.
My code:
tr = readtable('Damith_test.csv','ReadVariableNames',0); % Load Data
% td = tr(1:5,:) % Diagnostic Write
isnneg = @(x) x>=0; % Function
tc = table2cell(tr);
valrow = cellfun(isnneg,tc(:,6)); % Col #6 >= 0
tcval = tc(valrow,:); % Logical Vector
% tcvq = tcval(1:5,:) % Diagnostic Write
tcdn = datenum(tcval(:,5), 'yyyy-mm-ddTHH:MM:SS'); % Create Date Numbers
tcdv = datevec(tcdn); % Create Date Vectors
% tcdq = tcdv(1:5,:); % Diagnostic Write
[uy,days,~] = unique(tcdv(:,1)); % Years In File
dend = diff([days; length(tcdn)]); % Lengths Of Years In File
yrbgn = tcdv(days,:); % First Days Of Years
yrend = tcdv([days(2:end)-1; length(tcdn)],:); % Last Days Of Years
yrvld1 = find((yrbgn(:,2) == 1) & (yrbgn(:,3) == 1)); % Valid Year Starts
yrvld2 = find((yrend(:,2) == 12) & (yrend(:,3) == 31)); % Valid Year Ends
yrvldix = yrvld1(ismember(yrvld1, yrvld2)); % Valid Years
yrvldds = days(yrvldix); % #Days In Valid Years
for k1 = 1:length(yrvldix) % Create Output Year Data
yrout{k1} = tcval(days(yrvldix(k1)):days(yrvldix(k1))+dend(yrvldix(k1))-1, :);
end
I kept in my commented-out % Diagnostic Write statements in case you want to see those data.
Damith
2014-11-4
编辑:Damith
2014-11-4
Thanks Star Rider. That's exactly what I wanted.
I have the following code below to read all the csv files in myFolder.
clear all
tic
cd ('<path1>')
myFolder = '<path2>';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
out{k} = textscan(fid(k),'%s%s%f','delimiter','\t');
fclose(fid(k));
end
But the output "out" of one cell for a csv file looks this this:
'KH,100401,PH,M,1920-01-01T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-02T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-03T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-04T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-05T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-06T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-07T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-08T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-09T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-10T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-11T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-12T07:00:00+07:00,-9999.00,mm,O'
How can I make this output to look similar to above posting (separate to columns removing the commas) so that your code works. Please see the atatched files.
Can somebody help me?
Star Strider
2014-11-4
My pleasure!
I’m glad it’s what you want.
This works when I run it:
D = {'KH,100401,PH,M,1920-01-01T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-02T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-03T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-04T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-05T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-06T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-07T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-08T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-09T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-10T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-11T07:00:00+07:00,-9999.00,mm,O'
'KH,100401,PH,M,1920-01-12T07:00:00+07:00,-9999.00,mm,O'};
celsplt = @(x) strsplit(x, ',' ,'CollapseDelimiters',1);
Ds = cellfun(celsplt,D, 'Uni',0);
Ds{1} % Sample Output
produces:
ans =
'KH' '100401' 'PH' 'M' '1920-01-01T07:00...' '-9999.00' 'mm' 'O'
The sample output for the first line looks like it provides just what you illustrated.
Damith
2014-11-4
Thanks for the quick reply again.
What I want is columnwise separation? not rowwise separation. In other words, all KH in first column, 100401 in second column and so on.
I appreciate if you can suggest me a way inside the for loop.
Thanks again.
Star Strider
2014-11-4
I doubt a loop is necessary. The vertcat function will work:
celsplt = @(x) strsplit(x, ',' ,'CollapseDelimiters',1);
Ds = cellfun(celsplt,D, 'Uni',0);
Dt = vertcat(Ds{:}); % Vertically Concatanate
Dt1 = Dt(:,1) % Display Column #1
produces:
Dt1 =
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
'KH'
Damith
2014-11-4
Thanks. It produced desired result.
But I need to do the same thing for multiple .csv files. So, how I include this in the for loop. But I dont know how to include the vertcat function in the for loop.
Do you have any idea?
cd ('<path1>')
myFolder = '<path2>';
if ~isdir(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
uiwait(warndlg(errorMessage));
return;
end
filePattern = fullfile(myFolder, '*.csv');
csvFiles = dir(filePattern);
for k = 1:length(csvFiles)
fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
out{k} = textscan(fid(k),'%s%s%f','delimiter','\t');
celsplt = @(x) strsplit(x, ',' ,'CollapseDelimiters',1);
Ds{1,k} = cellfun(celsplt,out{1,k}{1,1}, 'Uni',0);
% Dt{1,k} = vertcat(Ds{:});
fclose(fid(k));
end
Star Strider
2014-11-4
编辑:Star Strider
2014-11-4
My pleasure!
You shouldn’t have any trouble using the vertcat function in the loop using the code I provided. If you only want to save ‘Dt’ then don’t subscript ‘out’ (unless not subscripting it doesn’t work in your application) or ‘Ds’, just use it as in the code I provided and subscript ‘Dt{k}’ instead. At the end of your loop, save ‘Dt’ to a .mat-file. Then you won’t have to read the .csv files each time. You can also put my ‘celsplt’ function before the loop. Once it’s defined in your code, it will be available within the loop for the cellfun call.
This minor revision of your for loop should work:
celsplt = @(x) strsplit(x, ',' ,'CollapseDelimiters',1);
for k = 1:length(csvFiles)
fid(k) = fopen(fullfile(myFolder,csvFiles(k).name));
out{k} = textscan(fid(k),'%s%s%f','delimiter','\t');
Ds = cellfun(celsplt,out{1,k}{1,1}, 'Uni',0);
Dt{k} = vertcat(Ds{:});
fclose(fid(k));
end
% ————— The ‘save’ statement for ‘Dt’ goes here —————
Damith
2014-11-4
编辑:Damith
2014-11-4
Thanks it worked.
Now, can you please modify to include the code you wrote in a for loop. Instead Tr, it should read from Dt{1,k}. I need some help to complete this for loop.
% tr = readtable('test.csv','ReadVariableNames',0); % Load Data
% td = tr(1:5,:) % Diagnostic Write
for k=1:length(k)
isnneg = @(x) x>=0; % Function
tc{1,k} = table2cell(Dt{1,k});
valrow{1,k} = cellfun(isnneg,tc{1,k}(:,6)); % Col #6 >= 0
tcval = tc(valrow,:); % Logical Vector
% tcvq = tcval(1:5,:) % Diagnostic Write
tcdn = datenum(tcval(:,5), 'yyyy-mm-ddTHH:MM:SS'); % Create Date Numbers
tcdv = datevec(tcdn); % Create Date Vectors
% tcdq = tcdv(1:5,:); % Diagnostic Write
[uy,days,~] = unique(tcdv(:,1)); % Years In File
dend = diff([days; length(tcdn)]); % Lengths Of Years In File
yrbgn = tcdv(days,:); % First Days Of Years
yrend = tcdv([days(2:end)-1; length(tcdn)],:); % Last Days Of Years
yrvld1 = find((yrbgn(:,2) == 1) & (yrbgn(:,3) == 1)); % Valid Year Starts
yrvld2 = find((yrend(:,2) == 12) & (yrend(:,3) == 31)); % Valid Year Ends
yrvldix = yrvld1(ismember(yrvld1, yrvld2)); % Valid Years
yrvldds = days(yrvldix); % #Days In Valid Years
for k1 = 1:length(yrvldix) % Create Output Year Data
yrout{k1} = tcval(days(yrvldix(k1)):days(yrvldix(k1))+dend(yrvldix(k1))-1, :);
end
Thanks.
Star Strider
2014-11-4
I’m lost. I don’t understand what you want to do so I can’t integrate my code with yours. You’ll have to do the initial integration, then experiment with it, and explain to me what you’re doing. I’ll help you get it running if I can.
You’re reading in a series of files but storing them as cells rather than as tables, so the ‘table2cell’ call is not necessary and will likely throw an error.
I believe this assignment simply needs to be stated this way to fit with my code:
tc{k} = Dt{k};
Beyond that I have no suggestions at present.
Damith
2014-11-4
编辑:Damith
2014-11-4
OK. I will explain.
I need to check whether data in col 6 of Dt{1,1}, Dt{1,2} upto all the cells (all the files) is complete for a given year. For example, if data is non-negative in col 6 for year 1998, first column on "data" matrix should show dates from 1/1/1998 to 12/31/1998 and col 6 data should be pasted in col 2 of "data" matrix and so on.
You have done it in your code for a single .csv file. But now it is a matter of reading from Dt{1,k} cells of Dt.
Hope I explained clearly.
Star Strider
2014-11-4
You have to do the loop and integrate my code into it. I have no idea what you want to do or in what order you want to do it.
Damith
2014-11-19
I wanted to modify the code below to show the mm/dd/yyyy in first column if col 6 of each cell has complete data set (i.e. if number of days per year = 365 or 366) but not incomplete years and col 6 values of each cell in second column in a diferent cell array "newyr".
Do you have any idea?
Thanks in advance again.
tr = readtable('test.csv','ReadVariableNames',0); % Load Data
% td = tr(1:5,:) % Diagnostic Write
isnneg = @(x) x>=0; % Function
tc = table2cell(tr);
valrow = cellfun(isnneg,tc(:,6)); % Col #6 >= 0
tcval = tc(valrow,:); % Logical Vector
% tcvq = tcval(1:5,:) % Diagnostic Write
tcdn = datenum(tcval(:,5), 'yyyy-mm-ddTHH:MM:SS'); % Create Date Numbers
tcdv = datevec(tcdn); % Create Date Vectors
% tcdq = tcdv(1:5,:); % Diagnostic Write
[uy,days,~] = unique(tcdv(:,1)); % Years In File
dend = diff([days; length(tcdn)]); % Lengths Of Years In File
yrbgn = tcdv(days,:); % First Days Of Years
yrend = tcdv([days(2:end)-1; length(tcdn)],:); % Last Days Of Years
yrvld1 = find((yrbgn(:,2) == 1) & (yrbgn(:,3) == 1)); % Valid Year Starts
yrvld2 = find((yrend(:,2) == 12) & (yrend(:,3) == 31)); % Valid Year Ends
yrvldix = yrvld1(ismember(yrvld1, yrvld2)); % Valid Years
yrvldds = days(yrvldix); % #Days In Valid Years
for k1 = 1:length(yrvldix) % Create Output Year Data
yrout{k1} = tcval(days(yrvldix(k1)):days(yrvldix(k1))+dend(yrvldix(k1))-1, :);
end
更多回答(1 个)
Image Analyst
2014-10-29
If you make the first row a line of tab separated names for the columns....
col1 col2 col3 col4 col5 col6 col7 col8
KH 110427 PH M 1951-01-01T07:00:00+07:00 0 mm O
KH 110427 PH M 1951-01-02T07:00:00+07:00 0 mm O
KH 110427 PH M 1951-01-03T07:00:00+07:00 0 mm O
Then you could use readtable() which gives a table rather than a cell array. I find tables are easier to use than cell arrays where you always have to figure out whether you want to use braces or parentheses.
t=readtable('test2.csv', 'delimiter', '\t')
When I ran it.....
>> test2
t =
col1 col2 col3 col4 col5 col6 col7 col8
____ __________ ____ ____ ___________________________ ____ ____ ____
'KH' 1.1043e+05 'PH' 'M' '1951-01-01T07:00:00+07:00' 0 'mm' 'O'
'KH' 1.1043e+05 'PH' 'M' '1951-01-02T07:00:00+07:00' 0 'mm' 'O'
'KH' 1.1043e+05 'PH' 'M' '1951-01-03T07:00:00+07:00' 0 'mm' 'O'
8 个评论
Damith
2014-10-29
Thanks Image Analyst.
When I ran it, it did not separate out for columns in t.
How can modify this to read multiple .csv files using the readtable function?
Image Analyst
2014-10-29
If you add the column names row, it does work. Try the line of code
t=readtable('test2.csv', 'delimiter', '\t')
on the attached csv file. It WILL work (assuming you have R2013b or later which is required for tables).
Damith
2014-10-29
Thanks.
But the input csv files looks like the output. You might have arranged the input file to look like test2.csv. But all .csv files does look like output (t).
That means I have to arrange the input test2.csv file like yours. How did you perform that task?
Image Analyst
2014-10-29
You said "I have a .csv file with the following columns" so I copied what you put down. Then I added the header line. What does "But all .csv files does look like output (t)." mean? If they all do look like that, then that's good, isn't it? If you want to actually attach your file with the paperclip icon, go ahead.
Damith
2014-10-30
All my csv files look like the attached test.csv file. But your test2.csv file was different. I want to read all the columns as outputted "t" variable in your above code. All my csv files are structurally same.
But, I have 496 csv files and I need to read all csv files and store them in different columns in a cell array or table.
Please see the attached test.csv file.
Thanks for your help so far.
Image Analyst
2014-10-30
My code was dependent on you being able to add a line with column headers. I'm assuming that you created the CSV files in MATLAB or some other program that you have control over. Can you do that? If not try using textscan like Star suggested. It's a little more complicated because you'll have to parse the string and figure out the format specifier string, but it can be done.
Damith
2014-10-30
Thanks again Image Analyst. All csv files I received from an agency and they have created the csv files. All I need to read the files, which I have a code to read multiple csv files but it does not read as in the structure shown in test.csv. But the code you provided earler works fine the way I want but your test2.csv file is different from my .csv files. I dont know how you converted from test.csv to test2.csv.
Image Analyst
2014-10-30
I just did it manually in the text editor. If you wanted to ad a preprocessing step where you open all the files and add that headerline, you could easily do that with the code in the FAQ: http://matlab.wikia.com/wiki/FAQ#How_can_I_process_a_sequence_of_files.3F
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Spreadsheets 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!发生错误
由于页面发生更改,无法完成操作。请重新加载页面以查看其更新后的状态。
您也可以从以下列表中选择网站:
如何获得最佳网站性能
选择中国网站(中文或英文)以获得最佳网站性能。其他 MathWorks 国家/地区网站并未针对您所在位置的访问进行优化。
美洲
- América Latina (Español)
- Canada (English)
- United States (English)
欧洲
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
亚太
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)