Import CSV data into MATLAB

I am trying to import CSV data into MATLAB, storing each file in patient_1(i).Data. The data looks like the following:

To import it, I used this code:

filedir = '/Users/';
files1 = dir(fullfile(filedir, '*.csv'));
numFiles1 = length(files1);
for i = 1 : numFiles1
  patient_1(i).Name = files1(i).name;
  patient_1(i).Data = readtable(fullfile(filedir,files1(i).name),'ReadVariableNames',false);   % loads data
end

But I got the following warning and error:

Warning: Unable to determine the format of the DURATION data.
Try adding a format to the DURATION specifier. e.g. '%{hh:mm:ss}T'. 
> In table/readTextFile>textscanReadData (line 554)
In table/readTextFile (line 225)
In table.readFromFile (line 39)
In readtable (line 197)
In patient1 (line 23)
Error using readtable (line 197)
Unable to determine the format of the DURATION data.
Try adding a format to the DURATION specifier. e.g. '%{hh:mm:ss}T'.
Note: readtable detected the following parameters:
'Delimiter', ';', 'HeaderLines', 0, 'Format', '%q%q%T%q%q%T%T%f%f%q%q'
Error in patient1 (line 23)
  patient_1(i).Data = readtable(fullfile(filedir,files1(i).name),'ReadVariableNames',false);   % loads data

Can anyone help me to solve this problem?
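As the error message itself suggests, one possible workaround is to pass an explicit Format string in which each duration specifier carries its own format. The sketch below writes a small hypothetical semicolon-delimited file (the columns and values are invented for illustration) and assumes the duration text looks like hh:mm:ss. The real 11-column file would need a Format string covering all of its columns, a %{...}T format matching its actual time text, and handling of its two header lines.

```matlab
% Hypothetical two-row sample: weekday; time of day; elapsed duration; value
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Sun;13:45:00;00:30:00;42.5\n');
fprintf(fid, 'Mon;08:00:30;01:05:30;37.0\n');
fclose(fid);

% Give each duration column an explicit format, as the error message suggests
T = readtable(fname, 'Delimiter', ';', 'ReadVariableNames', false, ...
    'Format', '%q%{hh:mm:ss}T%{hh:mm:ss}T%f');
delete(fname);
disp(T)
```

With ReadVariableNames set to false, the columns are named Var1 to Var4, and Var2 and Var3 come back as duration arrays.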


Could you attach some sample data?
Also, which MATLAB version are you using? There have been recent changes in the handling of duration data.
Could you confirm that you need those first two lines read in and stored as part of the table? You tell it not to read the variable names, and you also do not use HeaderLines, so the implication is that you need those two lines stored.
How many files do you need? Is it all right if I attach one file with 3-4 rows?
I am using MATLAB R2018a.
Yes, I need those first two lines stored as part of the table, because I think I will need them for visualization.
Actually, I have tried this line:
patient_1(i).Data = readtable(fullfile(filedir,files1(i).name),'ReadVariableNames',true);
But the result is that the header only captures the first line, not the first two lines, and all of the content is read as char/string. I want each column to be imported with its proper data type.
patient_1(i).Data = readtable(fullfile(filedir,files1(i).name), 'ReadVariableNames', true, 'HeaderLines', 1);
A small number of lines for the test file would be fine. I extracted some of the data from the image you posted but I am not encountering the message you are getting.
This is a sample of the file:
I tried the code, and I got the following warning several times:
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved in the VariableDescriptions property.
As you can see in the screenshot above, columns 3, 6, 7, 8, 9, and 11 seem to have the appropriate type already, but the remaining columns, such as column 10, are still string/char. Do you have any idea how to solve this?
Also, how can I check whether each column has the appropriate type (e.g., column 1 = day, column 2 = date, etc.)?
Finally, as you can see in the screenshot, the first row disappears: the table starts from the second row, which is used as the header.
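To check the type of every column, one option is varfun with @class, which returns the class name of each table variable (summary(T) also lists the types). A minimal sketch on a small hypothetical table whose variable names and values are invented for illustration:

```matlab
% Hypothetical table mixing the kinds of columns discussed in this thread
T = table({'Sun'; 'Mon'}, datetime(2014, 5, [11; 12]), ...
          duration([13; 8], [45; 0], [0; 30]), [12.5; 40], ...
          'VariableNames', {'day', 'date', 'time', 'percent'});

% Class name of each column, in column order
types = varfun(@class, T, 'OutputFormat', 'cell');
disp([T.Properties.VariableNames; types])
```

A column that still shows 'cell' holds text and has not been converted to a numeric, datetime, or duration type.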


Accepted Answer

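opt = detectImportOptions('EE_moderate.csv');   % detect the layout from one representative file
opt = setvartype(opt, [2 5], 'datetime');        % read the two date columns as datetime
opt = setvaropts(opt, [2 5], 'InputFormat', 'dd-MMM yyyy', 'DatetimeFormat', 'eee uuuu-MM-dd HH:mm:ss' );
...
T = readtable(fullfile(filedir,files1(i).name), opt);
T.date = T.date + T.time;            % merge the first date with its time of day
T.date_1 = T.date_1 + T.time_1;      % merge the second date with its time of day
patient_1(i).Data = T(:,[2 5 7:end]);   % keep the combined datetimes and columns 7 onward
...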


Thank you for the solution, but I have at least 36 files, and each file has different headers. Is there a way to import several CSV files that have different headers?
For example, the first file has a header like this:
The second file:
Transition;Count   (this is the header)
Left-Prone;1   (this is the content)
(Sorry, I cannot attach more pictures because I have already uploaded the maximum number allowed here.)
So, as you can see, the files do not share a uniform header.
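filename = 'file1.csv';
opt = detectImportOptions(filename);
opt = setvartype(opt, [2 5], 'datetime');   % read the two date columns as datetime
opt = setvartype(opt, 10, 'double');        % force the percentage column to be numeric
opt = setvaropts(opt, [2 5], 'InputFormat', 'dd-MMM yyyy', 'DatetimeFormat', 'eee uuuu-MM-dd HH:mm:ss' );
opt = setvaropts(opt, 10, 'Suffixes', '%');  % strip the trailing % before converting
opt.VariableNames = {'day1', 'date1', 'time1', 'day2', 'date2', 'time2', 'duration', 'TEE', 'AEE', 'precent_of_AEE', 'mean_METs'};
T = readtable(filename, opt);
T{:,2} = T{:,2} + T{:,3};    % combine the first date with its time of day
T{:,5} = T{:,5} + T{:,6};    % combine the second date with its time of day
output = T(:,[2 5 7:end]);   % keep the combined datetimes and the remaining fields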
Sorry, my latest reply did not take into account your comments about having different fields.
Which fields are known to be consistent?
It is not possible to keep the variable names as-is in a table if they contain something that is not a valid MATLAB variable name; the closest you can get is to store the original text in the VariableDescriptions property.
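A minimal sketch of that idea, assuming the original header text is available as a cell array (the header strings and values below are hypothetical): matlab.lang.makeValidName turns each header into a valid identifier, and the untouched text is kept in VariableDescriptions.

```matlab
% Hypothetical original header text that is not a valid MATLAB identifier
rawNames = {'% of AEE', 'mean METs'};

% Remove spaces and replace other invalid characters to get valid names
validNames = matlab.lang.makeValidName(rawNames);

T = table([12.5; 40], [1.8; 2.1], 'VariableNames', validNames);
T.Properties.VariableDescriptions = rawNames;   % keep the original text
disp(T.Properties.VariableNames)
disp(T.Properties.VariableDescriptions)
```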
I see that some of your fields can contain %. Are there other special characters that might occur?
Can we count on the time-of-day fields being immediately after the date fields they belong to?
Do you need to retain the day of the week field? Will it always be immediately before the related date portion?
The code I wrote combines the date and time fields into a single datetime, formats it to include the weekday (which MATLAB can compute itself), and then drops the weekday and time-only fields, since that information is now in the single datetime field. Is that acceptable, or do you need the fields separately?
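The date-plus-time combination can be sketched with a single value. This assumes the date text uses the 'dd-MMM yyyy' input format from the accepted answer and that the time-of-day column has already been read as a duration; the sample values are hypothetical, and 'Locale' is set so that the English month name parses on any system.

```matlab
% Hypothetical date and time-of-day values for one row
d = datetime('11-May 2014', 'InputFormat', 'dd-MMM yyyy', 'Locale', 'en_US');
t = duration(13, 45, 0);                 % time of day stored as a duration

dt = d + t;                              % datetime + duration gives a datetime
dt.Format = 'eee uuuu-MM-dd HH:mm:ss';   % 'eee' displays the abbreviated weekday
disp(dt)
```

Because the weekday is derived from the date, the separate day-of-week column carries no extra information once the datetime exists.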
Hi, sorry for the late response, and thank you for the solution. But now I think I have a different problem. I have put the data into a struct, and inside the struct I have tables. Your code uses the detectImportOptions function to import the CSV file, but now I am only importing a struct, as in the other question I posted.
detectImportOptions is used to scan a text, .xls, or .xlsx file to figure out where the key information is in the file. If you already have the information in a struct, then detectImportOptions is not relevant, and you can use
T{:,2} = T{:,2} + T{:,3};
T{:,5} = T{:,5} + T{:,6};
T{:,2}.Format = 'eee uuuu-MM-dd HH:mm:ss' ;
T{:,5}.Format = 'eee uuuu-MM-dd HH:mm:ss' ;
output = T(:,[2 5 7:end]);
where T is your table.
Hi, my code looks like this:
n = length(patient_1_e.patient_1);
figure;
for i = 1:n
    if (contains(patient_1_e.patient_1(i).Name, 'SM_dayOverview_day') == 1)
        % put the field data inside the new variable
        process(i).Name = patient_1_e.patient_1(i).Name;
        process(i).Data = patient_1_e.patient_1(i).Data;
        if isempty(process(i).Name) ~= 1
            disp("nih")
            % process(i).Data{:,:} = datetime(process(i).Data{:,:}, 'InputFormat' ,'ddd-MMM yyyy', 'DatetimeFormat', 'eee uuuu-MM-dd HH:mm:ss');
            % plot (process(i).Data{:,1}, process(i).Data{:,11});
            process(i).Data{:,2} = process(i).Data{:,2} + process(i).Data{:,3};
            process(i).Data{:,6} = process(i).Data{:,6} + process(i).Data{:,7};
            process(i).Data{:,2}.Format = 'eee uuuu-MM-dd HH:mm:ss' ;
            process(i).Data{:,6}.Format = 'eee uuuu-MM-dd HH:mm:ss' ;
            output = process(i).Data(:,[2 6 7:end]);
        end
    end
end
with this struct of tables:
But I got this error:
Error using Main (line 44)
Addition is not defined between cell and duration arrays.
Somehow, when you created patient_1_e.patient_1(i), you did not convert columns 2 and 5 from cell arrays of character vectors into datetime, but you did convert columns 3 and 6 from cell arrays of character vectors into duration. It would be easier if you had already converted columns 2 and 5 into datetime when you read them in.
I tried this:
process(i).Data{:,2} = datetime(process(i).Data{:,2}, 'InputFormat', 'dd-MMM yyyy');
But I get the following error when I run it:
Error using Main (line 44)
The following error occurred converting from datetime to cell:
Conversion to cell from datetime is not possible.
Yes, I expected this. table() objects do not like to change data type.
How are you getting the information into patient_1_e.patient_1(i).Data ? It would be easier if you change how you get that data into the table.
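The error above comes from brace assignment, which writes into the existing cell variable and cannot change its type. A sketch of one way around it, assuming a small hypothetical table with named variables: assigning a whole variable with dot notation replaces it, so its type can change from cell to datetime. Dot indexing also accepts a position, as in T.(2), which may help when the variable names differ between files.

```matlab
% Hypothetical table where the date column is still a cell array of text
T = table({'Sun'; 'Mon'}, {'11-May 2014'; '12-May 2014'}, ...
          duration([13; 8], [45; 0], [0; 30]), ...
          'VariableNames', {'day', 'date', 'time'});

% T{:,2} = datetime(...) fails because brace assignment keeps the cell type.
% Dot assignment replaces the whole variable, so its type can change.
T.date = datetime(T.date, 'InputFormat', 'dd-MMM yyyy', 'Locale', 'en_US');
T.date = T.date + T.time;                  % datetime + duration now works
T.date.Format = 'eee uuuu-MM-dd HH:mm:ss';
disp(T)
```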
The way I get the information into patient_1_e.patient_1(i).Data is just:
process(i).Data = patient_1_e.patient_1(i).Data;
Then I put it into another struct:
process(i).Data{:,2}
and use {} to get the data inside it.
No, that shows you taking data that is already in patient_1_e.patient_1(i).Data and storing it in process(i).Data. We need to see how you stored the data in patient_1_e.patient_1(i).Data in the first place. You might have used readtable(), or you might have used xlsread().
I tried this code in the Command Window:
datetime(process(i).Data{:,2}, 'InputFormat', 'dd-MMM yyyy')
and it show the result :
ans =
datetime
11-May-2014
But I get the same error as before once I use it in an assignment like this:
process(i).Data{:,2} = datetime(process(i).Data{:,2}, 'InputFormat', 'dd-MMM yyyy')
I think the error occurs when I assign the result to process(i).Data{:,2}.
As for how I stored the data in patient_1_e.patient_1(i).Data, I used this code:
filedir = '/Users/a';
files1 = dir(fullfile(filedir, '*.csv'));
numFiles1 = length(files1);
for i = 1 : numFiles1
    patient_1(i).Name = files1(i).name;
    patient_1(i).Data = readtable(fullfile(filedir,files1(i).name), 'ReadVariableNames', true, 'HeaderLines', 1);
end
%%save data 1 as mat file
save('patient1.mat','patient_1'); % this will save complete data for 1 person
I wrote it this way because I need to import all the files in the folder, and they have different headers: some files follow a different header pattern. That is why I did not specify the column formats the way your detectImportOptions code does. Afterwards, I load the saved data with:
% load data patient 1 that has been saved
patient_1_e = load('patient1.mat');
Okay, so now we get back to the question of which fields we can count on being the same. Can we count on fields 1 through 7 being the same in every file?
Okay, so now we get back to the question of which fields we can count on being the same. >> Yes, I think so. I am so sorry.
Can we count on fields 1 through 7 being the same in every file? >> I checked one folder, which contains 36 files. 7 of them have the same header pattern, like this:
begin; ; ; end; ; ; ; ...
day; date; time; day; date; time; duration; ...
The other 28 files have this header pattern:
begin; ; ; ; end; ; ; ; ; ...
day; date; time; sample; day; date; time; sample; duration; ...
where the "sample" field is a number (e.g., 19509).
And 1 file contains data with dimensions 12 x 2.
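One way to cope with the two header layouts is to read the second header line and look up where the date and time fields are, instead of hard-coding their positions. The sketch below assumes the field names are separated by ';' and spelled exactly 'date' and 'time', as in the headers shown above; the temporary file is hypothetical. For the 28-file layout it finds date columns 2 and 6 and time columns 3 and 7.

```matlab
% Hypothetical file using the 28-file layout (field names on the second line)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'begin;;;;end;;;;\n');
fprintf(fid, 'day;date;time;sample;day;date;time;sample;duration\n');
fclose(fid);

% Read only the two header lines
fid = fopen(fname, 'r');
fgetl(fid);                       % skip the 'begin;...;end' line
nameLine = fgetl(fid);            % second line holds the field names
fclose(fid);
delete(fname);

% Split on ';', trim spaces, and find the date/time positions by name
names = strtrim(strsplit(nameLine, ';'));
dateCols = find(strcmp(names, 'date'));
timeCols = find(strcmp(names, 'time'));
fprintf('date columns: %s, time columns: %s\n', mat2str(dateCols), mat2str(timeCols));
```

The resulting index vectors could then take the place of the hard-coded [2 5] and [3 6] (or [2 6] and [3 7]) in the combining step, so one loop handles both layouts.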
With these files in different formats: what should be extracted from the files?
I have no idea how you might want to process the 12 x 2 file.
With these files in different formats: what should be extracted from the files? >> I think I will process the remaining fields after fields 1-7. For example, as I told you before, there are 7 files with this header:
begin; ; ; end; ; ; ; ...
day; date; time; day; date; time; duration; ...
Some of these files contain doubles, percentages, and floats; other files have the same header for fields 1-7, but their remaining fields are strings. In the end, I need to extract the day, date, time, and duration (as the x axis) and the remaining fields (as the y axis).
Hi, I tried the code
opt = setvartype(opt, 10, 'double');
but i got this error:
Error using matlab.io.ImportOptions/setvartype (line 293)
Index exceeds array bounds.
Error in patient1_coba (line 23)
opt = setvartype(opt, 10, 'double');
Not all of your files have a field #10.
In the original file, field #10 contained percentages complete with the % character. detectImportOptions sees the % and figures that the entire field must be a character vector rather than a number. Doing the setvartype() is to force it to treat as a number, with the % character taken care of by way of the Suffixes option.
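A minimal sketch of the Suffixes approach, using a hypothetical two-column file in which the second field holds percentages. The bounds check on pctCol (a hypothetical name) skips the conversion in files that have fewer fields, which avoids the "Index exceeds array bounds" error shown above.

```matlab
% Hypothetical file whose second field contains values such as 12.5%
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id;share\n');
fprintf(fid, '1;12.5%%\n');          % '%%' writes a literal '%'
fprintf(fid, '2;40%%\n');
fclose(fid);

opt = detectImportOptions(fname, 'Delimiter', ';');
pctCol = 2;                          % hypothetical position of the % column
if pctCol <= numel(opt.VariableNames)            % skip files with fewer fields
    opt = setvartype(opt, pctCol, 'double');     % read it as a number
    opt = setvaropts(opt, pctCol, 'Suffixes', '%');  % drop the trailing %
end
T = readtable(fname, opt);
delete(fname);
disp(T)
```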
I think I asked you earlier what other special characters might occur in fields that you want treated as numeric.
I see, thank you for the idea. Yes, I think that is the problem. Some of the files have the same fields, but others do not. Looking at the data, there are 6 files that have a similar header, and among these 6 files, 1 file has only 8 fields:
day; date; time; day1; date1; time1; duration; minutes;
where minutes is a double.
Is it possible to make "opt" an array so that I can keep the import options for every file? When I run the code that processes the 6 files, "output" only contains the last file.
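Rather than an array of options, one approach is a cell array that holds one ImportOptions object per file, plus a struct array that keeps each file's table, so nothing is overwritten inside the loop. A sketch assuming two hypothetical semicolon-delimited files created in a temporary folder; the names opts and results are illustrative only.

```matlab
% Two hypothetical files with different headers in a temporary folder
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'a.csv'), 'w');
fprintf(fid, 'x;y\n1;2\n3;4\n');
fclose(fid);
fid = fopen(fullfile(folder, 'b.csv'), 'w');
fprintf(fid, 'x;y;z\n5;6;7\n8;9;10\n');
fclose(fid);

files = dir(fullfile(folder, '*.csv'));
opts = cell(numel(files), 1);               % one ImportOptions object per file
results = struct('Name', {}, 'Data', {});   % one entry per file, nothing overwritten
for k = 1:numel(files)
    fname = fullfile(folder, files(k).name);
    opts{k} = detectImportOptions(fname, 'Delimiter', ';');
    % per-file setvartype/setvaropts calls would go here
    results(k).Name = files(k).name;
    results(k).Data = readtable(fname, opts{k});
end
rmdir(folder, 's');
```

After the loop, results(k).Data holds the table for the k-th file, in the same way patient_1(i).Data does in the original loop.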



Category: Data Type Conversion

Version: R2018a
