How can I get MATLAB to preserve column names as variables when importing a csv file to timetable?

169 次查看(过去 30 天)
Hello,
I can't seem to get MATLAB to import a csv file and preserve the column names as variables. The variable names come back as Var1, Var2, etc. Here's one of the approaches I've tried:
opts = detectImportOptions('test.csv',"VariableNamingRule","preserve");
test= readtimetable('test.csv', opts);
I've also left off the options qualifiers, but to no avail. The variable names still come back as Var1, Var2, etc.
Interestingly, the variable names are preserved if I use the Import wizard, but this is not a long-term solution. I have the same problem if readtable is used instead of readtimetable.
Any assistance greatly appreciated! Thanks for reading-
Nina
  12 个评论
Nina Sweeney
Nina Sweeney 2023-10-25
Walter, Star Strider:
Yes, I have no trouble preserving the variable names for the snippet; the problem is that the variable names change to Var1,...,Var14 for the full csv file.
Nina
Nina Sweeney
Nina Sweeney 2023-10-25
I haven't tried datastore yet, but I plan to do so.
Here's the solution, at least for now:
  1. Import the large csv with the Import Tool.
  2. Once in the Import Tool: Import Selection > Generate Script (or Generate Function).
  3. Save and run the script. The script produces the needed options to use the column headers of the csv file as variable names.
  4. The generated code is shown below:
IMPORTFILE Import data from a text file
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILENAME) reads data from text file
% FILENAME for the default selection. Returns the data as a table.
%
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILE, DATALINES) reads data for the
% specified row interval(s) of text file FILENAME. Specify DATALINES as
% a positive scalar integer or a N-by-2 array of positive scalar
% integers for dis-contiguous row intervals.
%
% Example:
% node202flightdataout = importfile("C:\Nina\Aircraft Position\All Logs Update 1\node202_flight_data_out.csv", [2, Inf]);
%
% See also READTABLE.
%
% Auto-generated by MATLAB on 25-Oct-2023 08:07:30
%% Input handling
% If dataLines is not specified, define defaults
if nargin < 2
dataLines = [2, Inf];
end
%% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 15);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.VariableNames = ["LogSet", "UTCTimeS", "Time", "rxEbno1For257", "rxEbno2For257", "RxLinkRatebps", "TxPowerdBm", "TxLinkRatebps", "Lat", "lon", "Altm", "Headingdeg", "Rolldeg", "Pitchdeg", "VarName15"];
opts.VariableTypes = ["double", "double", "datetime", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "string"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, "VarName15", "WhitespaceRule", "preserve");
opts = setvaropts(opts, "VarName15", "EmptyFieldRule", "auto");
opts = setvaropts(opts, "Time", "InputFormat", "yyyy-MM-dd'T'HH:mm:ss");
% Import the data
node202flightdataout = readtable(filename, opts);
end

请先登录,再进行评论。

采纳的回答

Nina Sweeney
Nina Sweeney 2023-10-26
I haven't tried datastore yet, but I plan to do so.
Here's the solution, at least for now:
  1. Import the large csv with the Import Tool.
  2. Once in the Import Tool: Import Selection > Generate Script (or Generate Function).
  3. Save and run the script. The script produces the needed options to use the column headers of the csv file as variable names.
  4. The generated code is shown below:
IMPORTFILE Import data from a text file
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILENAME) reads data from text file
% FILENAME for the default selection. Returns the data as a table.
%
% NODE202FLIGHTDATAOUT = IMPORTFILE(FILE, DATALINES) reads data for the
% specified row interval(s) of text file FILENAME. Specify DATALINES as
% a positive scalar integer or a N-by-2 array of positive scalar
% integers for dis-contiguous row intervals.
%
% Example:
% node202flightdataout = importfile("C:\Nina\Aircraft Position\All Logs Update 1\node202_flight_data_out.csv", [2, Inf]);
%
% See also READTABLE.
%
% Auto-generated by MATLAB on 25-Oct-2023 08:07:30
%% Input handling
% If dataLines is not specified, define defaults
if nargin < 2
dataLines = [2, Inf];
end
%% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 15);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.VariableNames = ["LogSet", "UTCTimeS", "Time", "rxEbno1For257", "rxEbno2For257", "RxLinkRatebps", "TxPowerdBm", "TxLinkRatebps", "Lat", "lon", "Altm", "Headingdeg", "Rolldeg", "Pitchdeg", "VarName15"];
opts.VariableTypes = ["double", "double", "datetime", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "double", "string"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, "VarName15", "WhitespaceRule", "preserve");
opts = setvaropts(opts, "VarName15", "EmptyFieldRule", "auto");
opts = setvaropts(opts, "Time", "InputFormat", "yyyy-MM-dd'T'HH:mm:ss");
% Import the data
node202flightdataout = readtable(filename, opts);
end

更多回答(2 个)

Star Strider
Star Strider 2023-10-24
编辑:Star Strider 2023-10-25
I am not certain that 'VariableNamingRule' works as part of the options structure.
Try this instead:
test= readtimetable('test.csv', ,"VariableNamingRule","preserve");
EDIT — (25 Oct 2023 at 01:28)
It seems to work correctly here —
opts = detectImportOptions('test.csv',"VariableNamingRule","preserve");
test = readtimetable('test.csv', opts)
test = 3×13 timetable
Time LogSet UTCTimeS rxEbno1For257 rxEbno2For257 RxLinkRate(bps) TxPower(dBm) TxLinkRate(bps) Lat lon Alt(m) Heading(deg) Roll(deg) Pitch(deg) ____________________ ______ __________ _____________ _____________ _______________ ____________ _______________ ______ _______ ______ ____________ _________ __________ 11-Aug-2023 21:02:29 0 1.6918e+09 1.3 9.4 76250 2.2 20250 38.292 -76.419 29 125.15 -0.86387 1.8717 11-Aug-2023 21:02:30 0 1.6918e+09 -1.7 9 76250 3.5 20250 38.292 -76.419 29 124.9 -0.87145 1.9309 11-Aug-2023 21:02:31 0 1.6918e+09 -6.8 8.6 76250 2.9 20250 38.292 -76.419 29 124.24 -0.89849 1.9858
EDIT — (25 Oct 2023 at 11:45)
@Nina Sweeney — See if the same problem occurs with datastore. It also has a VariableNamingRule option. It is designed for large arrays, so it may work with your file.
.

Walter Roberson
Walter Roberson 2023-10-24
Works for me, provided that the number of header variables does not exceed the number of detected columns.
When I accidentally added extra numeric columns in the data, I did get trailing Var* variables being generated.
All of this requires that detectImportOptions was able to figure out which row had your variable names. If that goes wrong, then Sure, it might end up generating Var* variables.
filename = 'test.csv';
opts = detectImportOptions(filename,"VariableNamingRule","preserve");
%for this PARTICULAR test file, the datetime format is not recognized
opts = setvartype(opts,1,'datetime');
opts = setvaropts(opts,1,'InputFormat', 'MMM d uuuu HH:mm');
test = readtable('test.csv',opts)
test = 2×7 table
first second third one fourth %increase month to date last ____________________ ______ _________ ______ _________ _____________ ____ 03-Jan-2001 18:03:00 1 2 3 4 5 6 17-Aug-2022 05:19:00 9 8 7 6 5 4

类别

Help CenterFile Exchange 中查找有关 Data Preprocessing 的更多信息

产品


版本

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by