Problem to read csv file with a blank line

35 次查看(过去 30 天)
Hi folks,
I'm struggling to use date and time data of a web file, which can be saved in csv format.
I want to import the csv file starting at the six line (I dont want the information before the six line).
Every time I try to import with readtable, Matlab import this file with data and blank lines. I don't want blank lines in the midle of the data.
I just want to be able to read the date , time and the "MW" values.
Thank you very much for your help
  3 个评论
Fabio Retorta
Fabio Retorta 2020-9-14
websave('PV_50hz.csv', 'https://ds.50hertz.com/api/PhotovoltaicActual/DownloadFile?fileName=2020.csv');
[PV_data] = readtable('PV_50hz.csv');
Adam Danz
Adam Danz 2020-9-14
编辑:Adam Danz 2020-9-14
Your file isn't in standard format so you'll need to use delimitedTextImportOptions to specify lots of details such as
  • delimiter type (;)
  • number of variables
  • variable names
  • Variable types
  • datetime formats
I've been using delimitedTextImportOptions for a while and it's still something I have to play around with when importing non-standard files. It's not easy. The Import tool can help you out but even then you have to specify many of the details above.
Check out my answer for a solution but it requires you to remove the first character of the file which is a semicolon that causes problems.

请先登录,再进行评论。

采纳的回答

Adam Danz
Adam Danz 2020-9-14
编辑:Adam Danz 2020-9-14
The first character in your file is a semicolon (;) which is also the delimiter and that's making it difficult to import your data.
Option 1: export the data properly
If you can re-export the data properly and avoid the leading semicolon, do that and this method will work. It's better than the second method.
Check the datetime formats I've specified to make sure they match your data's formats.
file = '2020.csv'; % Use the full path whenever possible!
opts = delimitedTextImportOptions("NumVariables", 4);
% Specify range and delimiter
opts.DataLines = [6, Inf];
opts.Delimiter = ";";
% Specify column names and types
opts.VariableNames = ["Datum", "von", "bis", "MW"];
opts.VariableTypes = ["datetime", "datetime", "datetime", "double"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, "Datum", "InputFormat", "dd.MM.yyyy");
opts = setvaropts(opts, "von", "InputFormat", "HH:mm");
opts = setvaropts(opts, "bis", "InputFormat", "HH:mm");
% Import the data
T = readtable(file, opts);
Look at the first few rows of the data
head(T)
% ans =
% 8×4 table
% Datum von bis MW
% __________ _____ _____ __
% 01.01.2020 00:00 00:15 0
% 01.01.2020 00:15 00:30 0
% 01.01.2020 00:30 00:45 0
% 01.01.2020 00:45 01:00 0
% 01.01.2020 01:00 01:15 0
% 01.01.2020 01:15 01:30 0
% 01.01.2020 01:30 01:45 0
% 01.01.2020 01:45 02:00 0
Option 2: Fix the file in Matlab before importing
If you can't re-export the data properly, fix the files first in Matlab using the method below and then read them in. The first section below read in the file as text, removes null characters, char(0), and then re-writes the data with a new file name so you don't lose the initial data file. The second section then imports the clean file.
See inline comments for important details.
% SECTION 1: Clean up and rewrite the file
file = '2020.csv'; % Use the full path whenever possible!
% Read in the entire file as text
txt = fileread(file);
% Remove all null characters
txtClean = regexprep(txt,char(0),'');
% Re-write the file with a new name
% * If "file" contains the full path, you'll need to change
% this line below to rename the file.
fid = fopen(['clean_',file],'w');
fprintf(fid, '%s',txtClean);
fclose(fid);
Now you have a new file named "clean_2020.csv" and it can be read in without problems. Check the datetime values to make sure they match your formats.
newfile = ['clean_',file]; % Use the full path whenever possible!
opts = delimitedTextImportOptions("NumVariables", 4);
% Specify range and delimiter
opts.DataLines = [6, Inf];
opts.Delimiter = ";";
% Specify column names and types
opts.VariableNames = ["Datum", "von", "bis", "MW"];
opts.VariableTypes = ["datetime", "datetime", "datetime", "double"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Specify variable properties
opts = setvaropts(opts, "Datum", "InputFormat", "dd.MM.yyyy");
opts = setvaropts(opts, "von", "InputFormat", "HH:mm");
opts = setvaropts(opts, "bis", "InputFormat", "HH:mm");
% Import the data
T = readtable(newfile, opts);
Look at the first few rows of T
>> head(T)
% ans =
% 8×4 table
% Datum von bis MW
% __________ _____ _____ __
% 01.01.2020 00:00 00:15 0
% 01.01.2020 00:15 00:30 0
% 01.01.2020 00:30 00:45 0
% 01.01.2020 00:45 01:00 0
% 01.01.2020 01:00 01:15 0
% 01.01.2020 01:15 01:30 0
% 01.01.2020 01:30 01:45 0
% 01.01.2020 01:45 02:00 0
  4 个评论
Fabio Retorta
Fabio Retorta 2020-9-14
Nice my friend. Now it is working properly. Thank you very much for your time. I really appreciate your solution. Best Regards :)

请先登录,再进行评论。

更多回答(2 个)

Jon
Jon 2020-9-14
编辑:Jon 2020-9-14
You should be able to use the headerlines property value pair in your call to readtable to skip the first 5 lines something like:
A = readtable('2020.csv','HeaderLines',5)
  2 个评论
Fabio Retorta
Fabio Retorta 2020-9-14
Hi Jon,
It doesn't work. If you try you will see that Matlab add blank lines in the data
Jon
Jon 2020-9-14
It seemed to work ok for me. The only glitch I noticed was that it gave me 5 columns of data rather than 4, where the last column was empty. I could easily delete that column though if it were a problem

请先登录,再进行评论。


Jeremy Hughes
Jeremy Hughes 2020-9-14
Hi,
As others have pointed out, your CSV (Comma separated value) file is actually semicolon separated. If readtable is using comma, you'll have to pass the delimiter. I am a little surprised if that isn't automatically detected, but any detection heuristic will get things wrong every now and then. (I didn't check myself)
T = readtable(filename,"Delimiter",';',"NumHeaderLines",5)
"The ideia is not to correct manualy"
If the detection of readtble doesn't give you the desired results by default, there's not anything you can do other than correct it manually (or get the wrong data).
There was a time when readtable didn't do detection--the default delimiter would be "," and NumHeaderLines = 0. Which you'd have to correct manually. (Which was the case a lot more frequently than it is now.)
I hope this helps.
J
BTW Adam Danz's soultion is better if you know this is the exact format you want to read. No detection at all.
  6 个评论
Walter Roberson
Walter Roberson 2020-9-14
Yes, it is UTF16-LE . Current versions (R2020a) of readtable() and textscan() will deal with that automatically.
Adam Danz
Adam Danz 2020-9-14
编辑:Adam Danz 2020-9-14
Ahhh... I started using r2019b recently while developing an app for a customer who requires 19b.
In that case, OP may be able to export the data properly using the CharacterEncoding property of weboptions().

请先登录,再进行评论。

标签

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by