How can I import only the numbers from .csv files with a text header?

I have hundreds of .csv files; I attached one of them as an example (I had to shorten it because it was larger than 5 MB). Each of them has 10^6 lines of data.
I want to import those files automatically in my MATLAB code. Importing them one by one is perfectly fine, but so far I have always had to preprocess the data manually in a text editor. The problem is the text in the header of every .csv file. I just want to import the numbers in the second, third and fourth columns, not the text from the header. But even if I specify the columns, I cannot convert the data read from the datastore into numbers to run the calculations. This is my solution with the preprocessed data:
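% Read the preprocessed file chunk by chunk and collect the first three columns.
pre_data = datastore('Data.csv');
piece = zeros(1,3);              % placeholder first row
while hasdata(pre_data)
    pie = read(pre_data);        % next chunk, returned as a table
    pie = pie(:,1:3);            % keep the first three columns
    pie = table2array(pie);      % convert the table to a numeric array
    piece = [piece; pie];        % append the chunk to the result
end
piece = piece(9:10^6+8,:);       % keep rows 9 to 10^6+8 (10^6 rows)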
With "piece", I can now easily run the calculations.
To import the data without preprocessing, I tried "ds.SelectedVariableNames" and replacing "datastore" with "csvread", but neither worked.
Does anyone have advice on how to import such .csv files as an easily processable 1000000x3 double?
  1 Comment
dpb 2018-12-15
Edited: dpb 2018-12-15
Just attach the text of the first few lines of the file (10 is enough) showing the header and data structure; how many data lines follow the header is immaterial to the solution, as long as you have enough memory to hold the data.
The key question is whether the header structure is the same in every file: is it always the same number of lines, is there a consistent number of blank records (if any) after the header, and so on.
Also, does every file have the same number of variables (columns), and are the records properly delimited where data are missing?
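One way to grab those first few lines as raw text is sketched below. It is only an illustration: the stand-in file and its contents are made up so the snippet runs on its own, and the names `fname`, `nPeek` and `firstLines` are hypothetical. Point `fname` at one of your own files to inspect the real header.

```matlab
% Stand-in file so the snippet runs as-is (hypothetical contents);
% point fname at one of your own .csv files instead.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Some header text,,,\n');
fprintf(fid, '1,0.5,2,3\n2,1,4,6\n');
fclose(fid);

% Collect and print up to the first nPeek lines as raw text.
nPeek = 10;
firstLines = cell(0, 1);
fid = fopen(fname, 'r');
while numel(firstLines) < nPeek
    txt = fgetl(fid);
    if ~ischar(txt)          % fgetl returns -1 at end of file
        break
    end
    firstLines{end+1, 1} = txt; %#ok<SAGROW>
    fprintf('%2d: %s\n', numel(firstLines), txt);
end
fclose(fid);
```

Running this on several files makes it easy to compare whether the header always has the same number of lines.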


Accepted Answer

Jeremy Hughes 2018-12-16
You should be able to add 'NumHeaderLines',7 to the datastore call and get what you want.
The issue is that this looks a lot like a CSV file exported from Excel. There are a lot of extraneous commas, and that's throwing off all the file format detection.
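A minimal sketch of what that call might look like is shown below, under several assumptions. The real file layout is not shown in the thread, so a small synthetic file stands in for it. `'ReadVariableNames',false` assumes that the first line after the header is already data, and picking `ds.VariableNames(2:4)` assumes that the wanted numbers sit in the second, third and fourth columns, as the question states.

```matlab
% Stand-in file (hypothetical layout): 7 header lines, then numeric rows.
fname = [tempname '.csv'];
vals = [(1:5)', (1:5)'*0.5, (1:5)'*2, (1:5)'*3];
fid = fopen(fname, 'w');
for k = 1:7
    fprintf(fid, 'header text %d,,,\n', k);
end
fprintf(fid, '%d,%g,%g,%g\n', vals.');
fclose(fid);

% Skip the 7 header lines; treat the first remaining line as data.
ds = datastore(fname, 'NumHeaderLines', 7, 'ReadVariableNames', false);

% Keep only the second, third and fourth columns.
ds.SelectedVariableNames = ds.VariableNames(2:4);

% Read everything and convert the table to a numeric array.
data = table2array(readall(ds));
```

If line 8 of the real file holds column names rather than data, dropping `'ReadVariableNames',false` is probably the right choice. As long as the selected columns are all numeric, `data` comes back as an N-by-3 double, where N is the number of data rows.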
  1 Comment
Christoph Müßig 2018-12-16
Thank you all for your ideas and tricks. The solution to add 'NumHeaderLines',7 to the datastore call worked perfectly and solved the problem.

Release

R2018a
