Read a very large .csv file, split into parts and save each part into a smaller .csv file
Dear Matlabers,
I need to read a very large .csv file with about 15.000 columns and 500.000 rows. I need to split it into chunks of rows (i.e. 20.000 rows and all 15.000 columns), and save each chunk into a separate .csv file.
1. I have tried to use textscan, but I am not sure it can work, because my columns hold not only numeric data but also text and dates. Ideally I would keep all of this information, as I will need it for different parts of my project.
2. I also attempted tabularTextDatastore, but I get an error:
Unable to determine the format of the DATETIME data.
Try adding a format to the DATETIME specifier. e.g. '%{MM/dd/uuuu}D'.
Is there any way I could provide a DATETIME specifier (this is not explained in the relevant documentation)?
Memory is not a problem here, as the machine I am currently using has a very large amount of RAM.
I would be grateful for any ideas.
George
Accepted Answer
Jeremy Hughes
2019-9-27
If your plan is to write out all the small CSV files and do nothing else with them in MATLAB, I'd say just use tabularTextDatastore and set all of the formats with ds.TextscanFormats(:) = {'%q'}. There should never be any errors with '%q', because every field is read as text.
Then use writetable.
ds = tabularTextDatastore(filename,'ReadSize',myReadSize);
ds.TextscanFormats(:) = {'%q'};  % read every column as (possibly quoted) text
k = 0;
while hasdata(ds)
    k = k + 1;
    % The file naming scheme below is only a placeholder; adapt it as needed.
    output_filename = sprintf('part_%04d.csv',k);
    writetable(read(ds),output_filename);
end
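A minimal end-to-end sketch of the approach above, in case it helps to try it on a small file first. The sample data, the file names (big_sample.csv, chunk_001.csv, ...) and the ReadSize of 4 are assumptions made only for illustration; they are not from the original question.

```matlab
% Build a small sample file so the sketch is self-contained (hypothetical data).
inFile = fullfile(tempdir, 'big_sample.csv');
T = table((1:10)', repmat({'abc'}, 10, 1), repmat({'2019-11-11'}, 10, 1), ...
    'VariableNames', {'id', 'label', 'date'});
writetable(T, inFile);

% Read 4 rows at a time and treat every column as text.
ds = tabularTextDatastore(inFile, 'ReadSize', 4);
ds.TextscanFormats(:) = {'%q'};

outFiles = {};
k = 0;
while hasdata(ds)
    k = k + 1;
    % Placeholder naming scheme: chunk_001.csv, chunk_002.csv, ...
    outFiles{end+1} = fullfile(tempdir, sprintf('chunk_%03d.csv', k)); %#ok<AGROW>
    writetable(read(ds), outFiles{end});
end
```

If the text fields can themselves contain commas, it is probably worth looking at the 'QuoteStrings' option of writetable so that the chunks stay valid CSV.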
3 Comments
Jeremy Hughes
2019-9-30
':' is a MATLAB syntax meaning "all".
x(:) = -1,
would set all the values in x to -1. I meant literally that code. =)
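To illustrate the colon syntax on both a numeric array and a cell array, here is a small sketch. The variable formats is only a stand-in for ds.TextscanFormats, and its initial values are made up; the '%{MM/dd/uuuu}D' specifier is the one suggested in the error message from the question.

```matlab
x = zeros(2, 3);
x(:) = -1;                      % every element becomes -1; x stays 2-by-3

formats = {'%f', '%D', '%q'};   % stand-in for ds.TextscanFormats (hypothetical values)
formats(:) = {'%q'};            % every cell now holds '%q'
formats{2} = '%{MM/dd/uuuu}D';  % or give one column an explicit datetime format
```

The last line shows the other route: instead of reading everything as text, a single column's entry can be replaced with an explicit datetime specifier.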
More Answers (1)
Sulaymon Eshkabilov
2019-9-26
Hi,
The answer is rather simple. You can read all the dates as text with the string specifier %s. For example, take a file called DATA_date.txt:
DATE Row1 Row2 Row3 Row5
11/11/2019 1 1.13 2 3.33
11/11/2019 2 0.13 3.12 3.33
11/11/2019 3 2.13 -2 -5.33
11/11/2019 4 4.13 -3 -7.33
11/11/2019 5 3.13 5.5 -8.33
11/11/2019 6 2.13 2.6 -13.33
It can be imported into the MATLAB workspace with:
FileName = 'DATA_date.txt';
FID = fopen(FileName, 'r');
SPECs = '%s%d%f%f%f';
N_header = 1;
DATA = textscan(FID, SPECs, 'headerlines', N_header);
fclose(FID);
All imported data is now in the cell array DATA: DATA{1,1} contains the DATE values, DATA{1,2} contains the data of Row1, ..., and DATA{1,5} contains the data of Row5.
Good luck.
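A self-contained sketch of the same import, followed by one possible way to turn the date text into datetime values afterwards. The two data rows written to the temporary file and the 'MM/dd/yyyy' input format are assumptions for illustration.

```matlab
% Write a tiny sample file (hypothetical contents) so the sketch runs on its own.
FileName = fullfile(tempdir, 'DATA_date.txt');
FID = fopen(FileName, 'w');
fprintf(FID, 'DATE Row1 Row2 Row3 Row5\n');
fprintf(FID, '11/11/2019 1 1.13 2 3.33\n');
fprintf(FID, '11/11/2019 2 0.13 3.12 3.33\n');
fclose(FID);

% Import: dates as text (%s), one integer column (%d), three floating-point columns (%f).
FID = fopen(FileName, 'r');
DATA = textscan(FID, '%s%d%f%f%f', 'HeaderLines', 1);
fclose(FID);

dates = datetime(DATA{1}, 'InputFormat', 'MM/dd/yyyy');  % text -> datetime
row1  = DATA{2};   % int32 column, because of %d
row2  = DATA{3};   % double column, because of %f
```

Reading the dates as text first and converting later keeps the import itself from failing on an unrecognized date format.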
4 Comments
Sulaymon Eshkabilov
2019-9-26
Pay careful attention to how your data is formatted, in particular the data type of each column (integer, floating point, date, text, etc.). Every row must have the same number of columns as the rows that follow it, so your data needs to be neatly and consistently formatted. A single missing data point somewhere in a large file will create a problem.
Good luck.
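One rough way to check for rows with a missing field before importing is to count delimiters line by line. This is only a sketch: the file name and contents are made up, a comma delimiter is assumed, and commas inside quoted fields are not handled.

```matlab
% Write a small sample file whose third line is missing a field (hypothetical data).
FileName = fullfile(tempdir, 'ragged_sample.csv');
FID = fopen(FileName, 'w');
fprintf(FID, 'a,b,c\n1,2,3\n4,5\n6,7,8\n');
fclose(FID);

FID = fopen(FileName, 'r');
headerLine = fgetl(FID);
expected = sum(headerLine == ',') + 1;   % number of fields in the header line
lineNo = 1;
badLines = [];
while true
    txt = fgetl(FID);
    if ~ischar(txt), break; end          % fgetl returns -1 at end of file
    lineNo = lineNo + 1;
    if sum(txt == ',') + 1 ~= expected
        badLines(end+1) = lineNo; %#ok<AGROW>
    end
end
fclose(FID);
```

After the loop, badLines holds the line numbers whose field count differs from the header.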