Read in specific range of large .csv

Question

Louise Wilson 2020-11-10

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/642605-read-in-specific-range-of-large-csv

评论： Raunak Gupta 2020-11-17

I have very large .csv files that I am trying to work with, e.g. 7000 * 72000.

In each file the first column is a time vector. By saving these time vectors in separate files, I can load them in, get the row range of the dates of interest, and then use that to read in the rows of interest from the larger .csv?

However, I can't figure out how to apply this last step. Here is what I have so far...

%get time period of interest
startdate=datetime(2019,08,20); 
enddate=datetime(2019,09,10);
timeperiod=datenum(startdate:enddate); 
timeperiod=timeperiod';
%load in time vector
tvec_folder=('H:\SoundTrap\Boats\PSD Output\PSD_tvec');
tvecfile1=('TVEC_002Tiritiri_5280_PSD_1sHammingWindow_50%Overlap_2min_output.csv');
PSD_tvec=readtable(fullfile(tvec_folder,tvecfile1)); %read tvec and get times
PSD_tvec_t=PSD_tvec.Var1;
%get row range of interest
idx=PSD_tvec_t>timeperiod(1) & PSD_tvec_t<timeperiod(end); %find rows in tvec 
%which correspond to date range of interest
x=find(idx(:,1)>0); %get row numbers for reading in PSD
PSDfolder=('H:\SoundTrap\Boats\PSD Output\Duty cycle data'); %folder where PSD output files are
PSDfile1=('002Tiritiri_5280_PSD_1sHammingWindow_50%Overlap_2min_output.csv');
%PSDfile1=readtable(fullfile(PSDfolder,PSDfile1)); %read in PSD file

How can I select a range of interest as I read the .csv?

In addition to selecting specific rows, I could also cut the data down by selecting different columns. I have tried that this way:

opts.SelectedVariableNames(2:24000)
T=readtable(fullfile(PSDfolder,PSDfile1),opts);

...but for some reason, whilst this does select the desired column range, it doesn't read the full number of rows in the file and there are no error messages.

Alternative ways of solving the problem would be equally appreciated. I need to read in these large files but since it is time consuming and I don't always need all of the data, I am looking to be more efficient. Thanks

4 个评论
显示 2更早的评论隐藏 2更早的评论

Louise Wilson 2020-11-10

在 MATLAB Online 中打开

Hi Mathieu,

No I hadn't tried this. What I want to do is read specific rows, not necessarily a range of rows as there could be rows inside the range that are not of interest, is that possible?

%Input variables
startdate=datetime(2019,08,20);
enddate=datetime(2019,09,10);
tvec_folder=('H:\SoundTrap\Boats\PSD Output\PSD_tvec');
tvecfile1=('TVEC_002Tiritiri_5280_PSD_1sHammingWindow_50%Overlap_2min_output.csv');
PSDfolder=('H:\SoundTrap\Boats\PSD Output\Duty cycle data'); %folder where PSD output files are
PSDfile1=('002Tiritiri_5280_PSD_1sHammingWindow_50%Overlap_2min_output.csv');
timeperiod=datenum(startdate:enddate); %get time period of interest
timeperiod=timeperiod';
PSD_tvec=readtable(fullfile(tvec_folder,tvecfile1)); %read tvec 
PSD_tvec_t=PSD_tvec.Var1; %get times
idx=PSD_tvec_t>timeperiod(1) & PSD_tvec_t<timeperiod(end); %find rows in tvec which correspond to date range of interest
x=find(idx(:,1)>0); %get row numbers for reading in PSD
PSD=csvread(fullfile(PSDfolder,PSDfile1),x); %to specify rows, doesn't work

Louise Wilson 2020-11-10

编辑：Louise Wilson 2020-11-10

在 MATLAB Online 中打开

Hi again Mathieu,

I can get these to work:

%works to select first row number 
PSD=csvread(fullfile(PSDfolder,PSDfile1),4681,1);
%works to select specific frequency range
opts=detectImportOptions(fullfile(PSDfolder,PSDfile1));
opts.SelectedVariableNames=opts.SelectedVariableNames(1:24000); %only read freq range of interest
PSDfile1=readtable(fullfile(PSDfolder,PSDfile1),opts); %read in PSD file

but I would like to be able to specify the start and end of the columns and rows in one command.

I can try this (which would be perfect if it worked):

PSD=readmatrix(fullfile(PSDfolder,PSDfile1),'Range',[4681 1 5760 24000]);

but I get the error:

Error using readmatrix (line 149)

Unable to determine range. Range must be a named range in the sheet or a single cell within

'XFD1048576'.

Louise Wilson 2020-11-10

It doesn't matter what range I try, even if I just try to get the first row it doesn't work.

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Raunak Gupta 2020-11-14

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/642605-read-in-specific-range-of-large-csv#answer_544113

Hi,

From the question I understand that you want to import only a chunk of ‘.csv’ file for analyzing. The readmatrix with ‘Range’ should return the mentioned range of values for you. The only crux here is the number of columns that you want to import is greater than what a normal csv file can be displayed with in Excel, which is XFD and that corresponds to 16384 columns. Since you are using columns from 1 to 24000 the specified error is thrown.

So, if you can somehow store the transpose of the original matrix to the csv file that will resolve the current error as ranges for row and columns will reverse and fall into the limits.

If that is not possible you can use tall array to import the whole file but not in memory. And then you can choose the size using logical indexing and then finally get the desired matrix using gather.

Hope this helps!

2 个评论
显示无隐藏无

Louise Wilson 2020-11-16

在 MATLAB Online 中打开

Thank you Raunak. This helps a lot. I am using this successfully at the moment:

tic
PSD1=readmatrix(fullfile(PSDfolder,PSDfile1),'Range',rowrange1); %works
PSD1_t=datetime(PSD1(:,1),'ConvertFrom','datenum');
toc

I can transpose the file after reading it in as above, and this seems to work fine? Is it the case that the issue you mention is purely a display issue, that the data is still there?

Raunak Gupta 2020-11-17

Hi,

This the limit on column number is basically due to display limit in Excel, since .csv files follows the same methodology it is expected that number of columns to not exceed 16384. Number of rows can be a big number since usually it represent the observations so it can go upto 1048576, whereas column represent each feature for a specific observation.

请先登录，再进行评论。

Read in specific range of large .csv

4 个评论
显示 2更早的评论隐藏 2更早的评论

采纳的回答

2 个评论
显示无隐藏无

更多回答（0 个）

另请参阅

类别

标签

产品

Community Treasure Hunt

Read in specific range of large .csv

4 个评论 显示 2更早的评论隐藏 2更早的评论

采纳的回答

2 个评论 显示 无隐藏 无

更多回答（0 个）

另请参阅

类别

标签

产品

Community Treasure Hunt

4 个评论
显示 2更早的评论隐藏 2更早的评论

2 个评论
显示无隐藏无