- "I am using xlsread"   However, the code doesn't show that you use xlsread. Why is that?
- How should I interpret   "raw 528439x20 1268253582 cell" ?
- What does   whos raw   show?
Insufficient storage (ActiveX) - unable to complete operation - pulling in Excel data more quickly
Hi All,
I have some reasonably large Excel files to process (200-800 MB). After a bit of dabbling, and on review of the numerous threads here on the forum, I have elected to use ActiveX controls etc.
For speed of execution I am using xlsread (as opposed to csvread / importdata). Textscan is less than ideal due to the variation between sheets (i.e. it requires a format specifier).
To make things easier I have been reading from the excel sheets in bite sized chunks to 1) help speed up the read and 2) to avoid (other) memory errors.
Presently I am facing a challenge with the read call, that is: eRange = get(e.Activesheet,'Range',sheetR).
Currently the script loops through a 'block' of 20 columns or so, reading in all rows. On the first call, that is:
eRange = get(e.Activesheet,'Range',sheetR);
raw = eRange.Value;
the program executes as expected. However, looping through on the succeeding call and loading the next block, I receive the error
"Error: Not enough storage is available to complete this operation."
Below is a brief outline of the script and corresponding output.
-------------- Script ------------------
% Uses activeX controls.
% pathAndFilename = 'C:\Users\owner\Desktop\Convert GS logs\temp.xlsx'
%%Open excel application
disp('Opening Excel application...')
e=actxserver('excel.application');
wb=e.Workbooks;
w=Open(wb, pathAndFilename);
%%Find range / break range of sheet into manageable chunks
% Define excel column headers (NOT r1c1 reference style..)
Alphabet=('A':'Z').';
[I,J] = meshgrid(1:26,1:26);
XX=[Alphabet(I(:)), Alphabet(J(:))];
XX=cellstr(strvcat(Alphabet,XX));
clear I J Alphabet
% Range
disp('Extracting range from excel sheet...')
RangeInString=e.ActiveSheet.UsedRange.Address;
range = strsplit(RangeInString,'$');
nrow = str2double(range(end));
ncol = find(strcmp(range(end-1),XX));
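% For illustration (consistent with the 1x15 RangeInString and the 152
% columns reported below): if UsedRange.Address returned '$A$1:$EV$530000',
% strsplit would yield {'','A','1:','EV','530000'}, so nrow = 530000 and
% ncol = find(strcmp('EV',XX)) = 152, 'EV' being the 152nd label in XX
% (26 single letters + 4*26 + 22 = 152).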
% Reduce the scale of the succeeding reads (time / memory overheads).
% (lump, the target block width, is set earlier in the full script.)
if ncol > 20
splt = ceil(ncol/lump);
bite = floor(ncol / splt);
columns = [1:bite:ncol];
surp_col = ncol - columns(end);
columns(end) = columns(end) + surp_col;
end
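% Worked example (assuming lump = 20, which matches the output below):
% with ncol = 152, splt = ceil(152/20) = 8 and bite = floor(152/8) = 19,
% so columns = [1 20 39 58 77 96 115 134]; folding the surplus into the
% last entry gives columns = [1 20 39 58 77 96 115 152]. Note adjacent
% blocks share a boundary column (e.g. column 20 is read in both of the
% first two chunks).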
clear splt surp_col bite range
%%Do some work
% loop through the sheet and extract the respective chunks
for i=1:length(columns)-1
disp(['processing columns ',num2str(columns(i)), ' through ' num2str(columns(i+1)),' of ',num2str(ncol), ' columns...'])
tic
% set the range for the current read / chunk
sheetR = [XX{columns(i),:},'1:',XX{columns(i+1),:},num2str(nrow)];
%
% eRange = get(e.Activesheet,'Range','A1:B2');
eRange = get(e.Activesheet,'Range',sheetR);
raw = eRange.Value;
clear raw sheetR
end
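For reference, the shutdown after the loop (omitted from the outline above) would be the standard MATLAB COM sequence, roughly:
Close(w, false)  % close the workbook without saving changes
Quit(e)          % quit the Excel application
delete(e)        % release the COM server object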
What I wish to achieve is to read the full xlsx file and place the data in a .mat file. To reduce the memory overhead, on each loop the data 'raw' is written to a .mat file using the -v7.3 format. All non-essential variables are removed using a 'clearvars -except' call.
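Schematically, each pass does something like the following (variable names are illustrative only; the real script differs, but MatObj and matf from the whos output below play these roles):
MatObj = matfile(matf, 'Writable', true);    % matf: target .mat file (matfile writes -v7.3)
MatObj.(sprintf('raw_%d', i)) = raw;         % store the current chunk under its own name
clearvars -except e wb w XX columns ncol nrow matf MatObj i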
The script runs fine on the first 'chunk'; however, I receive an error on the following iteration. What is odd to me is that every 'chunk' pulled in on the read call is precisely the same size (number of rows / columns).
The output of the script is as follows:
Loading log file: 20160719_0945.xlsx ...
Opening Excel application...
Extracting range from excel sheet...
processing columns 1 through 20 of 152 columns...
processing columns 20 through 39 of 152 columns...
Error: Not enough storage is available to complete this operation.
The script fails on the 'raw = eRange.Value' call.
A 'whos' on the preceding line (immediately prior to 'raw = eRange.Value') suggests little memory overhead:
Name Size Bytes Class Attributes
MatObj 1x1 112 matlab.io.MatFile
MetaData 1x1 3069 struct
RangeInString 1x15 30 char
UAVnav_Log_ID 1x1 138 cell
XX 702x1 81380 cell
columns 1x8 64 double
e 1x1 COM.excel_application
eRange 1x1 Interface.00020846_0000_0000_C000_000000000046
filename 1x18 36 char
i 1x1 8 double
lump 1x1 8 double
matf 1x13 26 char
ncol 1x1 8 double
nrow 1x1 8 double
pathAndFilename 1x71 142 char
w 1x1 Interface.000208DA_0000_0000_C000_000000000046
wb 1x1 Interface.000208DB_0000_0000_C000_000000000046
I am presently running the script on a 64-bit machine with 8 GB of RAM. However, I would very much like to deploy this and similar scripts in the future on a smaller machine, for those times when we are 'working in the field'. Any pointers would be greatly appreciated. A simple workaround is to significantly reduce the number of columns read in any one call - however, surely there is a better way?
The example file I am running has 530,000 rows x 158 columns, and each chunk read generates the cell array:
raw    528439x20    1268253582    cell
2 comments
per isakson on 2016-8-6 (edited: per isakson, 2016-8-10)
Answers (0)