XLS read taking lot of time

12 次查看(过去 30 天)
Rajiv Kumar
Rajiv Kumar 2021-7-26
编辑: dpb 2021-7-28
Hello,
I am working on MATLAB wherein I read a large XLS file using xlsread1 . Our Xls file has multiple worksheet as well.
This takes aroung 1min 28sec which is too large for us.
Is there any better way to do the same?
Thanks & Regards
Bhavna Yadav
  2 个评论
Rik
Rik 2021-7-26
I suspect you have two options to speed this up:
  1. Get a faster computer (faster disk, processor, and RAM).
  2. Avoid xls, using another data format instead (xlsx or mat).
You could also see if xlsread is faster than xlsread1. The documentation claims readtable, readmatrix, and readcell might have a better performance.
dpb
dpb 2021-7-26
编辑:dpb 2021-7-28
If forced to use Excel -- and probably are, at least once to retrieve the data from Excel where it already is and to do something less painful with it -- the .xls format is faster than .xlsx for large files and those with complex formulas. xlsx is a XLM-based and then zipped so have to unzip and interpret; .xls is a binary format.
The current incarnation of a binary file format is .xlsb which is smaller and should be faster/smaller than the XML document default .xlsx interchange format.
Other than the option of faster hardware, try the 'UseExcel','true' option and see if it helps any or not...I've not really tried timing experiments on input side.
However, readmatrix/readcell pass the heavy lifting off to readtable with just the instruction of how to return the results. That probably helps a little in the up front work.
But, the base TMW-supplied functions still are (and have to be) standalone on the assumption each call is independent and the next use can be on a totally different file so they can't take advantage of any savings in not doing all the background work of opening and closing the connection, etc., etc, etc., ... every call. Hence, they inherently have that overhead when being used repetitively.
It does seem as though TMW should add some other functions for such purposes given the ubiquitous nature of Excel -- one would think they could use the internal io libraries to advantage given the user request to set up for multiple accesses in the same file even more than the present FEX workarounds.
The use of the "trick" used inside xlswrite1 to hold open the one connection across multiple invocations when writing to the same file multiple times aids immensely; obviously that should also help in the same way on the input side -- not creating the new instance of the COM server and opening/closing a file multiple times is where much of the time is, besides being the place where things break when gets to be too fast for ActiveX server to keep up with the changes in file status.
However, I've used <xlswriteex> rather than xlswrite1 which is an enhancement by @Nick Oatley that opens the ActiveX server once externally first and keeps a persistent variable to that handle which removes some of the difficulties folks have had with the specific implementation that relies on a fixed handle naming convention.
I would presume a similar modification on the writing side would also help in the coding/use although I've not tried it.
Beyond that, ActiveX is just inherently a slow process compared to direct disk i/o of compact data files stored in more efficient formats -- it just doesn't scale well for huge datasets.
ADDENDUM
"<xlswriteex> rather than xlswrite1 which is an enhancement by @Nick Oatley that opens the ActiveX server once externally first and keeps a persistent variable to that handle.."
I think there may even be code inside xlswriteex (and maybe also in xlswrite1 if it also is basically the xlswrite m-file code) that will keep the ActiveX connection but switch files if the subsequent call is to a different file than the preceding. ISTR seeing such code; I've not tried to use it in that fashion so not sure, but for such an application it's worth looking at the details internally...
ADDENDUM SECOND:
"I think there may even be code inside xlswriteex (and maybe also in xlswrite1 if it also is basically the xlswrite m-file code) that will keep the ActiveX connection but switch files..."
That is, in fact part of the enhancement compliments of @Nick Oatley -- he made both the ActiveX Excel connection object variable and the filename variable persistent -- so, the routine can be reentered without having to recreate those, in particular it doesn't close the connection nor the file routinely; if the filename on a subsequent call changes from that of the previous use, then that file is closed and the new one opened.
A final "cleanup" call with no arguments at all is the trigger that this sequence of use is over and that closes the connection and the last file.
I don't see anything preventing something similar on the read side although I've not yet looked at the actual code.

请先登录,再进行评论。

回答(0 个)

标签

产品


版本

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by