How to increase reading speed from a multi-gigabyte file?
farzad
2019-6-17
Hi all
How do I increase the reading speed from an Excel file, several gigabytes in size, that contains rows and columns of data?
18 Comments
dpb
2019-6-17
Dunno...'pends on what the data are and how saved...getting it out of Excel and into a .mat or stream file would undoubtedly be the fastest.
farzad
2019-6-17
The data are floats, let's say 5 gigabytes' worth.
Why .mat and why a stream file? What would the code look like?
Is using a table useful?
dpb
2019-6-17
'Cuz both .mat and stream files are binary representations of the actual bytes in memory, thus eliminating the need for conversion.
You've still not said which form of file it actually is; if it is .xls(x), then xlsread is fairly slow.
A table would be one choice for internal storage in MATLAB; how useful depends entirely on what the data are and how they need to be processed, which, like the actual file itself, you're keeping us totally in the dark about, so all we can do is guess...
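A minimal sketch of the convert-once-then-reread pattern described above (the file names here are illustrative, not from the thread):
data = xlsread('bigdata.xlsx');             % slow text/XML parse, done once
fid = fopen('bigdata.bin', 'w');
fwrite(fid, data, 'double');                % raw bytes, no text conversion
fclose(fid);
fid = fopen('bigdata.bin', 'r');            % subsequent reads skip the parse
fast = fread(fid, size(data), '*double');   % '*double' keeps class double
fclose(fid);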
dpb
2019-6-17
Well, with .xlsx files you have the choice between xlsread and readtable. You'll just have to test which is faster--one presumes readtable. If you have R2019a, you can try the new readmatrix, which is now recommended instead of xlsread.
For csv files, the historic ways are csvread, textscan, and fscanf; although, again with the caveat of requiring R2019a, readmatrix is now the TMW-recommended alternative.
I don't have R2019a installed yet, so I can't comment on the relative performance between it and alternatives.
Still, if speed and doing this more than once will be required, then doing it once and then using .mat or stream files will undoubtedly beat any of the alternatives.
You could, if your application can live with single precision, cut the file size in half by saving single instead of double. Whether that is a viable alternative is purely a question of what is required of the data itself.
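A sketch of that advice, assuming R2019a or later (the file names are illustrative):
data = readmatrix('bigdata.csv');            % TMW-recommended reader, R2019a+
s = single(data);                            % halves the size, if precision allows
save('bigdata_single.mat', 's', '-v7.3');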
Walter Roberson
2019-6-18
Edited: Walter Roberson
2019-6-18
I wrote out a 1e6 by 500 array of doubles = 4 gigabytes in several forms, and tested how long loading took.
When saved as space-delimited doubles using save -ascii -double, load() of the 12,501,000,000-byte text file took 1416 seconds.
textscan() of that same file took 265 seconds.
fscanf() of the same file took 371 seconds.
When saved as a .csv file using dlmwrite() with precision 16, load() took 1107 seconds.
When saved as a -v7.3 .mat, load() of the 3,796,914,266-byte file took 25 seconds.
When saved as a pure binary file, fread(fid, [1e6 500], '*double') took 14.25 seconds the first time, and 2.1 seconds the second time (file in the operating system cache). fread(fid, [1 inf], '*double') takes 4.6 seconds when the file is in the operating system cache, which tells us that there is more memory-management overhead when the size is unknown.
(I will update as I generate more times.)
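For reference, a sketch of the pure-binary read pattern timed above; the file name is illustrative, and the 1e6-by-500 shape matches the sizes reported:
fid = fopen('testdata.bin', 'r');
tic
data = fread(fid, [1e6 500], '*double');   % known size: least overhead
toc
fclose(fid);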
farzad
2019-6-18
Thank you very much Walter
That is very much what I was searching for. How do you save as .mat?
Walter Roberson
2019-6-18
data = rand(1e6, 500);          % the same 1e6-by-500 test array as above
save testdata.mat data -v7.3    % -v7.3 is required for variables over 2 GB
but this relies upon having the data in the first place to write out as .mat.
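Reading it back is just load(); for a file this size, matfile() can also pull out a block of rows without loading the whole array. A sketch, using the variable name from the example above:
S = load('testdata.mat');       % loads the whole 'data' variable into S.data
m = matfile('testdata.mat');    % or map the -v7.3 file instead
firstRows = m.data(1:1000, :);  % partial read, no full load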
Walter Roberson
2019-6-18
I am having difficulty creating an Excel file that large. I wrote the file as .csv, but my Excel complains about running out of memory when trying to import it, which does not make sense to me.
Walter Roberson
2019-6-18
I have been updating the timings; you might want to have another look, above.
dpb
2019-6-18
All of which continues to say "ditch Excel" entirely for such large files...
I do find it interesting that textscan manages to beat fscanf -- one would think they would boil down to the same C runtime library call. Just out of curiosity, what were the two specific commands used, Walter? Oh--did you include the overhead of casting the cell array from textscan to double?
Walter Roberson
2019-6-18
Edited: Walter Roberson
2019-6-18
I created a format with repmat of '%f' 500 times. I fopen the file and then
datacell = textscan(fid, fmt, 'collectoutput', 1);
Because this puts everything into a single cell the overhead to extract the array is trivial.
The timing with 'collectoutput', 0, without joining the columns afterwards, was a hair higher, but not statistically significant.
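Putting those pieces together, a self-contained sketch of the textscan approach as described (the file name is illustrative):
fmt = repmat('%f', 1, 500);                        % 500 float columns
fid = fopen('testdata.txt', 'r');
datacell = textscan(fid, fmt, 'collectoutput', 1); % one cell, one matrix
fclose(fid);
data = datacell{1};                                % extraction is trivial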
dpb
2019-6-18
Yeah, that's kinda' what I suspected, thanks for confirming, Walter.
I still find it more than strange that there's a 30% reduction over fscanf -- what are they doing wrong with fscanf, then, that there's that much room for improvement?
These timings couldn't possibly be related to caching issues, I presume; you're too careful for that! :)
Answers (0)