Unable to read a huge XML or text file

Hi,
I have an XML file that is 2 GB in size. I keep getting a Java heap memory error when loading it, so I am thinking of reading it in as a text file and removing the many useless rows before saving the result to a new, smaller file.
How can I do that? I cannot even open the file in TextPad. Thanks!

Answers (1)

Santa Raghavan 2017-7-26
Edited: Santa Raghavan 2017-7-26
You can increase the amount of Java heap memory available to MATLAB as follows.
In the MATLAB Desktop window:
For MATLAB R2010a and later, go to File -> Preferences -> General -> Java Heap Memory and move the slider to adjust the allocated heap memory.
For versions of MATLAB prior to R2010a, refer to the link below:
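To confirm how much heap the JVM inside MATLAB actually has, you can query the Java runtime directly. This is a minimal sketch, assuming MATLAB was started with Java enabled (not with -nojvm); the variable names are illustrative.

```matlab
% Query the JVM that MATLAB runs in (requires MATLAB started with Java enabled).
rt = java.lang.Runtime.getRuntime();

% maxMemory is the upper limit the heap may grow to; totalMemory minus
% freeMemory is the part of the heap currently in use. Values are in bytes.
maxMB  = double(rt.maxMemory()) / 2^20;
usedMB = double(rt.totalMemory() - rt.freeMemory()) / 2^20;

fprintf('Java heap: %.0f MB used of %.0f MB maximum\n', usedMB, maxMB);
```

If the reported maximum does not change after you move the slider, try restarting MATLAB, since the new setting generally takes effect only after a restart.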
If that does not work, you can read the file as text with the textscan function, specifying the block size (how many times the format is applied) that you wish to read at a time.
fileID = fopen('bigfile.txt');
formatSpec = '%s %f %*f %*f %s';
N = 50000; % block size: number of times formatSpec is applied per call (example value)
Read a block of data from the file. Use the HeaderLines name-value pair argument to instruct textscan to skip two lines before reading data.
D = textscan(fileID,formatSpec,N,'HeaderLines',2,'Delimiter','\t')
For more information, see: Import large text files
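Since the goal in the question is to drop unwanted lines and save a smaller file, the same read-a-piece-at-a-time idea can be sketched with fgetl instead of textscan, which avoids needing a fixed column format. This is only an illustration: the file names, the sample content, and the '<Unused' marker are hypothetical, and filtering XML by text match is safe only if each unwanted element sits entirely on its own line.

```matlab
% Hypothetical names and filter rule; replace them with your own.
inFile   = 'sample_big.xml';
outFile  = 'sample_small.xml';
unwanted = '<Unused';          % assumed marker of lines to drop

% Write a tiny sample input so this sketch runs on its own.
fid = fopen(inFile, 'w');
fprintf(fid, '<root>\n<Keep id="1"/>\n<Unused id="2"/>\n<Keep id="3"/>\n</root>\n');
fclose(fid);

% Stream the file one line at a time, so only one line is held in memory.
fin   = fopen(inFile, 'r');
fout  = fopen(outFile, 'w');
nKept = 0;
line  = fgetl(fin);            % fgetl returns -1 (not char) at end of file
while ischar(line)
    if isempty(strfind(line, unwanted))
        fprintf(fout, '%s\n', line);   % copy lines that do not match
        nKept = nKept + 1;
    end
    line = fgetl(fin);
end
fclose(fin);
fclose(fout);

fprintf('Kept %d lines in %s\n', nKept, outFile);
```

Because nothing is parsed as XML, this approach does not touch the Java heap at all; the smaller output file can then be loaded with whatever XML reader failed on the original.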
2 Comments
JFz 2017-7-27
Thanks! I will try it. I have increased the Java heap memory to the maximum but still get the same error.
Santa Raghavan 2017-7-27
You can also try the datastore function, which lets you read files that do not fit into memory.
ds = datastore('Myfile.xml', ...
'TreatAsMissing','NA')
ds.ReadSize = 100; % Specifies the number of lines
% you want to read at a time.
read(ds) % Reads first 100 lines in file
read(ds) % Reads next 100 lines in file
Subsequent read calls on ds fetch data from the last read point.
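To process a whole file rather than calling read by hand, the reads can be wrapped in a hasdata loop. This is a rough sketch under stated assumptions: it uses tabularTextDatastore on a small delimited sample file created here for illustration (the file name and contents are hypothetical). Whether datastore accepts a particular .xml file directly depends on the file's layout and the MATLAB release, so treat that part as something to verify on your own data.

```matlab
% Create a small delimited sample file so the sketch is self-contained.
sampleFile = 'sample_rows.csv';        % hypothetical file name
fid = fopen(sampleFile, 'w');
fprintf(fid, 'id,value\n');
for k = 1:250
    fprintf(fid, '%d,%g\n', k, k/2);
end
fclose(fid);

% Build the datastore and ask for at most 100 rows per read.
ds = tabularTextDatastore(sampleFile);
ds.ReadSize = 100;

% Keep reading until the datastore is exhausted; each block is a table.
nRows = 0;
while hasdata(ds)
    T = read(ds);
    nRows = nRows + height(T);         % replace with your own per-block work
end

fprintf('Processed %d rows in blocks of up to %d\n', nRows, ds.ReadSize);
```

Only one block is held in memory at a time, so the per-block work (filtering, aggregating, or writing out a reduced copy) is where the memory saving comes from.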
