Working with big tall arrays in an HPC environment
Hi all,
I have recently started using tall arrays to process large data arrays, with relatively good results. Currently I have 25 arrays of 40,000x35,677 each, saved in separate files, that I want to feed to a training algorithm on an HPC cluster. I already ran this training satisfactorily using only 3 of these 25 sets. However, when I use all 25 arrays the HPC job crashes with an out-of-memory message.
I am requesting 37 cores with 20 GB of RAM each, which in theory should be enough to bring the whole data set into physical memory. I decided not to do that because, in later stages of the code, I use multiprocessing (parfor loops) where each worker has to work independently with large chunks of data. I found that this speeds up the code significantly, so I would rather keep a similar memory allocation.
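For reference, a rough back-of-the-envelope estimate of the sizes involved (assuming full double-precision arrays):
% Rough memory estimate, assuming full double-precision arrays
bytesPerArray = 40000*35677*8;          % ~10.6 GiB per file
totalDataGiB = 25*bytesPerArray/2^30    % ~266 GiB for all 25 files together
totalRamGB = 37*20                      % 740 GB requested from SLURM (37 cores x 20 GB each)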
I isolated the problem to a self-contained case which is detailed as follows:
% Case to test the OOM problem
% I am loading the data into tall arrays and trying to process it
% I got an OOM problem when bringing a few of the observations into memory
% File path and number of files that I will load
folder_path = '/some/random/folder/path/'; %Linux
file_name = 'training_data';
end_num = 25; % Number of files
% Obtaining the files names in cell array
file_cell = cell(end_num,1);
data_per_file = zeros(end_num,1);
for i=1:end_num
file_cell{i} = strcat(folder_path,file_name,'_',num2str(i),'_out.mat');
file_data = whos('-file',file_cell{i},'data');
data_per_file(i) = file_data.size(1);
end
P = sum(data_per_file); % Number of observations
N = file_data.size(2); % Spectral resolution
% Create a file set to load in fileDatastore
fs = matlab.io.datastore.FileSet(file_cell);
ds = fileDatastore(fs,"ReadFcn",@load_atm_db);
% Create tall array and randomize observations order
Ncores = 36;
parpool(Ncores)
A = cell2mat(tall(ds)); % Create tall array with all the observations
selectedBlocks = A(randperm(P),:); % Randomize the order of the observations
% Obtain initial dictionary atoms
disp('Calculating initial dictionary')
Natoms = 40000;
Dictionary = gather(selectedBlocks(1:Natoms,:));
disp('Finished running the code')
The "@load_atm_db" function is detailed below, which I use because the arrays in the .mat files are inside struct constructions.
function data = load_atm_db(filepath)
%load_atm_db Load the database in a datastore
% Load the data array into a datastore without having to pass through a struct
struct_array = load(filepath);
data = struct_array.data;
end
The code crashes when it reaches the "gather" function. From the log file, I can see that gather finishes loading the data (it completes all 3 passes at 100%) and then crashes. The SLURM script that I use on the HPC is the following:
#!/bin/bash
#SBATCH --partition=queue
#SBATCH --job-name=Matlab_batch
#SBATCH --nodes=1
#SBATCH --cpus-per-task=37
#SBATCH --mem-per-cpu=20Gb
module add matlab/2024a
cd /another/random/folder/path/
matlab -nodisplay -r self_contained_case_test -logfile /still/another/random/folder/outputSelfContainedCase.out
Is there something that I can do better to avoid this out-of-memory error?
Answers (1)
Mike Croucher
2024-12-9
Hi Sebastian
I can't comment on the Tall Array situation right now but I'm zooming in on this comment:
I am requesting 37 cores with 20 GB of RAM each, which in theory should be enough to bring the whole data set into physical memory. I decided not to do that because, in later stages of the code, I use multiprocessing (parfor loops) where each worker has to work independently with large chunks of data. I found that this speeds up the code significantly, so I would rather keep a similar memory allocation.
If you have enough physical RAM, you should be able to ditch tall arrays completely. To get around the parfor memory issue, use a Threads pool. E.g. to create one with 8 workers:
parpool("Threads",8)
This uses shared memory wherever possible and so memory requirements are generally lower. More details on the choice between pool types are at Choose Between Thread-Based and Process-Based Environments.
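As a rough sketch of how this could look with the data fully in memory (the chunking scheme and the process_chunk function are illustrative placeholders, not part of your code):
% Illustrative sketch: load everything into memory, then use a thread-based
% pool so that workers share the array instead of receiving copies
pool = parpool("Threads",8);
nW = pool.NumWorkers;
allData = cell(numel(file_cell),1);
for k = 1:numel(file_cell)
allData{k} = load_atm_db(file_cell{k}); % reuses the reader function from the question
end
A = cell2mat(allData); % in-memory equivalent of the tall array
edges = round(linspace(1,size(A,1)+1,nW+1));
results = cell(nW,1);
parfor w = 1:nW
chunk = A(edges(w):edges(w+1)-1,:); % thread workers access A in shared memory
results{w} = process_chunk(chunk); % process_chunk is a hypothetical placeholder
end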
Also, I wonder if your application could make use of single precision? Your matrices, if full rather than sparse, would be about 10.6 GB each in double precision:
sizeDouble = 40000*35677*8/(1024^3)
but only half that in single, since each entry takes 4 bytes instead of 8. Of course there might be (serious!) numerical issues if you do this, but it works in many applications. Deep learning, for example, generally does fine in single precision.
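For example, a hypothetical single-precision variant of your read function (load_atm_db_single is just an illustration, not part of your code) could cast on load:
function data = load_atm_db_single(filepath)
%load_atm_db_single Load the data array and cast it to single precision
% Hypothetical variant of load_atm_db: roughly halves the in-memory footprint
% (about 5.3 GB per 40,000x35,677 array instead of about 10.6 GB)
struct_array = load(filepath);
data = single(struct_array.data);
end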
Finally, why 37 cores? That seems like a strange number!