Best way to deal with large data for deep learning?

Hi, I have been experimenting with image classification using CNNs. I have about 350,000 images that I read and stored in a 4-D array of size 170 x 170 x 3 x 350,000 in a data.mat file, using matfile to keep appending new images (roughly as in the sketch below). The resulting file is almost 20 GB.
The problem now is that I cannot access the saved images, because I run out of memory.
Does anyone have suggestions for a more efficient way to handle large image data for deep learning?
One workaround would be to split the data and train two networks, initializing the second with the first network's final weights, but I would rather not take that route!
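For reference, the write loop looked roughly like this. This is a minimal sketch, assuming the raw images sit in 'F:\raw_images' and the on-disk variable is named Xtrain; the folder, file, and variable names are hypothetical.
% Sketch of incrementally growing a 4-D array on disk with matfile.
% Folder, file, and variable names are assumptions, not from this thread.
m = matfile('data.mat', 'Writable', true);         % creates a v7.3 MAT-file on first write
files = dir(fullfile('F:\raw_images', '*.jpg'));    % hypothetical source folder
for k = 1:numel(files)
    I = imread(fullfile('F:\raw_images', files(k).name));
    I = imresize(I, [170 170]);                     % force the expected size
    m.Xtrain(1:170, 1:170, 1:3, k) = I;             % extends the 4-D array on disk
end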
  2 Comments
KSSV 2016-6-22
Do you want to process the whole data set (170 x 170 x 3 x 350,000) at once, or are you using only one 170 x 170 x 3 image at each step?
Mona 2016-6-22
Yes, I am classifying the images using a CNN:
trainNetwork(Xtrain, Ytrain, layers, opt)
where Xtrain is supposed to contain all the training examples. So yes, I wish to pass the entire 170 x 170 x 3 x 350,000 array to the network!


Accepted Answer

Mona 2016-6-22
OK, I found a way around it. Instead of reading and writing the images to .mat files, I used imageDatastore.
What I did was process all my images (resized them to 200 x 200, then took random 170 x 170 crops) and write all the processed images out as .jpg files, organized into one subfolder per class so the folder names can serve as labels; a rough sketch is below.
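A minimal sketch of that preprocessing step, assuming the raw images sit in one subfolder per class under 'F:\raw_images' and the output mirrors that layout under 'F:\All_train_images' (the paths are assumptions; the sizes follow the description above):
srcRoot = 'F:\raw_images';       % hypothetical source folder, one subfolder per class
dstRoot = 'F:\All_train_images';
classDirs = dir(srcRoot);
classDirs = classDirs([classDirs.isdir] & ~ismember({classDirs.name}, {'.','..'}));
for c = 1:numel(classDirs)
    outDir = fullfile(dstRoot, classDirs(c).name);
    if ~exist(outDir, 'dir'), mkdir(outDir); end
    imgs = dir(fullfile(srcRoot, classDirs(c).name, '*.jpg'));
    for k = 1:numel(imgs)
        I = imread(fullfile(srcRoot, classDirs(c).name, imgs(k).name));
        I = imresize(I, [200 200]);        % resize to 200 x 200
        r  = randi(200 - 170 + 1);         % top-left corner of a random crop
        c2 = randi(200 - 170 + 1);
        I = I(r:r+169, c2:c2+169, :);      % random 170 x 170 crop
        imwrite(I, fullfile(outDir, imgs(k).name));
    end
end
Keeping one subfolder per class is what lets 'LabelSource','foldernames' pick up the labels in the datastore call below.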
Then, I used imageDatastore as:
imds = imageDatastore('F:\All_train_images','IncludeSubfolders',true,...
'FileExtensions','.jpg','LabelSource', 'foldernames');
and finally trained the network with
trainNetwork(imds, layers, opt)
It turned out that writing the images to .jpg files is even faster and takes up less disk space than saving them in .mat files.
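If a held-out validation set is needed later, the same datastore can be split per label before training; a small sketch (the 0.9 ratio is just an example):
countEachLabel(imds)                                              % check the class balance
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.9, 'randomized');   % 90/10 split per class
net = trainNetwork(imdsTrain, layers, opt);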
Thanks Dr. Siva Srinivas Kolukula for attempting to help!

More Answers (0)


