Why doesn't using imageDataAugmenter increase the size of my training data set?
I am trying to use imageDataAugmenter to increase the size of my training dataset (the number of training images), but it seems to have no effect at all. To explain: I used a simple CNN to classify images into three categories. Each category has 200 images (120 for training, 40 for validation and 40 for testing). Creating the image datastores:
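(The imds variable used below is assumed to be created along these lines; the folder name is a placeholder:)

% Assumed creation of the image datastore; 'myImages' is a placeholder folder
% with one subfolder per category, whose names become the labels.
imds = imageDatastore('myImages', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');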
[TrainDataStore,ValDataStore,TestDatastore] = splitEachLabel(imds,0.6,0.2,'randomized');
Training the network:
net = trainNetwork(TrainDataStore,mynet_1,options);
So, since the number of epochs (5) and the MiniBatchSize (60) are the same in all cases, I got 30 iterations in total and 6 iterations per epoch: 6 iterations × 60 (MiniBatchSize) = 360 images per epoch (120 per label).
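For reference, the training options assumed here look roughly like this (the 'sgdm' solver is illustrative; only MaxEpochs and MiniBatchSize matter for the counts above):

% Assumed training options: with 360 training images and MiniBatchSize 60,
% each epoch runs floor(360/60) = 6 iterations, so 5 epochs give 30 iterations.
options = trainingOptions('sgdm', ...
    'MaxEpochs',5, ...
    'MiniBatchSize',60, ...
    'ValidationData',ValDataStore);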
I tried to use data augmentation as follows:
augmenter = imageDataAugmenter('RandRotation',[0 30]);
[TrainDataStore,ValDataStore,TestDatastore] = splitEachLabel(imds,0.6,0.2,'randomized');
Traindatasource = augmentedImageSource([200 200 3],TrainDataStore,'DataAugmentation',augmenter);
net = trainNetwork(Traindatasource,mynet_1,options);
Again I ended up with 6 iterations per epoch and 5 epochs, which means the total number of images is the same (360), even though it should have increased because of the rotation property.
I don't know exactly how large the augmented dataset should be, but it should definitely be larger than the original one. If something is missing in my approach, please let me know.
Accepted Answer
J
2018-3-23
I am guessing that when augmentation is on, training proceeds exactly as it did with augmentation off, except that a random transformation (rotation, in your case) is applied to each training example, and the transformed image is presented to the network instead of the original. The network rarely sees exactly the same training example twice, because the transformations are random, and it rarely sees exact copies of your original training examples either: for that, the random generator would have to pick a rotation close enough to 0 degrees that, after discretization, the image is effectively unrotated, which is possible but improbable.
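A quick way to see this behaviour (a sketch, not from the original post; augment() needs R2018b or later, and the sample image is arbitrary):

% Applying the same augmenter to one image several times gives a different
% random rotation each call, which is what happens once per image, per
% mini-batch, during training.
augmenter = imageDataAugmenter('RandRotation',[0 30]);
I  = imread('peppers.png');    % any sample image shipped with MATLAB
A1 = augment(augmenter,I);     % rotated by a random angle in [0, 30] degrees
A2 = augment(augmenter,I);     % almost certainly a different angle
montage({I,A1,A2});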
This is clearly different from what you were expecting, i.e., that it would generate one larger set of augmented training data up front, then break that up into mini-batches and present them to the network over and over each epoch, so that there would be more iterations per epoch and the network would see the same images every epoch.
Unfortunately, this is not addressed in the R2017b documentation at least, and I doubt it is addressed in R2018a either. Your question is valid, and MathWorks should probably put more resources into the Neural Network Toolbox documentation and functionality if they want to be long-term players here.
More Answers (2)
Xu MingJie
2018-8-8
The data augmentation done by imageDataAugmenter is not the traditional approach of materializing additional data in memory. The assumption is that your dataset may be too big to fit in memory, so the MATLAB developers implemented augmentation in a way that works within limited memory; see https://ww2.mathworks.cn/help/nnet/ug/preprocess-images-for-deep-learning.html#mw_ef499675-d7a0-4e77-8741-ea5801695193.
In more detail: after you configure the image transformation options, the size of the training dataset is the same in every epoch. However, at each training iteration, the augmented image datastore applies a random combination of transformations to that mini-batch of training data. So the amount of training data per epoch never changes, but every training image differs slightly because of the transformation operations you specified, such as rotation.
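A sketch illustrating this, assuming the variables from the question (augmentedImageDatastore is the newer name for augmentedImageSource):

% The augmented datastore reports the same number of observations as the
% underlying training datastore, so the iterations per epoch do not change;
% only the pixel content of each mini-batch varies from epoch to epoch.
augimds = augmentedImageDatastore([200 200 3],TrainDataStore, ...
    'DataAugmentation',augmenter);
augimds.NumObservations        % equals numel(TrainDataStore.Files), i.e. 360 here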
Guy Reading
2019-9-27
For all those still reading this: there is a solution!
I was making the same assumption as you, caesar. However, given J's answer, there's a work-around. If the network rarely sees the same training example twice, given what the augmenter does, we can just increase the number of epochs in trainingOptions:
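For example (a sketch under the assumptions of the original question; the 'sgdm' solver and N = 5 are illustrative):

% Increase MaxEpochs by the factor N by which you expected the augmenter to
% multiply the dataset; the network then sees roughly N times as many
% distinct augmented images over the whole run.
N = 5;                                   % chosen multiplier (illustrative)
options = trainingOptions('sgdm', ...
    'MaxEpochs',5*N, ...                 % was 5 in the original question
    'MiniBatchSize',60, ...
    'ValidationData',ValDataStore);
net = trainNetwork(Traindatasource,mynet_1,options);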
That way, although we don't present the whole augmented dataset within one epoch, we present something like it over N epochs, where N is the factor by which we assumed the augmenter would multiply our sample size. If we increase the number of epochs by N, we get something like what we expected in the first place, I believe (correct me if I'm wrong!).