Simulating mini-batches with the shallow NN train function
I have large datasets: 200 x 840000 inputs and 6 x 840000 targets. How could I use train on, say, 5000 samples at a time, working across the entire data set while keeping performance up across the whole set, so that I don't necessarily have to handle all of the data at once? Something like mimicking the mini-batch technique of deep training, but for shallow training on huge data sets. Below is what I have come up with.
rng('shuffle');
neurons       = 12;
epochs        = 2;
miniBatchSize = 3000;
parameters    = 6;    % number of target rows (6 targets per sample, per the data described above)
miniBatch = single([]);
TrainI = [];
TrainT = [];
% GenTogAllData holds inputs and targets stacked row-wise (200 input rows + 6 target rows)
[AllTrain, AllTest] = dividerand(GenTogAllData, 0.91, 0.09);
net = fitnet(neurons);
net.trainFcn = 'trainscg';
net.trainParam.showWindow = 1;
net.trainParam.epochs = 1;    % only one pass over each mini-batch per call to train
tic
for i = 1:epochs
    i                                             % echo the epoch counter
    j  = 1;
    ii = 1;
    randomNumbers = randperm(size(AllTrain, 2));  % reshuffle the sample order every epoch
    while j <= size(AllTrain, 2)
        miniBatch(:, ii) = single(AllTrain(:, randomNumbers(j)));
        j  = j + 1;
        ii = ii + 1;
        if size(miniBatch, 2) == miniBatchSize
            % split the stacked mini-batch back into inputs and targets
            TrainI = miniBatch(1:(size(AllTrain, 1) - parameters), :);
            TrainT = miniBatch((size(AllTrain, 1) - parameters + 1):size(AllTrain, 1), :);
            net = train(net, TrainI, TrainT);
            miniBatch = [];
            TrainI = [];
            TrainT = [];
            ii = 1;
        end
        % note: a trailing partial batch smaller than miniBatchSize is never trained on
    end
end
toc
It runs much quicker per epoch (as I have defined them) than an epoch over the entire data set, but the best behavior I reach this way is never as good as when I let the network train on the entire data set for a long time. I know this is batch training one epoch at a time; you can easily try adapt as well and establish your own performance criteria, and it still doesn't do as well.
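For reference, a minimal sketch of the adapt route (this assumes the same row-stacked AllTrain matrix as above, an arbitrary 5000-sample chunk size, and that the network's default adaptFcn and weight learning functions are left in place):
% Incremental weight updates with adapt instead of repeated calls to train
inRows = 200;                                   % number of input rows (assumption from the question)
chunk  = 5000;                                  % samples per adapt call (arbitrary)
net2 = fitnet(12);
net2 = configure(net2, AllTrain(1:inRows, 1:chunk), AllTrain(inRows+1:end, 1:chunk));
for s = 1:chunk:size(AllTrain, 2) - chunk + 1
    cols = s:s+chunk-1;
    net2 = adapt(net2, AllTrain(1:inRows, cols), AllTrain(inRows+1:end, cols));
end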
Is there a fundamental reason why we might not be able to do this? I will soon have more data than I can fit in RAM, and I want to achieve the performance I know the shallow NN can reach across the entire data set, but in smaller batches. This relates to another question I asked: I can't fit all 840000 samples on a GPU, but I can fit 300000. So how would I train 300000 at a time and still keep performance across all 840000?
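For the GPU case, a rough sketch of training one memory-sized chunk at a time while scoring a fixed random validation sample after each chunk (the 300000 chunk size comes from the question; the 20000-sample validation draw and the 'useGPU' option are assumptions to illustrate the monitoring idea):
% Train on GPU-sized chunks and watch generalization on a fixed subset
inRows  = 200;                                    % input rows (assumption from the question)
chunkSz = 300000;                                 % roughly what fits on the GPU
valIdx  = randperm(size(AllTrain, 2), 20000);     % validation sample drawn once and reused
Xval = AllTrain(1:inRows, valIdx);
Tval = AllTrain(inRows+1:end, valIdx);
for s = 1:chunkSz:size(AllTrain, 2)
    cols = s:min(s + chunkSz - 1, size(AllTrain, 2));
    net = train(net, AllTrain(1:inRows, cols), AllTrain(inRows+1:end, cols), 'useGPU', 'yes');
    mseVal = perform(net, Tval, net(Xval));       % MSE on the fixed validation sample
    fprintf('chunk starting at column %d: validation MSE %.4g\n', s, mseVal);
end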
I know I can use some of the deep NN tools for help here, but I am about to ask another question about how I might try that too, and I want to keep this one about how to use a shallow NN to achieve this, because I know the shallow NN performs well on this data set, and the deep NN tooling is its own beast.
Thank you in advance for any help here.
2 Comments
Greg Heath
2018-8-27
When you have very large datasets, an excellent approach is to FIRST consider reduction of BOTH number and dimensionality.
Consider a 1-D Gaussian distribution. How many random draws are necessary for an acceptable estimate of its mean and covariance matrix? How does that change for 2-D and 3-D?
Hope this helps.
Greg
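A rough illustration of that reduction idea in shallow-NN terms (the 0.001 variance fraction and the 100000-sample subset are arbitrary values for the sketch; the 200/6 row split comes from the question):
% Reduce input dimensionality first, then train on a random subset of samples
X = AllTrain(1:200, :);                       % input rows
T = AllTrain(201:end, :);                     % target rows
[Xr, ps] = processpca(X, 0.001);              % drop components explaining < 0.1% of the variance
keep = randperm(size(Xr, 2), 100000);         % random subset of the columns
netR = fitnet(12);
netR = train(netR, Xr(:, keep), T(:, keep));
% apply the same transform to new data: XnewR = processpca('apply', Xnew, ps);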
Answers (0)