Bayesian Optimization: How should we parameterize hidden units for a varying number of layers (depth) in a BiLSTM network using bayesopt?
Hi there,
I have been using Bayesian optimization to tune the hyperparameters of my BiLSTM network. (I hope this code helps the community, since I have seen unanswered MATLAB questions about Bayesian optimization for LSTM networks, which are similar to BiLSTMs.)
One of the parameters I am tuning is the depth of the BiLSTM network, but I think I should also find the best number of hidden units for each layer.
As you can see in the code, the maximum depth I want to test is 10 layers, so I created HiddenUnits_1 through HiddenUnits_10 under optimVars. However, how many of these apply depends on the number of layers in the network. For example, if a 5-layer network (BiLSTM layers only) is being evaluated, there should be 5 hidden-unit variables (HiddenUnits_1 through HiddenUnits_5), and the remaining variables (HiddenUnits_6 through HiddenUnits_10) should not exist for that particular experiment. The code runs successfully, but it optimizes all 10 hidden-unit variables even when the network is shallower. Is there a way to avoid optimizing unnecessary variables (that is, ignore HiddenUnits_6 through HiddenUnits_10 when the current point being evaluated has only 5 layers)?
Also, slightly off topic but related: is there a way to optimize these hidden units as an array or a cell? In other words, can I define a cell array to be optimized, with each cell holding one of the hidden-unit variables (HiddenUnits_1 through HiddenUnits_10)? I ask because I could then modify the code to read the hidden units from the cell array automatically, so I would not have to list each one separately; I believe I could make their number depend on the number of BiLSTM layers (not tried yet; a sketch of this idea follows the code below).
Thank you, any help or suggestions are appreciated.
Here is the code I have written for it:
%% Bayesian Optimization
optimVars = [
    optimizableVariable('SectionDepth',[1 10],'Type','integer')
    optimizableVariable('InitialLearnRate',[1e-2 1],'Transform','log')
    optimizableVariable('HiddenUnits_1',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_2',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_3',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_4',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_5',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_6',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_7',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_8',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_9',[1 200],'Type','integer')
    optimizableVariable('HiddenUnits_10',[1 200],'Type','integer')];
ObjFcn = makeObjFcn(Noisy_XTrain_PLE,Noisy_YTrain_PLE,PLE_Predictions_40_train,PLE_Predictions_40_test,mu_PLE,std_PLE);
% Perform Bayesian optimization by minimizing the error on the validation set.
% At least 30 evaluations are suggested for Bayesian optimization (more can give better results).
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'MaxObjectiveEvaluations',30, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'UseParallel',false);
% Load the best network found during optimization via its saved file name
bestIdx = BayesObject.IndexOfMinimumTrace(end);
fileName = BayesObject.UserDataTrace{bestIdx};
savedStruct = load(fileName);
% Display the training and validation errors
TrainError = savedStruct.TotaltrainingError
valError = savedStruct.TotalvalError
%% Define the objective function for optimization
function ObjFcn = makeObjFcn(XTrain,YTrain,PLE_Predictions_training,PLE_Predictions_test,mu_PLE,std_PLE)
ObjFcn = @valErrorFun;
function [TotalvalError,cons,fileName] = valErrorFun(optVars)
% Cell arrays to store the per-well validation and training error values
valError = cell(510,1);
TrainingError = cell(510,1);
% Random seed
seed = 100;
rng(seed);
% Input - Output features
numFeatures = 1;
numResponses = 1;
% Hyperparameters
miniBatchSize = 1;
%numHiddenUnits = 50;
x = 0; % Mean for the normrnd weight initializers
y = 1; % Standard deviation for the normrnd weight initializers
maxEpochs = 1;
% Layer structure
layers = [
    sequenceInputLayer(numFeatures)
    bilstmBlock(optVars.SectionDepth, ...
        optVars.HiddenUnits_1,optVars.HiddenUnits_2,optVars.HiddenUnits_3, ...
        optVars.HiddenUnits_4,optVars.HiddenUnits_5,optVars.HiddenUnits_6, ...
        optVars.HiddenUnits_7,optVars.HiddenUnits_8,optVars.HiddenUnits_9, ...
        optVars.HiddenUnits_10,x,y) % Builds the stack of BiLSTM layers (function below)
    dropoutLayer(0)
    % Add the fully connected layer and the final regression layer.
    fullyConnectedLayer(numResponses,'BiasInitializer','ones','WeightsInitializer',@(sz) normrnd(x,y,sz))
    regressionLayer];
% Training options
options = trainingOptions('adam', ...
    'InitialLearnRate',optVars.InitialLearnRate, ...
    'GradientThreshold',1, ...
    'MaxEpochs',maxEpochs, ...
    'ExecutionEnvironment','gpu', ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',125, ...
    'LearnRateDropFactor',1, ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','never', ...
    'Verbose',false, ...
    'Plots','training-progress');
% Train network
net = trainNetwork(XTrain, YTrain, layers, options);
% Forecast future values
for i = 450:510
    net = resetState(net); % Testing this reset option
    [net,XPred] = predictAndUpdateState(net,XTrain(i,:),'MiniBatchSize',1);
    Ending = cellfun(@(x) x(end), YTrain(i,:), 'UniformOutput', false);
    % Then update the state on the last point of YTrain to get the next state update
    [net,YPred] = predictAndUpdateState(net,Ending,'MiniBatchSize',1);
    % Repeat predictAndUpdateState in a loop to forecast the next time steps into the future
    for j = 2:40 % Need to change this to account for the remaining months for each well
        [net,YPred(:,j)] = predictAndUpdateState(net,YPred(:,j-1),'MiniBatchSize',1,'ExecutionEnvironment','gpu');
    end
    % Convert cell to matrix since the number of predictions is the same
    % (not the total amount for each well, but the next 5 years, for example)
    YPred_new = cell2mat(YPred);
    mu_3 = cell2mat(mu_PLE);
    std_3 = cell2mat(std_PLE);
    De_normalized_YPred = YPred_new.*std_3(i,:) + mu_3(i,:);
    De_normalized_Xpred = cellfun(@(x,y,z) x.*y + z, std_PLE(i,1), XPred, mu_PLE(i,1), 'UniformOutput', false);
    % Test PLE
    PLE_test = cell2mat(PLE_Predictions_test(i,1));
    % Training PLE
    PLE_Predictions_train = cellfun(@(x) x(:,end-1), PLE_Predictions_training, 'UniformOutput', false);
    PLE_train = cell2mat(PLE_Predictions_train(i,1));
    valError{i,1} = mean((PLE_test(1,1:40) - De_normalized_YPred).^2);
    TrainingError{i,1} = mean((PLE_train(1,:) - cell2mat(De_normalized_Xpred(:))).^2);
end
TotaltrainingError = sum([TrainingError{:}]);
TotalvalError = sum([valError{:}]);
% Save the network and errors under a file name built from the error values
fileName = num2str(TotaltrainingError) + "_" + num2str(TotalvalError) + ".mat";
save(fileName,'net','TotalvalError','TotaltrainingError','options','layers')
% Constraints
cons = [];
end
end
%% Define a function for creating deeper networks
function layersan = bilstmBlock(numBiLSTMLayers, ...
    HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5, ...
    HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10,x,y)
numHiddenUnits = [HiddenUnits_1,HiddenUnits_2,HiddenUnits_3,HiddenUnits_4,HiddenUnits_5, ...
    HiddenUnits_6,HiddenUnits_7,HiddenUnits_8,HiddenUnits_9,HiddenUnits_10];
layersan = [];
for i = 1:numBiLSTMLayers
    layers = bilstmLayer(numHiddenUnits(1,i),'BiasInitializer','ones', ...
        'OutputMode','sequence', ...
        'InputWeightsInitializer',@(sz) normrnd(x,y,sz), ...
        'RecurrentWeightsInitializer',@(sz) normrnd(x,y,sz));
    layersan = [layersan; layers];
end
end
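Here is an untested sketch of what I mean for the second question, using dynamic field names inside the objective so that each HiddenUnits_k does not have to be listed separately (bilstmBlockVec is a hypothetical vector-input variant of bilstmBlock):
% Untested sketch: gather the per-layer variables into one vector,
% trimmed to the active depth, using dynamic field names on the optVars table.
numHiddenUnits = zeros(1,optVars.SectionDepth);
for k = 1:optVars.SectionDepth
    numHiddenUnits(k) = optVars.(['HiddenUnits_' num2str(k)]);
end
layersan = bilstmBlockVec(numHiddenUnits,x,y); % hypothetical vector-input variant

function layersan = bilstmBlockVec(numHiddenUnits,x,y)
% Same as bilstmBlock above, but takes the hidden-unit counts as one vector.
layersan = [];
for i = 1:numel(numHiddenUnits)
    layersan = [layersan; bilstmLayer(numHiddenUnits(i), ...
        'BiasInitializer','ones','OutputMode','sequence', ...
        'InputWeightsInitializer',@(sz) normrnd(x,y,sz), ...
        'RecurrentWeightsInitializer',@(sz) normrnd(x,y,sz))];
end
end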
Accepted Answer
Alan Weiss
2020-11-3
I believe that you can perform the optimization the way you want using conditional constraints. If M is the number of layers that you are using, then set the values of all hidden-unit parameters in layers M+1 through 10 to some fixed default so that they are not optimized. A sketch of this idea follows.
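A minimal sketch, assuming the variable names from the question: bayesopt accepts a 'ConditionalVariableFcn' argument, and the documented convention for conditional constraints is to set inapplicable entries to NaN so that they are not optimized (the objective function should then use only the first SectionDepth hidden-unit values and ignore the NaN entries).
function XTable = condHiddenUnits(XTable)
% Conditional variable function: bayesopt calls this on a table of points.
% Hidden-unit variables for layers beyond the current depth are set to NaN,
% the documented marker for "not applicable" in conditional constraints.
for k = 2:10 % HiddenUnits_1 always applies because SectionDepth >= 1
    name = ['HiddenUnits_' num2str(k)];
    XTable.(name)(XTable.SectionDepth < k) = NaN;
end
end
Pass it to bayesopt along with the other arguments:
BayesObject = bayesopt(ObjFcn,optimVars, ...
    'ConditionalVariableFcn',@condHiddenUnits, ...
    'MaxObjectiveEvaluations',30, ...
    'MaxTime',14*60*60, ...
    'IsObjectiveDeterministic',false, ...
    'UseParallel',false);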
As for your second question, I am sorry, but I do not understand exactly what you are asking. Maybe you are asking whether you can run a subsidiary optimization inside your objective function. The answer to that is of course yes: you can write anything you want inside your objective function, including another call to bayesopt. But perhaps I misunderstand what you are asking.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation