How can I tune a PID Controller using Reinforcement Learning?
Hello, I am trying to extend the example "Tune PI Controller using Reinforcement Learning" into a "Tune PID Controller using Reinforcement Learning", but it does not work. What did I do wrong?
I modified the Simulink scheme watertankLQG.slx into a simpler scheme called System_PID.slx, and I wrote a script called ziegler_Nichols.m that computes the PID parameters (Kp, Ki, Kd, N) with the Ziegler-Nichols method:
%% Closed-loop Ziegler-Nichols (Z&N) example
clear, close all, clc
Ts=0.1;
Tf=30;
s = tf('s');
G= (5*(0.5-s))/(s+2)^3;
a= [1 6 12 8]          % denominator coefficients of G
b = [-5 2.5]           % numerator coefficients of G
[A,B,C,D] = tf2ss(b,a) % state-space realization (not used below)
figure(1)
margin(G)
figure(2)
rlocus(G)
figure(3)
nyquistplot(G), grid on
%% Closed-loop test
Kp = 1.9692308;
sysCL = feedback(Kp*G, 1)
figure(4)
margin(sysCL)
figure(5)
rlocus(sysCL)
figure(6)
step(sysCL);
% Observation: the Nyquist plot passes through the critical point
% when Kp = Ku
figure(7)
nyquist(Kp*G), grid on
KU = Kp;
%% Compute the ultimate period TU
[y,t] = step(sysCL,1:5e-3:10);
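% Flag samples where the response is locally flat (near the extrema of the
% sustained oscillation); successive extrema are TU/2 apart, hence the factor 2 below.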
ii = find(abs(diff(y))<3e-5);
figure(8)
plot(t,y,'linewidth',2), hold on, grid on
plot(t(ii),y(ii),'or');
TU = min(diff(t(ii)))*2;
%% PID tuning with the classic Ziegler-Nichols rules
Kp = 0.6*KU;
Ti = TU/2;
Td = TU/8;
Ki = Kp/Ti;
Kd = Kp*Td;
N = 10;
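% Parallel PID with a first-order filtered derivative (filter time constant Td/N)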
PI = Kp+(Ki/s)+((Kd*s)/(1+s*Td/N));
[y1,t1] = step(sysCL,0:1e-4:30);
sysCL2 = feedback(PI*G,1);
[y2,t2] = step(sysCL2,0:1e-4:30);
figure(9)
subplot(211)
plot(t1,y1,'linewidth',2);
title(['K_U = ' num2str(KU) ', T_U = ' num2str(TU)])
grid on
subplot(212)
plot(t2,y2,'linewidth',2);
title(['K_P = ' num2str(Kp) ', T_I = ' num2str(Ti) ', T_D = ' num2str(Td)])
grid on
figure(10)
margin(sysCL2)
figure(11)
bode(sysCL2)
figure(12)
rlocus(sysCL2)
figure(13)
nyquistplot(sysCL2)
Kp_Z_N=Kp
Ki_Z_N=Ki
Kd_Z_N=Kd
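% Open the test model and write the Ziegler-Nichols gains into its PID Controller block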
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_Z_N))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_Z_N))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd_Z_N))
set_param([mdlTest '/PID Controller'],'N',num2str(N))




I copied and pasted the algorithm from this page: https://it.mathworks.com/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html and made some changes.
I increased the observation dimension to obsInfo = rlNumericSpec([3 1]), and I added lines to read the Kd gain learned from the extra observation, to define N = 10, and to write these values into the PID Controller block of System_PID:
s = tf('s');
G= (5*(0.5-s))/(s+2)^3
a= [1 6 12 8]
b = [-5 2.5]
[A,B,C,D] = tf2ss(b,a)
mdl = 'rl_PID_Tune';
open_system(mdl)
Ts = 0.1;
Tf = 10;
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);
numObservations = obsInfo.Dimension(1);
numActions = prod(actInfo.Dimension);
rng(0)
initialGain = single([1e-3 2]);
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'Action'},actorOptions);
criticNetwork = localCreateCriticNetwork(numObservations,numActions);
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic2 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic = [critic1 critic2];
agentOpts = rlTD3AgentOptions(...
'SampleTime',Ts,...
'MiniBatchSize',128, ...
'ExperienceBufferLength',1e6);
agentOpts.ExplorationModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agent = rlTD3Agent(actor,critic,agentOpts);
maxepisodes = 100;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-355);
% Train the agent.
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))
Kp = abs(parameters{1}(2))
Kd = abs(parameters{1}(3))
N = 10;
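% Open the test model and write the learned gains into its PID Controller block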
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
%% local Functions
function [env,obsInfo,actInfo] = localCreatePIDEnv(mdl)
% Define the observation specification obsInfo and action specification actInfo.
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error and error';
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'PID output';
% Build the environment interface object.
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% Set a custom reset function that randomizes the reference value for the model.
env.ResetFcn = @(in)localResetFcn(in,mdl);
end
function in = localResetFcn(in,mdl)
% randomize reference signal
blk = sprintf([mdl '/Desired \nValue']);
hRef = 10 + 4*(rand-0.5);
in = setBlockParameter(in,blk,'Value',num2str(hRef));
% randomize initial height
hInit = 0;
blk = [mdl '/block system/System'];
in = setBlockParameter(in,blk,'InitialCondition',num2str(hInit));
end
function criticNetwork = localCreateCriticNetwork(numObservations,numActions)
statePath = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedLayer(32,'Name','fc1')];
actionPath = [
featureInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(32,'Name','fc2')];
commonPath = [
concatenationLayer(1,2,'Name','concat')
reluLayer('Name','reluBody1')
fullyConnectedLayer(32,'Name','fcBody')
reluLayer('Name','reluBody2')
fullyConnectedLayer(1,'Name','qvalue')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','concat/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
end
I modified the scheme from 'rlwatertankPIDTune' and saved the script above as 'rl_PID.m'.




But it does not work; this is the error message that the Command Window returns:
Error using dlnetwork/initialize (line 481)
Invalid network.
Error in dlnetwork (line 218)
net = initialize(net, dlX{:});
Error in deep.internal.sdk.dag2dlnetwork (line 48)
dlnet = dlnetwork(lg);
Error in rl.util.createInternalModelFactory (line 15)
Model = deep.internal.sdk.dag2dlnetwork(Model);
Error in rlDeterministicActorRepresentation (line 86)
Model = rl.util.createInternalModelFactory(Model, Options, ObservationNames, ActionNames, InputSize, OutputSize);
Error in rl_PID (line 24)
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
Caused by:
Layer 'Action': Error using 'predict' in layer fullyConnectedPILayer. The function threw an error and could not be executed.
Error using dlarray/fullyconnect>iValidateWeights (line 221)
The number of weights (2) for each output feature must match the number of elements (3) in each observation of the input data.
Error in dlarray/fullyconnect (line 101)
wdata = iValidateWeights(W, xsize, batchDims);
Error in fullyConnectedPILayer/predict (line 21)
Z = fullyconnect(X, abs(obj.Weights), 0, 'DataFormat','CB');
What does it mean?
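The 'Caused by' section seems to say that the fullyConnectedPILayer in my actor still has 2 weights while the observation now has 3 elements. Would the right direction simply be to give that custom layer one initial gain per observation element, as in the sketch below? (The third value is only a placeholder guess for an initial Kd, ordered [Ki Kp Kd] to match how I read the gains back with getLearnableParameters.)
% Hypothetical sketch: one initial gain per observation element (3 for a PID),
% so the fullyConnectedPILayer weight vector matches obsInfo = rlNumericSpec([3 1]).
% The 0.1 used for the initial Kd is only a placeholder guess.
initialGain = single([1e-3 2 0.1]);
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];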
4 Comments
S Saha
2023-8-25
What is the basis for selecting the values of the above initial gains for reinforcement-learning PID tuning?
Answers (0)