How can I tune a PID Controller using Reinforcement Learning?
Hello, I am trying to extend the example "Tune PI Controller using Reinforcement Learning" into a "Tune PID Controller using Reinforcement Learning", but it does not work. What did I do wrong?
I modified the Simulink scheme watertankLQG.slx into a simpler scheme called System_PID.slx, and I wrote a script called ziegler_Nichols.m that computes the PID parameters (Kp, Ki, Kd, N) with the Ziegler-Nichols method:
%% Closed-loop Ziegler-Nichols (Z&N) example
clear, close all, clc
Ts=0.1;
Tf=30;
s = tf('s');
G= (5*(0.5-s))/(s+2)^3;
a= [1 6 12 8]          % denominator coefficients of G
b = [-5 2.5]           % numerator coefficients of G
[A,B,C,D] = tf2ss(b,a) % state-space realization (not used below)
figure(1)
margin(G)
figure(2)
rlocus(G)
figure(3)
nyquistplot(G), grid on
%% Closed-loop test
Kp = 1.9692308;
sysCL = feedback(Kp*G, 1)
figure(4)
margin(sysCL)
figure(5)
rlocus(sysCL)
figure(6)
step(sysCL);
% Observation: the Nyquist plot passes through the critical point
% when Kp = Ku
figure(7)
nyquist(Kp*G), grid on
KU = Kp;
%% Compute the ultimate period TU
[y,t] = step(sysCL,1:5e-3:10);
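% Flag samples where the response is locally flat (near the extrema of the
% sustained oscillation); successive extrema are TU/2 apart, hence the factor 2 below.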
ii = find(abs(diff(y))<3e-5);
figure(8)
plot(t,y,'linewidth',2), hold on, grid on
plot(t(ii),y(ii),'or');
TU = min(diff(t(ii)))*2;
%% PID tuning with the classic Ziegler-Nichols rules
Kp = 0.6*KU;
Ti = TU/2;
Td = TU/8;
Ki = Kp/Ti;
Kd = Kp*Td;
N = 10;
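% Parallel PID with a first-order filtered derivative (filter time constant Td/N)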
PI = Kp+(Ki/s)+((Kd*s)/(1+s*Td/N));
[y1,t1] = step(sysCL,0:1e-4:30);
sysCL2 = feedback(PI*G,1);
[y2,t2] = step(sysCL2,0:1e-4:30);
figure(9)
subplot(211)
plot(t1,y1,'linewidth',2);
title(['K_U = ' num2str(KU) ', T_U = ' num2str(TU)])
grid on
subplot(212)
plot(t2,y2,'linewidth',2);
title(['K_P = ' num2str(Kp) ', T_I = ' num2str(Ti) ', T_D = ' num2str(Td)])
grid on
figure(10)
margin(sysCL2)
figure(11)
bode(sysCL2)
figure(12)
rlocus(sysCL2)
figure(13)
nyquistplot(sysCL2)
Kp_Z_N=Kp
Ki_Z_N=Ki
Kd_Z_N=Kd
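% Open the test model and write the Ziegler-Nichols gains into its PID Controller block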
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_Z_N))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_Z_N))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd_Z_N))
set_param([mdlTest '/PID Controller'],'N',num2str(N))




I copied and pasted the algorithm from this page: https://it.mathworks.com/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html and made some changes.
I increased the observation dimension to obsInfo = rlNumericSpec([3 1]), and I added lines to read the Kd gain learned from the extra observation, to define N = 10, and to write these values into the PID Controller block of System_PID:
s = tf('s');
G= (5*(0.5-s))/(s+2)^3
a= [1 6 12 8]
b = [-5 2.5]
[A,B,C,D] = tf2ss(b,a)
mdl = 'rl_PID_Tune';
open_system(mdl)
Ts = 0.1;
Tf = 10;
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);
numObservations = obsInfo.Dimension(1);
numActions = prod(actInfo.Dimension);
rng(0)
initialGain = single([1e-3 2]);
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'state'},'Action',{'Action'},actorOptions);
criticNetwork = localCreateCriticNetwork(numObservations,numActions);
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic2 = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation','state','Action','action',criticOpts);
critic = [critic1 critic2];
agentOpts = rlTD3AgentOptions(...
'SampleTime',Ts,...
'MiniBatchSize',128, ...
'ExperienceBufferLength',1e6);
agentOpts.ExplorationModel.Variance = 0.1;
agentOpts.TargetPolicySmoothModel.Variance = 0.1;
agent = rlTD3Agent(actor,critic,agentOpts);
maxepisodes = 100;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-355);
% Train the agent.
trainingStats = train(agent,env,trainOpts);
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))
Kp = abs(parameters{1}(2))
Kd = abs(parameters{1}(3))
N = 10;
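% Open the test model and write the learned gains into its PID Controller block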
mdlTest = 'System_PID';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
set_param([mdlTest '/PID Controller'],'D',num2str(Kd))
set_param([mdlTest '/PID Controller'],'N',num2str(N))
%% local Functions
function [env,obsInfo,actInfo] = localCreatePIDEnv(mdl)
% Define the observation specification obsInfo and action specification actInfo.
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error and error';
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'PID output';
% Build the environment interface object.
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% Set a custom reset function that randomizes the reference value for the model.
env.ResetFcn = @(in)localResetFcn(in,mdl);
end
function in = localResetFcn(in,mdl)
% randomize reference signal
blk = sprintf([mdl '/Desired \nValue']);
hRef = 10 + 4*(rand-0.5);
in = setBlockParameter(in,blk,'Value',num2str(hRef));
% randomize initial height
hInit = 0;
blk = [mdl '/block system/System'];
in = setBlockParameter(in,blk,'InitialCondition',num2str(hInit));
end
function criticNetwork = localCreateCriticNetwork(numObservations,numActions)
statePath = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedLayer(32,'Name','fc1')];
actionPath = [
featureInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(32,'Name','fc2')];
commonPath = [
concatenationLayer(1,2,'Name','concat')
reluLayer('Name','reluBody1')
fullyConnectedLayer(32,'Name','fcBody')
reluLayer('Name','reluBody2')
fullyConnectedLayer(1,'Name','qvalue')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','concat/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','concat/in2');
end
I modified the scheme from 'rlwatertankPIDTune' and saved the script above as 'rl_PID.m'.




But it does not work; this is the error message that the Command Window returns:
Error using dlnetwork/initialize (line 481)
Invalid network.
Error in dlnetwork (line 218)
net = initialize(net, dlX{:});
Error in deep.internal.sdk.dag2dlnetwork (line 48)
dlnet = dlnetwork(lg);
Error in rl.util.createInternalModelFactory (line 15)
Model = deep.internal.sdk.dag2dlnetwork(Model);
Error in rlDeterministicActorRepresentation (line 86)
Model = rl.util.createInternalModelFactory(Model, Options, ObservationNames, ActionNames, InputSize, OutputSize);
Error in rl_PID (line 24)
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
Caused by:
Layer 'Action': Error using 'predict' in layer fullyConnectedPILayer. The function threw an error and could not be executed.
Error using dlarray/fullyconnect>iValidateWeights (line 221)
The number of weights (2) for each output feature must match the number of elements (3) in each observation of the input data.
Error in dlarray/fullyconnect (line 101)
wdata = iValidateWeights(W, xsize, batchDims);
Error in fullyConnectedPILayer/predict (line 21)
Z = fullyconnect(X, abs(obj.Weights), 0, 'DataFormat','CB');
What does it mean?
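The 'Caused by' section seems to say that the fullyConnectedPILayer in my actor still has 2 weights while the observation now has 3 elements. Would the right direction simply be to give that custom layer one initial gain per observation element, as in the sketch below? (The third value is only a placeholder guess for an initial Kd, ordered [Ki Kp Kd] to match how I read the gains back with getLearnableParameters.)
% Hypothetical sketch: one initial gain per observation element (3 for a PID),
% so the fullyConnectedPILayer weight vector matches obsInfo = rlNumericSpec([3 1]).
% The 0.1 used for the initial Kd is only a placeholder guess.
initialGain = single([1e-3 2 0.1]);
actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','state')
fullyConnectedPILayer(initialGain, 'Action')];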
4 Comments
S Saha
2023-8-25
What is the basis for selecting the values of the above initial gains for reinforcement-learning PID tuning?
Answers (0)