How to reward after one simulation.

Question

I have a simple pendulum that throws a ball, and I am using reinforcement learning to minimize the error between the ball's landing point and a target point.
The reward is the negative absolute error between the target point and the landing point of the ball.
Instead of giving a reward (error) in every state, I want to give only the one with the smallest error (the error closest to 0).
In other words: run one simulation → calculate the error (reward) for throwing the ball at each observed state → give only the value with the smallest error as the reward.
Is there a way to do this?

% Definition of the step function
function [NextObservation, Reward, IsDone, LoggedSignals] = myStepfunction(Action,LoggedSignals,SimplePendulum)
global RR

    for i=1:200
        % State before the update
        statePre = [-2*pi/3;0];
        statePre(1) = SimplePendulum.Theta;
        statePre(2) = SimplePendulum.AngularVelocity;

        IsDone = false;

        % Update the states
        SimplePendulum.pstep(Action);

        state = [-2*pi/3;0];
        state(1) = SimplePendulum.Theta;
        state(2) = SimplePendulum.AngularVelocity;

        % Calculation of the error (reward)
        Ball_Target = 20;
        Ball_Distance = Ballfunction(SimplePendulum);
        R = -abs(Ball_Distance - Ball_Target);

        teststep(R)  % get the error (reward) in all observed states

        % End the episode when the termination condition is met
        if (state(2) > 0) || (SimplePendulum.Y_Position < 0)  %|| (abs(state(2)) > 10)
            IsDone = true;
            [InitialObservation, LoggedSignal] = myResetFunction(SimplePendulum);
            LoggedSignal.State = [-pi ; 0];
            InitialObservation = LoggedSignal.State;
            state = InitialObservation;
            SimplePendulum.Theta = -2*pi/3;
            SimplePendulum.AngularVelocity = 0;
        end

        if IsDone == true
            [M,I] = max(RR)
        end

        LoggedSignals.State = state;
        NextObservation = LoggedSignals.State;
    end

    Reward = max(RR); % largest reward, i.e. the smallest error
end
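
To make the behaviour I am after concrete, here is a minimal sketch of the reward rule on its own, without the pendulum. It assumes the step function is called once per time step, so the running best has to be carried between calls; I use a LoggedSignals field for that instead of the global RR. The helper name sparseBestReward, the field BestReward, and the numbers in stepRewards are all made up for illustration and are not part of my code above.

```matlab
% Per-step rewards (negative absolute errors) from one hypothetical episode.
% These numbers are made up for illustration only.
stepRewards = [-7.5, -3.2, -0.4, -2.1];

LoggedSignals = struct();
given = zeros(size(stepRewards));
for k = 1:numel(stepRewards)
    isDone = (k == numel(stepRewards));   % last step ends the episode
    [given(k), LoggedSignals] = sparseBestReward(stepRewards(k), isDone, LoggedSignals);
end
disp(given)

% Hypothetical helper: remembers the best (closest to zero) reward seen so
% far in LoggedSignals.BestReward and hands it out only on the terminal step.
function [Reward, LoggedSignals] = sparseBestReward(R, IsDone, LoggedSignals)
    % Start a fresh running maximum at the first step of an episode.
    if ~isfield(LoggedSignals, 'BestReward') || isempty(LoggedSignals.BestReward)
        LoggedSignals.BestReward = -Inf;
    end

    % Keep the larger of the stored best and the current reward.
    LoggedSignals.BestReward = max(LoggedSignals.BestReward, R);

    if IsDone
        Reward = LoggedSignals.BestReward;   % give only the best value
        LoggedSignals.BestReward = [];       % clear it for the next episode
    else
        Reward = 0;                          % no reward on intermediate steps
    end
end
```

With these made-up numbers, given is 0 for the first three steps and -0.4 on the last one, which is the single reward per simulation I would like the agent to receive. In my real step function, R would be the value computed from Ballfunction and IsDone the termination flag.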
