rlCustomEvaluator

Custom object for evaluating reinforcement learning agents during training

Since R2023b

    Description

    Create an rlCustomEvaluator object to specify a custom function and an evaluation frequency to use for evaluating agents during training. To train the agents, pass this object to train.

    For more information on training agents, see Train Reinforcement Learning Agents.

    Creation

    Description

    evaluator = rlCustomEvaluator(evalFcn) returns the custom evaluator object evaluator. The evalFcn argument is a handle to your custom MATLAB® evaluation function.


    evaluator = rlCustomEvaluator(evalFcn,EvaluationFrequency=evalPeriod) also specifies the number of training episodes after which train calls the evaluation function.

    Properties

    EvaluationFcn — Custom evaluation function
    function handle

    Custom evaluation function, specified as a function handle. The train function calls this function every EvaluationFrequency training episodes.

    Your evaluation function must have three inputs and three outputs, as illustrated by the following signature.

    [statistic, scores, data] = myEvalFcn(agent, environment, trainingInfo)

    Given an agent, its environment, and training episode information, the custom evaluation function runs a number of evaluation episodes and returns a corresponding summary statistic, a vector of episode scores, and any additional data that might be needed for logging. A minimal sketch follows the argument descriptions below.

    The required input arguments (passed to evalFcn from train) are:

    • agent — Agent to evaluate, specified as a reinforcement learning agent object. For multiagent environments, this is a cell array of agent objects.

    • environment — Environment within which the agents are evaluated, specified as a reinforcement learning environment object.

    • trainingInfo — A structure containing the following fields.

      • EpisodeIndex — Current episode index, specified as a positive integer.

      • EpisodeInfo — A structure containing the fields CumulativeReward, StepsTaken, and InitialObservation, which contain, respectively, the cumulative reward, the number of steps taken, and the initial observations of the current training episode.

    The output arguments (passed from evalFcn to train) are:

    • statistic — A statistic computed from a group of consecutive evaluation episodes. Common statistics are the mean, median, maximum, and minimum. At the end of training, train returns this value as the element of the EvaluationStatistics vector corresponding to the training episode at which the evaluation occurred.

    • scores — A vector of episode scores from each evaluation episode. You can use a logger object to store this argument during training.

    • data — Any additional data from evaluation that you might find useful, for example, for logging purposes. You can use a logger object to store this argument during training.
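    As an illustration, the following minimal sketch runs a fixed number of evaluation episodes and returns their mean reward. The function name myMinimalEvalFcn, the episode count, and the assumption of serial simulation are illustrative only; see the full example below for a version that also handles parallel simulation.

    function [statistic, scores, data] = myMinimalEvalFcn(agent, env, trainingInfo)

        % Do not explore during evaluation.
        agent.UseExplorationPolicy = false;

        % Run a fixed (illustrative) number of evaluation episodes.
        numEpisodes = 5;
        scores = zeros(numEpisodes, 1);
        for k = 1:numEpisodes
            % Run one episode (assumes serial simulation).
            out = runEpisode(env, agent, MaxSteps=500);
            scores(k) = out.AgentData.EpisodeInfo.CumulativeReward;
        end

        % Summarize with the mean; return no additional data.
        statistic = mean(scores);
        data = [];
    end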

    To use additional input arguments beyond the required three, define your additional arguments in the MATLAB workspace, then specify evalFcn as an anonymous function that in turn calls your custom function with the additional arguments defined in the workspace, as in the sketch below. For a similar technique applied to environment functions, see the example Create Custom Environment Using Step and Reset Functions.
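    For example, the following sketch forwards a workspace variable to a four-argument evaluation function. Here the extra parameter numEvalEpisodes and the four-argument function myEvalFcn are hypothetical.

    % Additional argument defined in the workspace (hypothetical).
    numEvalEpisodes = 5;

    % Wrap the four-argument function in a three-argument handle.
    evaluator = rlCustomEvaluator( ...
        @(agent,env,info) myEvalFcn(agent,env,info,numEvalEpisodes));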

    Example: evalFcn=@myEvalFcn

    EvaluationFrequency — Evaluation period
    100 (default) | positive integer

    Evaluation period, specified as a positive integer. It is the number of training episodes after which train calls the evaluation function (which can then run one or more evaluation episodes, as defined in your custom function). For example, if EvaluationFrequency is 100, then train calls the evaluation function after episodes 100, 200, and so on.

    Example: EvaluationFrequency=200

    Object Functions

    train — Train reinforcement learning agents within a specified environment

    Examples


    Create an rlCustomEvaluator object to evaluate an agent during training using a custom evaluation function. Use the function myEvaluationFcn, defined at the end of this example.

    myEvaluator = rlCustomEvaluator(@myEvaluationFcn)
    myEvaluator = 
      rlCustomEvaluator with properties:
    
              EvaluationFcn: @myEvaluationFcn
        EvaluationFrequency: 100
    
    

    Configure the evaluator to run the evaluation function every 200 training episodes.

    myEvaluator.EvaluationFrequency = 200;

    To evaluate an agent during training using these evaluation options, pass myEvaluator to train, as in the following code example.

    results = train(agent, env, rlTrainingOptions(), Evaluator=myEvaluator);
    

    For more information, see train.
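    For reference, a minimal end-to-end sketch might look like the following code. The predefined cart-pole environment, the default DQN agent, and the training options are assumptions chosen for illustration.

    % Create a predefined environment and a default agent (illustrative).
    env = rlPredefinedEnv("CartPole-Discrete");
    agent = rlDQNAgent(getObservationInfo(env), getActionInfo(env));

    % Train the agent, evaluating it periodically with myEvaluator.
    results = train(agent, env, rlTrainingOptions(MaxEpisodes=500), ...
        Evaluator=myEvaluator);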

    Custom Evaluation Function

    The train function calls the evaluation function every evaluator.EvaluationFrequency training episodes. Within the evaluation function, if no more than 1000 training episodes have elapsed, run just one evaluation episode; otherwise, run 10 consecutive evaluation episodes. Configure the agent to use a greedy policy (no exploration) during evaluation, and return the eighth-largest episode reward as the final statistic (a value that at least 80% of the evaluation episodes meet or exceed).

    function  [statistic, scores, data] = ...
        myEvaluationFcn(agent, env, trainingEpisodeInfo)
    
        % Do not use an exploration policy for evaluation.
        agent.UseExplorationPolicy = false;
        
        % Set the number of consecutive evaluation episodes to run.
        if trainingEpisodeInfo.EpisodeIndex <= 1000
            numEpisodes = 1;
        else
            numEpisodes = 10;
        end
        
        % Initialize the rewards and data arrays.
        episodeRewards = zeros(numEpisodes, 1);
        data = cell(numEpisodes, 1);
        
        % Run numEpisodes consecutive evaluation episodes.
        for evaluationEpisode = 1:numEpisodes
        
            % Use a fixed random seed for reproducibility.
            rng(evaluationEpisode*10)
        
            % Run one evaluation episode. The output is a structure
            % containing various agent simulation information,
            % as described in runEpisode.
            episodeResults = runEpisode(env, agent, ...
                MaxSteps=500, ...
                CleanupPostSim=false);
        
            if isa(episodeResults,"rl.env.Future")
        
                % For parallel simulation, fetch data from workers.
                [~,out] = fetchNext(episodeResults);
        
                % Collect the episode cumulative reward.
                episodeRewards(evaluationEpisode) = ...
                    out.AgentData.EpisodeInfo.CumulativeReward;
        
                % Collect the whole data structure.
                data{evaluationEpisode} = out;
        
            else
        
                % Collect the episode cumulative reward.
                episodeRewards(evaluationEpisode) = ...
                    episodeResults.AgentData.EpisodeInfo.CumulativeReward;
                data{evaluationEpisode} = episodeResults;
            end
        end
        
        % Return the eighth-largest episode reward if 10 episodes
        % are run; otherwise return the greatest (and only) reward.
        statistic = sort(episodeRewards, "descend");
        if length(statistic) == 10
            % With a descending sort, element 8 is the eighth-largest
            % reward, which 80% of the episodes meet or exceed.
            statistic = statistic(8);
        else
            % Make sure to always return a scalar in any case.
            statistic = statistic(1);
        end
        
        % Return the rewards vector.
        scores = episodeRewards;
    
    end
    

    Version History

    Introduced in R2023b