Create a dynamic-programming function for solving a bandit problem

Pablo Tano

April 21, 2015

Hello, I want to evaluate (not maximize) the function inside the brackets in the image, for the simplest case of N=1. Apparently this requires dynamic programming: first evaluate the last term (which is fixed at (s + alpha) / (s + f + alpha + beta)), then the previous one, and so on, as shown in the function.
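For reference, the backward recursion described above, as the posted code implements it, can be written out as follows. This is a transcription of the code, not of the formula in the image; the symbol V_k and the shorthand p are names introduced here for illustration only.

```latex
% Terminal condition at the last trial, k = l:
V_l(s, f) = \frac{s + \alpha}{s + f + \alpha + \beta}

% Backward step for k < l, with p = \frac{s + \alpha}{s + f + \alpha + \beta}:
V_k(s, f) = p \, V_{k+1}(s + 1, f) + (1 - p) \, V_{k+1}(s, f + 1)
```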

I wrote this code, but it is not working. I do not know how to define functions in this way; this is what I was able to do:

    % code
    function [ out ] = future_expected_reward(s,f,alpha,beta,k,l)
    if k==l % "l" is the game length
        out = (s + alpha) / (s + f + alpha + beta);
    else
        out = ( (s + alpha) / (s + f + alpha + beta) ) * future_expected_reward(s+1,f,alpha,beta,k+1,l) + ...
              ( (f + beta) / (s + f + alpha + beta) ) * future_expected_reward(s,f+1,alpha,beta,k+1,l);
    end
    end

I want to evaluate the function at trial "k", of a total of "l" trials, with "alpha" and "beta" fixed (and since N=1 for my case, you should ignore the i's).
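The recursive version makes on the order of 2^(l-k) calls, so it can become very slow for long games. One possible alternative is a bottom-up sweep that evaluates the same recursion level by level, starting from trial "l" and working back to trial "k". The following is only a sketch under that assumption: the name future_expected_reward_dp is hypothetical, it assumes k <= l, and it assumes the function is saved in its own file of the same name.

```matlab
function out = future_expected_reward_dp(s, f, alpha, beta, k, l)
% Bottom-up sketch of the same recursion as future_expected_reward.
% (future_expected_reward_dp is a hypothetical name; assumes k <= l.)
d = l - k;                        % trials remaining after trial k
i = 0:d;                          % possible extra successes by trial l
% Terminal values at trial l: state (s+i, f+d-i), so the total count is s+f+d
V = (s + i + alpha) ./ (s + f + d + alpha + beta);
for j = d-1:-1:0                  % sweep backward, one trial at a time
    i = 0:j;                      % possible extra successes after j more trials
    p = (s + i + alpha) ./ (s + f + j + alpha + beta);   % success probability
    % V(2:j+2) holds the "one more success" values, V(1:j+1) the "one more failure" values
    V = p .* V(2:j+2) + (1 - p) .* V(1:j+1);
end
out = V;                          % a scalar once the sweep reaches trial k
end
```

A quick sanity check: for this particular recursion, the result should equal (s + alpha) / (s + f + alpha + beta) for any k <= l, because each backward step averages the next-level values back to the current ratio. For example, future_expected_reward_dp(0,0,1,1,1,3) should agree with future_expected_reward(0,0,1,1,1,3), and both should equal 0.5 up to rounding.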

I really need your help! Thanks!!