Create a dynamic programming function for solving a bandit problem.

Hello, I want to evaluate (not maximize) the function inside the brackets in the image, for the simplest case of N=1. Apparently this requires dynamic programming: first evaluate the last term, which is fixed at (s + alpha) /(s + f + alpha + beta), then the previous one, and so on, as shown in the function.
I wrote this code, but it is not working. I do not know how to define functions in this way; this is what I was able to do:
% code
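function out = future_expected_reward(s, f, alpha, beta, k, l)
% k is the current trial, l is the game length.
% If stored in its own file, name the file future_expected_reward.m.
n = s + f + alpha + beta;   % total posterior count
if k >= l                   % last trial (>= also stops the recursion if k starts above l)
    out = (s + alpha) / n;
else
    % Success with probability (s + alpha)/n, failure with probability (f + beta)/n.
    out = ((s + alpha) / n) * future_expected_reward(s+1, f, alpha, beta, k+1, l) + ...
          ((f + beta) / n) * future_expected_reward(s, f+1, alpha, beta, k+1, l);
end
end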
I want to evaluate the function at trial "k" out of a total of "l" trials, with "alpha" and "beta" fixed (since N=1 in my case, you can ignore the i's).
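For reference, the backward evaluation described above (last term first, then the previous one, and so on) can also be sketched without recursion. This is only a minimal sketch under assumptions: the name future_expected_reward_dp is hypothetical, it implements exactly the recursion in the code above (not necessarily every term of the formula in the image), and it assumes k <= l and s + f + alpha + beta > 0.

```matlab
function out = future_expected_reward_dp(s, f, alpha, beta, k, l)
% Bottom-up version of the recursion above (hypothetical helper name).
% Assumes k <= l and s + f + alpha + beta > 0.
n = l - k;                          % number of remaining transitions

% Terminal values at trial l: j extra successes and n - j extra failures.
j = 0:n;
V = (s + j + alpha) ./ (s + f + n + alpha + beta);

% Step backwards one trial at a time, from trial l-1 down to trial k.
for m = n-1:-1:0
    j = 0:m;                        % extra successes after m transitions
    p = (s + j + alpha) ./ (s + f + m + alpha + beta);   % success probability
    % V(2:end) holds the values after a success, V(1:end-1) after a failure.
    V = p .* V(2:end) + (1 - p) .* V(1:end-1);
end
out = V;                            % scalar value at the starting state
end
```

The loop does on the order of (l - k)^2 operations instead of the 2^(l - k) calls made by the recursive version. As a sanity check, for the recursion exactly as written the result equals (s + alpha) /(s + f + alpha + beta) for any k <= l, so future_expected_reward_dp(1, 2, 1, 1, 1, 5) should return 0.4 up to rounding.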
I really need your help! Thanks!!

Answers (0)

Asked:

2015-4-21