Can I use rlFiniteSetSpec for a multi-observation system where the number of combinations is arbitrarily large?

I am attempting to solve a reinforcement learning problem. My environment consists of a 4x4 grid game board, so 16 entries, where each square can be in one of four states: 0, 1, 2, or 3. The board position is used as the observation, so an example board might be
0 2 0 1
1 3 1 2
0 1 2 3
3 1 2 0
and that would then be converted into the observation [0 2 0 1 1 3 1 2 0 1 2 3 3 1 2 0]. I can't tell from the documentation of rlNumericSpec and rlFiniteSetSpec whether this kind of system should be treated as having a discrete observation space or whether I am required to use a continuous one. The continuous implementation is intuitive to me: I can use rlNumericSpec([16 1]). However, I would intuitively expect that, when possible, it is better to use a discrete specification to fix the number of states that need to be considered, remove the need for interpolation, and so on.
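For concreteness, the conversion described above reads the board row by row (row-major order). The sketch below uses Python purely as a neutral illustration, not MATLAB code; `flatten_row_major` is a hypothetical helper name.

```python
# The example board from the question, stored as a list of rows.
board = [
    [0, 2, 0, 1],
    [1, 3, 1, 2],
    [0, 1, 2, 3],
    [3, 1, 2, 0],
]

def flatten_row_major(rows):
    """Concatenate the rows left to right, top to bottom (hypothetical helper)."""
    return [cell for row in rows for cell in row]

obs = flatten_row_major(board)
print(obs)  # [0, 2, 0, 1, 1, 3, 1, 2, 0, 1, 2, 3, 3, 1, 2, 0]
```

Note that in MATLAB `board(:)` reads column by column, so reproducing this ordering there would need the transpose first, e.g. `reshape(board.', 1, [])`.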
In my case, because the possible states of each grid position are known and fixed, it would be possible to write out an array of all possible board positions, but if my combinatorics is right there are 4^16 (4,294,967,296) possible board states, which is... a lot. From my reading of the rlFiniteSetSpec documentation, to use a discrete specification I would need to write out a cell array containing every possible board state. If that's correct, what would be the easiest way to generate that array? I'm aware of perms(), but I don't think it's applicable here. If I'm not correct, I can't tell from the examples in the documentation whether this kind of observation space can (or indeed should) be expressed using rlFiniteSetSpec, and that's what I'd like to confirm with this question.
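The count can be checked directly: each of the 16 cells takes one of 4 values independently, so the set of boards is a Cartesian product (every combination with repetition), not a set of permutations, which is why perms() does not fit. The Python sketch below, again only an illustration and not MATLAB or Reinforcement Learning Toolbox API, shows the count, a full enumeration on a tiny 2x2 board, and one way to map a flattened board to a single integer by reading it as a base-4 number. `enumerate_boards`, `board_to_index` and `index_to_board` are hypothetical helpers; whether rlFiniteSetSpec could usefully hold such indices is not something this sketch establishes.

```python
from itertools import product

N_CELLS = 16   # 4x4 board
N_VALUES = 4   # each cell holds 0, 1, 2 or 3

# One independent choice of value per cell gives N_VALUES ** N_CELLS boards.
n_states = N_VALUES ** N_CELLS
print(n_states)  # 4294967296

def enumerate_boards(n_cells, n_values):
    """Yield every board as a tuple: a Cartesian product, not a permutation."""
    return product(range(n_values), repeat=n_cells)

# Only feasible for tiny boards: a 2x2 board has 4**4 = 256 states.
small = list(enumerate_boards(4, N_VALUES))
print(len(small))  # 256

def board_to_index(obs, n_values=N_VALUES):
    """Read the flattened board as a base-n_values number, first cell most significant."""
    index = 0
    for v in obs:
        index = index * n_values + v
    return index

def index_to_board(index, n_cells=N_CELLS, n_values=N_VALUES):
    """Inverse of board_to_index: recover the flattened board from its index."""
    digits = []
    for _ in range(n_cells):
        index, v = divmod(index, n_values)
        digits.append(v)
    return digits[::-1]
```

For scale, materialising all 4^16 boards at even one byte per cell would take 2^32 × 16 bytes = 64 GiB, before any cell-array overhead, so an explicit list of every board is impractical at this size.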
Thanks in advance
