Hello,
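When you run inference with a trained RL policy, you do not need to consider rewards. Rewards are only used during training, to update the policy. Once training is done, the policy simply maps each observation to an action, and what it learned from the rewards is already encoded in its parameters.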
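As a rough illustration of the point above, here is a minimal sketch of an inference loop in plain Python. The weights, the `policy` function, and the `run_inference` helper are all hypothetical stand-ins, not the API of any particular framework; a real deployment would load the exported policy from your training tool instead. Note that no reward appears anywhere in the loop.

```python
import math

# Hypothetical parameters standing in for a trained policy. In practice these
# would be loaded from whatever file your training framework exported.
WEIGHTS = [[0.5, -0.2], [-0.3, 0.8]]  # one row per action, one column per observation value
BIASES = [0.0, 0.1]


def policy(observation):
    """Map an observation to an action using only the trained parameters."""
    action = []
    for row, bias in zip(WEIGHTS, BIASES):
        z = sum(w * x for w, x in zip(row, observation)) + bias
        action.append(math.tanh(z))  # squash each action value into [-1, 1]
    return action


def run_inference(observations):
    """Inference loop: observation in, action out. No reward is read or computed."""
    return [policy(obs) for obs in observations]


# Two example observations (made-up values, for illustration only).
actions = run_inference([[1.0, 0.0], [0.0, 1.0]])
```

The only inputs at this stage are the observations and the fixed parameters, which is why inference can run on a device that never sees the reward function.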
If you are asking whether you can perform RL training on the Raspberry Pi itself, this is not currently supported.