reward function

Calculate the Reward for a POMDP Solution

reward() calculates the expected total reward for a POMDP solution given a starting belief state. The value is calculated using the value function stored in the POMDP solution. reward_node_action() additionally returns the policy graph node that represents the belief state and the optimal action.

Usage

reward(x, belief = NULL, epoch = 1, ...)

reward_node_action(x, belief = NULL, epoch = 1, ...)

Arguments

  • x: a solved POMDP object.
  • belief: specification of the current belief state (see argument start in POMDP for details). By default the belief state defined in the model as start is used. Multiple belief states can be specified as rows in a matrix.
  • epoch: return reward for this epoch. Use 1 for converged policies.
  • ...: further arguments are passed on.

Returns

reward() returns a vector of reward values, one for each belief if a matrix is specified.
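
For instance, several beliefs can be evaluated at once by passing them as rows of a matrix. A minimal sketch using the Tiger problem from the Examples below:

library("pomdp")
data("Tiger")
sol <- solve_POMDP(model = Tiger)
# one belief per row of the matrix; the result is a numeric vector
# with one reward value per row
reward(sol, belief = rbind(c(0.5, 0.5), c(0.85, 0.15), c(1, 0)))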

reward_node_action() returns a list with the following components (see the sketch below for accessing them):

  • belief_state: the belief state specified in belief.
  • reward: the total expected reward given a belief and epoch.
  • pg_node: the policy graph node that represents the belief state.
  • action: the optimal action.
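
The components can be accessed by name. A minimal sketch for a single belief vector, assuming the component names listed above:

library("pomdp")
data("Tiger")
sol <- solve_POMDP(model = Tiger)

res <- reward_node_action(sol, belief = c(0.85, 0.15))
res$reward    # total expected reward for this belief
res$pg_node   # policy graph node that represents the belief
res$action    # optimal action at this belief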

Details

The reward is typically calculated using the value function (alpha vectors) of the solution. If these are not available, then simulate_POMDP() is used instead with a warning.
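
For a converged solution this amounts to taking, over all alpha vectors, the maximum of their inner product with the belief, i.e., V(b) = max_k (b . alpha_k). A minimal sketch, assuming that value_function() (see See Also) returns the alpha vectors of the converged policy as a matrix with one alpha vector per row:

library("pomdp")
data("Tiger")
sol <- solve_POMDP(model = Tiger)

b <- c(0.85, 0.15)          # belief over the two states

# assumed here: a matrix of alpha vectors, one per row
alpha <- value_function(sol)

# expected total reward = maximum over the alpha vectors of their
# inner product with the belief
max(alpha %*% b)

# this should agree with
reward(sol, belief = b)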

Examples

data("Tiger") sol <- solve_POMDP(model = Tiger) # if no start is specified, a uniform belief is used. reward(sol) # we have additional information that makes us believe that the tiger # is more likely to the left. reward(sol, belief = c(0.85, 0.15)) # we start with strong evidence that the tiger is to the left. reward(sol, belief = "tiger-left") # Note that in this case, the total discounted expected reward is greater # than 10 since the tiger problem resets and another game staring with # a uniform belief is played which produces additional reward. # return reward, the initial node in the policy graph and the optimal action for # two beliefs. reward_node_action(sol, belief = rbind(c(.5, .5), c(.9, .1))) # manually combining reward with belief space sampling to show the value function # (color signifies the optimal action) samp <- sample_belief_space(sol, n = 200) rew <- reward_node_action(sol, belief = samp) plot(rew$belief[,"tiger-right"], rew$reward, col = rew$action, ylim = c(0, 15)) legend(x = "top", legend = levels(rew$action), title = "action", col = 1:3, pch = 1) # this is the piecewise linear value function from the solution plot_value_function(sol, ylim = c(0, 10))

See Also

Other policy: estimate_belief_for_nodes(), optimal_action(), plot_belief_space(), plot_policy_graph(), policy(), policy_graph(), projection(), solve_POMDP(), solve_SARSOP(), value_function()

Author(s)

Michael Hahsler

  • Maintainer: Michael Hahsler
  • License: GPL (>= 3)
  • Last published: 2024-12-05