policy function

Extract the Policy from a POMDP/MDP

Extracts the policy from a solved POMDP/MDP.

policy(x, drop = TRUE)

Arguments

  • x: A solved POMDP or MDP object.
  • drop: logical; if TRUE (the default), a converged, epoch-independent policy is returned as a single data.frame instead of a list with one element.

Returns

A list with one policy per epoch. For converged policies, the list has only one element. If drop = TRUE, a converged policy is returned directly as a data.frame instead of a one-element list.
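
For illustration, the following sketch (reusing the Tiger example from the Examples section below) contrasts the two settings of drop:

library(pomdp)
data("Tiger")

# the infinite-horizon Tiger problem converges to a single,
# epoch-independent policy
sol <- solve_POMDP(model = Tiger)

policy(sol)                # drop = TRUE (default): the policy data.frame itself
policy(sol, drop = FALSE)  # the same policy kept inside a one-element list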

Details

The policy is returned as a list with one entry per epoch. For converged, infinite-horizon problems, the list contains only the single converged policy. For a POMDP, each policy is a data.frame consisting of two parts (a usage sketch follows the list):

  • Part 1: The alpha vectors for the belief states (these also define the utility of each belief). The columns are named after the model's states.
  • Part 2: The last column, named action, contains the prescribed action.
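
As a sketch of how these columns can be used directly, the optimal action for a belief b is the action of the alpha vector that maximizes b · alpha. The helper below is purely illustrative; the package's optimal_action() (see See Also) is the supported interface for this:

library(pomdp)
data("Tiger")
sol <- solve_POMDP(model = Tiger)
pol <- policy(sol)

# illustrative helper: pick the action of the alpha vector maximizing
# b . alpha; the belief vector must be ordered like the state columns
best_action <- function(pol, belief) {
  alpha <- as.matrix(pol[, setdiff(colnames(pol), "action")])
  pol$action[which.max(alpha %*% belief)]
}

best_action(pol, c(0.5, 0.5))  # uniform belief over the two tiger states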

For an MDP, the policy is a data.frame with the following columns (a lookup sketch follows the list):

  • state: The state.
  • U: The state's value (discounted expected utility U) if the policy is followed.
  • action: The prescribed action.
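
For example, the action and value for a particular state can be looked up directly in this data.frame (a minimal sketch using the Maze example from the Examples section):

library(pomdp)
data(Maze)
sol <- solve_MDP(Maze)
pol <- policy(sol)

# look up the prescribed action and the value for one state (here, the first)
s <- pol$state[1]
pol$action[match(s, pol$state)]
pol$U[match(s, pol$state)]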

Examples

data("Tiger") # Infinite horizon sol <- solve_POMDP(model = Tiger) sol # policy with value function, optimal action and transitions for observations. policy(sol) plot_value_function(sol) # Finite horizon (we use incremental pruning because grid does not converge) sol <- solve_POMDP(model = Tiger, method = "incprune", horizon = 3, discount = 1) sol policy(sol) # Note: We see that it is initially better to listen till we make # a decision in the final epoch. # MDP policy data(Maze) sol <- solve_MDP(Maze) policy(sol)

See Also

Other policy: estimate_belief_for_nodes(), optimal_action(), plot_belief_space(), plot_policy_graph(), policy_graph(), projection(), reward(), solve_POMDP(), solve_SARSOP(), value_function()

Author(s)

Michael Hahsler
