accessors

Access to Parts of the Model Description

Functions to provide uniform access to different parts of the POMDP/MDP problem description.

Usage

start_vector(x)

normalize_POMDP(
  x,
  sparse = TRUE,
  trans_start = FALSE,
  trans_function = TRUE,
  trans_keyword = FALSE
)

normalize_MDP(
  x,
  sparse = TRUE,
  trans_start = FALSE,
  trans_function = TRUE,
  trans_keyword = FALSE
)

reward_matrix(
  x,
  action = NULL,
  start.state = NULL,
  end.state = NULL,
  observation = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE
)

reward_val(
  x,
  action,
  start.state,
  end.state = NULL,
  observation = NULL,
  episode = NULL,
  epoch = NULL
)

transition_matrix(
  x,
  action = NULL,
  start.state = NULL,
  end.state = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE,
  trans_keyword = TRUE
)

transition_val(x, action, start.state, end.state, episode = NULL, epoch = NULL)

observation_matrix(
  x,
  action = NULL,
  end.state = NULL,
  observation = NULL,
  episode = NULL,
  epoch = NULL,
  sparse = FALSE,
  trans_keyword = TRUE
)

observation_val(x, action, end.state, observation, episode = NULL, epoch = NULL)

Arguments

  • x: A POMDP or MDP object.
  • sparse: logical; use sparse matrices when the density is below 50%, and keep the data.frame representation for the reward field. NULL returns the representation stored in the problem description, which avoids the conversion time (see the sketch after this list).
  • trans_start: logical; expand the start to a probability vector?
  • trans_function: logical; convert functions into matrices?
  • trans_keyword: logical; convert distribution keywords (uniform and identity) in transition_prob or observation_prob to matrices?
  • action: name or index of an action.
  • start.state, end.state: name or index of a state.
  • observation: name or index of an observation.
  • episode, epoch: Episode or epoch used for time-dependent POMDPs. Epochs are internally converted to the episode using the model horizon.
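
For illustration, here is a minimal sketch of the effect of the sparse argument, using the Tiger problem shipped with the package (the exact return classes depend on the matrix density):

data("Tiger")

# sparse = TRUE may return Matrix::dgCMatrix objects where the
# density is below 50%; dense base R matrices otherwise.
transition_matrix(Tiger, sparse = TRUE)

# sparse = NULL returns the representation stored in the problem
# description without any conversion.
transition_matrix(Tiger, sparse = NULL)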

Returns

A list or a list of lists of matrices.

Details

Several parts of the POMDP/MDP description can be defined in different ways. In particular, the fields transition_prob, observation_prob, reward, and start can be defined using matrices, data frames, keywords, or functions. See POMDP for details. The functions documented here provide unified access to the data in these fields to make writing code easier.

Transition Probabilities T(s' | s, a)

transition_matrix() accesses the transition model. The complete model is a list with one element for each action. Each element contains a states x states matrix with s (start.state) as rows and s' (end.state) as columns. Matrices with a density below 50% can be requested in sparse format (as a Matrix::dgCMatrix).
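
A short sketch of reading the transition model for the Tiger problem (action and state names are those defined in the Tiger model):

data("Tiger")

# Matrix for a single action: rows are s (start.state),
# columns are s' (end.state).
transition_matrix(Tiger, action = "listen")

# A single probability T(s' | s, a):
transition_val(Tiger, action = "listen",
               start.state = "tiger-left", end.state = "tiger-right")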

Observation Probabilities O(o | s', a)

observation_matrix() accesses the observation model. The complete model is a list with one element for each action. Each element contains a states x observations matrix with s' (end.state) as rows and o (observation) as columns. Matrices with a density below 50% can be requested in sparse format (as a Matrix::dgCMatrix).
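
Analogously, a minimal sketch for the observation model (again using the Tiger problem):

data("Tiger")

# Matrix for a single action: rows are s' (end.state),
# columns are o (observation).
observation_matrix(Tiger, action = "listen")

# A single probability O(o | s', a):
observation_val(Tiger, action = "listen",
                end.state = "tiger-left", observation = "tiger-right")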

Reward R(s, s', o, a)

reward_matrix() accesses the reward model. The preferred representation is a data.frame with the columns action, start.state, end.state, observation, and value; this is a sparse representation. The dense representation is a list of lists of matrices, where the list levels are a (action) and s (start.state), and the matrices have rows representing s' (end.state) and columns representing o (observation). The reward structure cannot be efficiently stored using a standard sparse matrix since there might be a fixed cost for each action, resulting in a matrix with no zero entries.
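
To compare the two representations, a small sketch with the Tiger problem (the subsetting by action and start.state follows the usage above):

data("Tiger")

# Sparse representation: a data.frame with columns action,
# start.state, end.state, observation, and value.
reward_matrix(Tiger, sparse = TRUE)

# Dense representation for one action and start state:
# an end.states x observations matrix.
reward_matrix(Tiger, action = "listen", start.state = "tiger-left")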

Initial Belief

start_vector() translates the initial probability vector description into a numeric vector.
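
For example, a minimal sketch; redefining start by a single state name is one of the specifications accepted by POMDP, so the second call assumes a valid model description:

data("Tiger")

Tiger$start          # the stored start specification (uniform for Tiger)
start_vector(Tiger)  # numeric vector: 0.5 for each tiger state

# After changing the start description, e.g., to a state name:
Tiger$start <- "tiger-left"
start_vector(Tiger)  # probability 1 on tiger-left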

Convert the Complete POMDP Description into a Consistent Form

normalize_POMDP() returns a new POMDP definition where transition_prob, observation_prob, reward, and start are normalized.

Also, states, actions, and observations are ordered as given in the problem definition so that safe access using numerical indices is possible. Normalized POMDP descriptions can be used in custom code that expects a consistent format.
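
A short sketch of the normalization effect (assuming, as in the Tiger model, that the stored transition_prob uses distribution keywords):

data("Tiger")

# Before: transition_prob may contain keywords such as 'identity'
# or 'uniform'.
Tiger$transition_prob

# After: all fields are expanded into a consistent matrix form.
Tiger_norm <- normalize_POMDP(Tiger, sparse = FALSE)
Tiger_norm$transition_prob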

Examples

data("Tiger") # List of |A| transition matrices. One per action in the from start.states x end.states Tiger$transition_prob transition_matrix(Tiger) transition_val(Tiger, action = "listen", start.state = "tiger-left", end.state = "tiger-left") # List of |A| observation matrices. One per action in the from states x observations Tiger$observation_prob observation_matrix(Tiger) observation_val(Tiger, action = "listen", end.state = "tiger-left", observation = "tiger-left") # List of list of reward matrices. 1st level is action and second level is the # start state in the form end state x observation Tiger$reward reward_matrix(Tiger) reward_matrix(Tiger, sparse = TRUE) reward_matrix(Tiger, action = "open-right", start.state = "tiger-left", end.state = "tiger-left", observation = "tiger-left") # Translate the initial belief vector Tiger$start start_vector(Tiger) # Normalize the whole model Tiger_norm <- normalize_POMDP(Tiger) Tiger_norm$transition_prob ## Visualize transition matrix for action 'open-left' plot_transition_graph(Tiger) ## Use a function for the Tiger transition model trans <- function(action, end.state, start.state) { ## listen has an identity matrix if (action == 'listen') if (end.state == start.state) return(1) else return(0) # other actions have a uniform distribution return(1/2) } Tiger$transition_prob <- trans # transition_matrix evaluates the function transition_matrix(Tiger)

See Also

Other POMDP: MDP2POMDP, POMDP(), actions(), add_policy(), plot_belief_space(), projection(), reachable_and_absorbing, regret(), sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Other MDP: MDP(), MDP2POMDP, MDP_policy_functions, actions(), add_policy(), gridworld, reachable_and_absorbing, regret(), simulate_MDP(), solve_MDP(), transition_graph(), value_function()

Author(s)

Michael Hahsler

  • Maintainer: Michael Hahsler
  • License: GPL (>= 3)
  • Last published: 2024-12-05