Simulate trajectories through a POMDP. The start state for each trajectory is randomly chosen using the specified belief. The belief is used to choose actions from an epsilon-greedy policy and is then updated using the observations (a sketch of the epsilon-greedy action choice follows the argument list).
belief: probability distribution over the states for choosing the starting states for the trajectories. Defaults to the start belief state specified in the model or "uniform".
horizon: number of epochs for the simulation. If NULL, the horizon of a finite-horizon model is used. For infinite-horizon problems, a horizon is calculated using the discount factor (see Details).
epsilon: the probability of choosing a random action when using the epsilon-greedy policy. The default is 0 for solved models and 1 for unsolved models.
delta_horizon: precision used to determine the horizon for infinite-horizon problems.
digits: the number of digits used to round probabilities for belief points.
return_beliefs: logical; Return all visited belief states? This requires n x horizon memory.
return_trajectories: logical; Return the simulated trajectories as a data.frame?
engine: 'cpp' or 'r' to perform the simulation using the faster C++ implementation or the native R implementation.
verbose: report the parameters used for the simulation.
...: further arguments are ignored.
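To illustrate the epsilon-greedy choice mentioned above, here is a minimal, self-contained sketch in plain R. The function and argument names (eps_greedy_action, alpha_vectors, alpha_actions) are purely illustrative and are not part of the package; the package performs this step internally.

# Sketch of epsilon-greedy action selection over a belief.
eps_greedy_action <- function(alpha_vectors, alpha_actions, belief, epsilon) {
  # alpha_vectors: matrix with one alpha vector per row (|S| columns)
  # alpha_actions: the action associated with each alpha vector
  if (runif(1) < epsilon) {
    sample(unique(alpha_actions), 1L)                    # explore: random action
  } else {
    alpha_actions[which.max(alpha_vectors %*% belief)]   # exploit: action of the best alpha vector
  }
}

# e.g., a 2-state problem with two (made-up) alpha vectors
eps_greedy_action(rbind(c(10, -100), c(-100, 10)),
                  c("listen", "open-door"),
                  belief = c(.5, .5), epsilon = 0.1)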
Returns
A list with elements:
avg_reward: The average discounted reward.
action_cnt: Action counts.
state_cnt: State counts.
reward: Reward for each trajectory.
belief_states: A matrix with belief states as rows.
trajectories: A data.frame with the episode id, the time step, the state of the simulation (simulation_state), the id of the alpha vector used for the current belief (see belief_states above), the action a, and the reward r.
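As a hedged sketch, the discounted return of each episode could be recomputed from this data.frame roughly as follows. The column names episode, time, and r are taken from the description above; the discount factor gamma and a time index starting at 0 are assumptions.

# Sketch: per-episode discounted return from the trajectories data.frame
gamma <- 0.9   # hypothetical discount factor of the model
with(sim$trajectories, tapply(r * gamma^time, episode, sum))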
Details
Simulates n trajectories. If no simulation horizon is specified, the horizon of finite-horizon problems is used. For infinite-horizon problems with γ < 1, the simulation horizon T is chosen such that the worst-case error is no more than δ_horizon. That is

γ^T R_max / (1 - γ) ≤ δ_horizon,

where R_max is the largest possible absolute reward value, treated as a perpetuity starting after T.
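For example, solving the bound above for T gives the following back-of-the-envelope calculation (a sketch only; the values of gamma, R_max, and delta_horizon are made up):

# smallest T with gamma^T * R_max / (1 - gamma) <= delta_horizon
gamma <- 0.75; R_max <- 10; delta_horizon <- 1e-3
T_sim <- ceiling(log(delta_horizon * (1 - gamma) / R_max) / log(gamma))
T_sim   # 37 epochs for these made-up values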
A native R implementation (engine = 'r') and a faster C++ implementation (engine = 'cpp') are available. Currently, only the R implementation supports multi-episode problems.
Both implementations support the simulation of trajectories in parallel using the package foreach. To enable parallel execution, a parallel backend like doParallel needs to be registered (see doParallel::registerDoParallel()). Note that small simulations are slower when parallelization is used. C++ simulations with n * horizon of less than 100,000 are always executed using a single worker.
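A quick way to compare the two engines (and to see the effect of a registered parallel backend) is to time a larger simulation. This is only a sketch and assumes a solved model sol as created in the examples below; system.time() is base R.

# doParallel::registerDoParallel()   # optional: enable parallel simulation
system.time(simulate_POMDP(sol, n = 1000, engine = "r"))
system.time(simulate_POMDP(sol, n = 1000, engine = "cpp"))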
Examples
data(Tiger)

# solve the POMDP for 5 epochs and no discounting
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
sol
policy(sol)

# uncomment the following lines to register a parallel backend for simulation
# (needs package doParallel installed)
# doParallel::registerDoParallel()
# foreach::getDoParWorkers()

## Example 1: simulate 100 trajectories
sim <- simulate_POMDP(sol, n = 100, verbose = TRUE)
sim

# calculate the percentage that each action is used in the simulation
round_stochastic(sim$action_cnt / sum(sim$action_cnt), 2)

# reward distribution
hist(sim$reward)

## Example 2: look at the belief states and the trajectories starting with
#  an initial start belief.
sim <- simulate_POMDP(sol, n = 100, belief = c(.5, .5),
  return_beliefs = TRUE, return_trajectories = TRUE)
head(sim$belief_states)
head(sim$trajectories)

# plot with added density (the x-axis is the probability of the second belief state)
plot_belief_space(sol, sample = sim$belief_states, jitter = 2, ylim = c(0, 6))
lines(density(sim$belief_states[, 2], bw = .02)); axis(2); title(ylab = "Density")

## Example 3: simulate trajectories for an unsolved POMDP which uses an epsilon of 1
#  (i.e., all actions are randomized). The simulation horizon for the
#  infinite-horizon Tiger problem is calculated using delta_horizon.
sim <- simulate_POMDP(Tiger, return_beliefs = TRUE, verbose = TRUE)
sim$avg_reward

hist(sim$reward, breaks = 20)
plot_belief_space(sol, sample = sim$belief_states, jitter = 2, ylim = c(0, 6))
lines(density(sim$belief_states[, 1], bw = .05)); axis(2); title(ylab = "Density")