Reinforcement Learning Tools for Multi-Armed Bandit
Simulation-Based Inference (SBI)
Estimation Methods
Algorithm Packages
Behavior Rules
Column Names
Control Algorithm Behavior
Estimation Method: Recurrent Neural Network (RNN)
multiRL: Reinforcement Learning Tools for Multi-Armed Bandit
Model Parameters
plot.multiRL.replay
multiRL.output
multiRL.metric
Dataset Structure
The Engine of Approximate Bayesian Computation (ABC)
The Engine of Recurrent Neural Network (RNN)
Tool for Generating an Environment for Models
Likelihood-Based Inference (LBI)
Estimation Method: Maximum A Posteriori (MAP)
Estimation Method: Maximum Likelihood Estimation (MLE)
Estimation Method: Approximate Bayesian Computation (ABC)
Step 3: Optimizing parameters to fit real data
Function: Learning Rate
Function: Soft-Max
Function: Upper-Confidence-Bound
Function: ε-First, ε-Greedy, ε-Decreasing
Function: Utility Function
Function: Decay Rate
Core Functions
Policy of Agent
Density and Random Function
multiRL.input
multiRL.behrule
multiRL.record
Step 2: Generating fake data for parameter and model recovery
Step 4: Replaying the experiment with optimal parameters
Risk Sensitive Model
Step 1: Building reinforcement learning model
Settings of Model
summary
Cognitive Processing System
Temporal Differences Model
Utility Model
A flexible, general-purpose toolbox for implementing Rescorla-Wagner models in multi-armed bandit tasks. As the successor to and functional extension of the 'binaryRL' package, 'multiRL' modularizes the Markov Decision Process (MDP) into six core components. This framework lets users construct custom models with intuitive if-else syntax and define latent learning rules for agents. For parameter estimation, it provides both likelihood-based inference (MLE and MAP) and simulation-based inference (ABC and RNN), with full support for parallel processing across subjects. The workflow is highly standardized: four main functions strictly follow the four-step protocol (and ten rules) proposed by Wilson & Collins (2019) <doi:10.7554/eLife.49547>. Beyond the three built-in models (TD, RSTD, and Utility), users can easily derive new variants by declaring which variables are treated as free parameters.
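The four-step workflow described above (build a model, generate fake data, fit real data, replay) rests on the Rescorla-Wagner prediction-error update and a soft-max choice rule. The sketch below illustrates that underlying algorithm only; it is written in Python, does not use the multiRL API, and every name in it (`simulate_td`, `neg_log_lik`, and so on) is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(q, tau):
    """Soft-max choice rule: P(a) proportional to exp(Q(a) / tau)."""
    z = q / tau
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def simulate_td(n_trials, n_arms, alpha, tau, reward_probs, rng):
    """Simulate a TD (Rescorla-Wagner) agent on a Bernoulli bandit."""
    q = np.zeros(n_arms)
    choices, rewards = [], []
    for _ in range(n_trials):
        a = rng.choice(n_arms, p=softmax(q, tau))
        r = float(rng.random() < reward_probs[a])
        q[a] += alpha * (r - q[a])  # prediction-error update
        choices.append(a)
        rewards.append(r)
    return np.array(choices), np.array(rewards)


def neg_log_lik(params, choices, rewards, n_arms):
    """Negative log-likelihood of observed choices under RW + soft-max."""
    alpha, tau = params
    q = np.zeros(n_arms)
    nll = 0.0
    for a, r in zip(choices, rewards):
        nll -= np.log(softmax(q, tau)[a] + 1e-12)
        q[a] += alpha * (r - q[a])
    return nll


# Generate fake data with known parameters (parameter recovery).
rng = np.random.default_rng(seed=1)
choices, rewards = simulate_td(500, 2, alpha=0.3, tau=0.2,
                               reward_probs=[0.8, 0.2], rng=rng)

# Recover the parameters by maximum likelihood.
res = minimize(neg_log_lik, x0=[0.5, 0.5],
               args=(choices, rewards, 2),
               bounds=[(0.01, 1.0), (0.01, 5.0)], method="L-BFGS-B")
```

The same likelihood function supports MAP estimation by adding a log-prior penalty, while simulation-based approaches (ABC, RNN) would instead compare summary statistics of simulated and observed data.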