ql function

Q-learning for Estimating Optimal DTRs


This function implements Q-learning for estimating general K-stage DTRs. Lasso penalty can be applied for variable selection at each stage.

ql(H, AA, RR, K, pi='estimated', lasso=TRUE, m=4)

Arguments

  • H: subject history information before treatment for all subjects at the K stages. It can be a vector or a matrix when only baseline information is used in estimating the DTR; otherwise, it should be a list of length K. Standardize all variables in H to have mean 0 and standard deviation 1 before using H as input. See Details for how to construct H.
  • AA: observed treatment assignments for all subjects at the K stages. It is a vector if K=1, or a list of K vectors corresponding to the K stages.
  • RR: observed reward outcomes for all subjects at the K stages. It is a vector if K=1, or a list of K vectors corresponding to the K stages.
  • K: number of stages.
  • pi: treatment assignment probabilities of the observed treatments for all subjects at the K stages. It is a vector if K=1, or a list of K vectors corresponding to the K stages. It can be user specified if the treatment assignment probabilities are known. The default is pi="estimated", that is, the treatment assignment probabilities are estimated from lasso-penalized logistic regressions with H_k as the predictors at each stage k.
  • lasso: specifies whether to add lasso penalty at each stage when fitting the model. The default is lasso=TRUE.
  • m: number of folds in the m-fold cross validation. It is used when lasso=TRUE is specified. The default is m=4.
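When pi="estimated", the treatment assignment probabilities are fit at each stage by lasso-penalized logistic regression of the observed treatment on H_k. A minimal base-R sketch of the idea for one stage, using an unpenalized glm() in place of the lasso fit for brevity (all data and variable names here are illustrative, not part of the package API):

```r
# Sketch: estimating treatment assignment probabilities at one stage with a
# plain logistic regression of A_k on H_k. The package uses a lasso penalty;
# this unpenalized glm() only illustrates the idea. Data are simulated.
set.seed(2)
n  <- 200
Hk <- matrix(rnorm(n * 5), n, 5)                 # stage-k history matrix
Ak <- rbinom(n, 1, plogis(Hk[, 1]))              # observed treatment, coded 0/1

fit  <- glm(Ak ~ Hk, family = binomial())
phat <- predict(fit, type = "response")          # P(A_k = 1 | H_k)

# probability of the treatment each subject actually received
pi_k <- ifelse(Ak == 1, phat, 1 - phat)
```

The vector pi_k (one entry per subject) is what a user-specified pi argument would supply at this stage.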

Details

A patient's history information prior to the treatment at stage k can be constructed recursively as H_k = (H_{k-1}, A_{k-1}, R_{k-1}, X_k) with H_1 = X_1, where X_k is the subject-specific variables collected at stage k just prior to the treatment, A_k is the treatment at stage k, and R_k is the outcome observed after the treatment at stage k. Higher-order or interaction terms can also be easily incorporated in H_k, e.g., H_k = (H_{k-1}, A_{k-1}, R_{k-1}, X_k, H_{k-1}A_{k-1}, R_{k-1}A_{k-1}, X_kA_{k-1}).
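The recursive construction above can be sketched for a 2-stage setting as follows (simulated data; the variable names X1, A1, R1, X2 are illustrative, not package outputs):

```r
# Sketch: building standardized history matrices for K = 2 stages.
# X1, X2 are covariate matrices; A1 is the stage-1 treatment coded +1/-1;
# R1 is the stage-1 outcome. All data here are simulated for illustration.
set.seed(1)
n  <- 50
X1 <- matrix(rnorm(n * 3), n, 3)
A1 <- sample(c(-1, 1), n, replace = TRUE)
R1 <- rnorm(n)
X2 <- matrix(rnorm(n * 2), n, 2)

H1 <- scale(X1)                              # H_1 = X_1, standardized
H2 <- scale(cbind(H1, A1, R1, X2, H1 * A1))  # H_2 with H_1 x A_1 interactions
H  <- list(H1, H2)                           # the H input for ql(..., K = 2)
```

Note that scale() standardizes each column to mean 0 and standard deviation 1, as required for the H input.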

Returns

A list of results is returned as an object. It contains the following components:

  • stage1: a list of stage 1 results

  • ...

  • stageK: a list of stage K results

  • valuefun: overall empirical value function under the estimated DTR

  • benefit: overall empirical benefit function under the estimated DTR

  • pi: treatment assignment probabilities of the assigned treatments for each subject at the K stages. If pi='estimated' is specified as input, the estimated treatment assignment probabilities from lasso-penalized logistic regressions will be returned.

In each stage's result, a list is returned which consists of:

  • co: the estimated coefficients of (1, H, A, H*A), the variables in the model at this stage

  • treatment: the estimated optimal treatment at this stage for each subject in the sample. If no tailoring variables are selected under lasso penalty, treatment will be assigned randomly with equal probability.

  • Q: the estimated optimal outcome increment from this stage to the end (the estimated optimal Q-function at this stage) for each subject in the sample
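The stage-wise quantities above can be illustrated with a single-stage, base-R sketch: regress the outcome on (1, H, A, H*A), take the estimated optimal treatment as the sign of the treatment contrast, and evaluate the Q-function at that treatment. This omits the lasso penalty and the backward recursion over stages, and all names are illustrative:

```r
# Sketch of one Q-learning stage (no penalty, simulated data).
# The model is R ~ (1, H, A, H*A) with A coded +1/-1; the optimal
# treatment is the sign of the A-contrast given H.
set.seed(3)
n <- 150
H <- matrix(rnorm(n * 2), n, 2)
A <- sample(c(-1, 1), n, replace = TRUE)
R <- 1 + H[, 1] + A * (0.5 - H[, 2]) + rnorm(n)   # true rule: sign(0.5 - H[,2])

fit <- lm(R ~ H + A + H:A)
co  <- coef(fit)                                   # coefficients of (1, H, A, H*A)

# contrast between A = +1 and A = -1 given H (up to a factor of 2)
contrast <- co["A"] + H %*% co[c("H1:A", "H2:A")]
opt_trt  <- ifelse(contrast > 0, 1, -1)            # estimated optimal treatment

# estimated optimal Q-function: predicted outcome under the optimal treatment
Q <- co["(Intercept)"] + H %*% co[c("H1", "H2")] + abs(contrast)
```

Maximizing the fitted model over A reduces to taking the sign of the contrast, which is why Q adds the absolute value of the tailoring term.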

References

Watkins, C. J. C. H. (1989). Learning from delayed rewards (Doctoral dissertation, University of Cambridge).

Qian, M., & Murphy, S. A. (2011). Performance guarantees for individualized treatment rules. Annals of Statistics, 39(2), 1180-1210.

Author(s)

Yuan Chen, Ying Liu, Donglin Zeng, Yuanjia Wang

Maintainer: Yuan Chen <yc3281@columbia.edu>, <irene.yuan.chen@gmail.com>

See Also

predict.ql, sim_Kstage, owl

Examples

# simulate a 2-stage training set and test set
n_train = 100
n_test = 500
n_cluster = 10
pinfo = 10
pnoise = 20

train = sim_Kstage(n_train, n_cluster, pinfo, pnoise, K=2)
# construct the standardized history information at each stage
H1_train = scale(train$X)
H2_train = scale(cbind(H1_train, train$A[[1]], H1_train * train$A[[1]]))
pi_train = list(rep(0.5, n_train), rep(0.5, n_train))

test = sim_Kstage(n_test, n_cluster, pinfo, pnoise, train$centroids, K=2)
H1_test = scale(test$X)
H2_test = scale(cbind(H1_test, test$A[[1]], H1_test * test$A[[1]]))
pi_test = list(rep(0.5, n_test), rep(0.5, n_test))

# estimate the DTR with Q-learning on the training set
ql_train = ql(H=list(H1_train, H2_train), AA=train$A, RR=train$R, K=2, pi=pi_train, m=3)

# apply the estimated DTR to the test set
ql_test = predict(ql_train, H=list(H1_test, H2_test), AA=test$A, RR=test$R, K=2, pi=pi_test)
  • License: GPL-2
  • Last published: 2020-04-22
