Tests and confidence intervals for the parameters estimated by the msel function
This function conducts various statistical tests and calculates confidence intervals for the parameters of the model estimated via the msel function.
test_msel(object, fn, fn_args = list(), test = "t", method = "classic",
          ci = "classic", cl = 0.95, se_type = "dm", trim = 0,
          vcov = object$cov, iter = 100, generator = rnorm, bootstrap = NULL,
          par_ind = 1:object$control_lnL$n_par,
          eps = max(1e-04, sqrt(.Machine$double.eps) * 10),
          n_sim = 1000, n_cores = 1)
Arguments
object: an object of class 'msel'. It may also be a list of two objects; in that case object[[1]] and object[[2]] are supplied to the arguments model1 and model2 of the lrtest_msel function.
fn: a function which returns a numeric vector and should depend on the elements of object. These elements should be accessed via the coef.msel or predict.msel functions. The first argument of fn should be object, so the coef and predict calls inside fn should also depend on object.
fn_args: a list of additional arguments of fn.
test: a character representing the test to be used. If test = "t" then a t-test is used. If test = "wald" then a Wald test is applied.
method: a character representing the method used to conduct a test. If method = "classic" then p-values are calculated from the asymptotic distribution of the test statistic: standard normal for test = "t" and chi-squared for test = "wald". If method = "bootstrap" then p-values are calculated via the bootstrap as described in Hansen (2022). If test = "wald" and method = "score" then the score bootstrap Wald test of Kline and Santos (2012) is used.
ci: a character representing the type of the confidence interval used. Available only if test = "t". If ci = "classic" then quantiles of the standard normal distribution are used to build an asymptotic confidence interval. If ci = "percentile" then a percentile bootstrap interval is applied. If ci = "bc" then the function constructs the bias-corrected percentile bootstrap confidence interval of Efron (1982) as described in Hansen (2022).
cl: a numeric value between 0 and 1 representing a confidence level of the confidence interval.
se_type: a character representing the method used to estimate the standard errors of the outputs of fn. If se_type = "dm" then the delta method is used. If se_type = "bootstrap" then the bootstrap is applied.
trim: a numeric value between 0 and 1 representing the share of bootstrap estimates to be nullified when standard errors are estimated for se_type = "bootstrap".
vcov: an estimate of the asymptotic covariance matrix of the parameters of the model.
iter: the number of iterations used by the score bootstrap Wald test.
generator: a function used by the score bootstrap to generate random weights. It should have an argument n representing the number of random weights to generate. Other arguments are ignored.
bootstrap: an object of class 'bootstrap_msel' which is an output of the bootstrap_msel function. This object is used to retrieve the estimates of the bootstrap samples.
par_ind: a vector of indexes of the model parameters used in the calculation of fn. If only the necessary indexes are included then in some cases the estimation time may decrease greatly. Set ind = TRUE in summary.msel to see the indexes of the model parameters. If eps is a vector then eps[i] determines the increment used to differentiate fn with respect to the par_ind[i]-th parameter.
eps: a positive numeric value representing the increment used for the numeric differentiation of fn. It may also be a numeric vector such that eps[i] is the increment used to differentiate fn with respect to the par_ind[i]-th parameter of the model. Set ind = TRUE in summary.msel to see the indexes of the model parameters. If eps[i] = 0 then the derivative of fn with respect to the par_ind[i]-th parameter is assumed to be zero.
n_sim: the value passed to the n_sim argument of the msel function.
n_cores: the value passed to the n_cores argument of the msel function.
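To fix ideas, below is a minimal usage sketch. Here model is a placeholder for a fitted 'msel' object and my_fn is a hypothetical user-defined function; neither name comes from the package.

# my_fn must take the msel object as its first argument and return a
# numeric vector of the quantities to be tested ('model' and 'my_fn' are
# hypothetical placeholders)
my_fn <- function(object)
{
  b <- coef(object, type = "coef")[[1]]  # coefficients of the first equation
  c(b[1] + b[2], b[1] - b[2])            # two smooth functions of the parameters
}

# t-test with delta method standard errors and classic confidence intervals
out <- test_msel(object = model, fn = my_fn, test = "t",
                 se_type = "dm", ci = "classic", cl = 0.95)
summary(out)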
Returns
This function returns an object of class 'test_msel' which is a list. It may have the following elements:
tbl - a list with the elements described below.
is_bootstrap - a logical value which equals TRUE if bootstrap has been used.
is_ci - a logical value which equals TRUE if confidence intervals were used.
test - the same as the input argument test.
method - the same as the input argument method.
se_type - the same as the input argument se_type.
ci - the same as the input argument ci.
cl - the same as the input argument cl.
iter - the same as the input argument iter.
n_bootstrap - an integer representing the number of the bootstrap iterations used.
n_val - the length of the vector returned by fn.
A list tbl may have the following elements:
val - an output of the fn function.
se - a numeric vector such that se[i] represents a standard error associated with val[i].
p_value - a numeric vector of p-values.
lwr - a numeric vector such that lwr[i] is the realization of the lower (left) bound of the confidence interval for the true value of val[i].
upr - a numeric vector such that upr[i] is the realization of the upper (right) bound of the confidence interval for the true value of val[i].
stat - a numeric vector of values of the test statistics.
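For instance, assuming out is the value returned by test_msel, these elements can be accessed directly:

# Accessing the elements of the tbl list ('out' is a placeholder for a
# test_msel result)
out$tbl$val      # values of fn
out$tbl$se       # standard errors of these values
out$tbl$p_value  # p-values of the tests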
An object of class 'test_msel' has an implementation of the summary method: summary.test_msel.
In the special case when object is a list of length 2, the function returns an object of class 'lrtest_msel' since the function lrtest_msel is called internally.
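A minimal sketch of this special case, assuming model1 and model2 are placeholder names for two fitted 'msel' models, one nested in the other:

# Likelihood ratio test via a list of two fitted models
# (model1 and model2 are hypothetical nested msel fits)
lr_out <- test_msel(object = list(model1, model2))
lr_out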
Details
Suppose that $\theta$ is a vector of parameters of the model estimated via the msel function and $g(\theta)$ is a differentiable function representing fn which returns an m-dimensional vector of real values:
$$g(\theta) = (g_1(\theta), \ldots, g_m(\theta))^T.$$
Classic and bootstrap t-test
If test = "t" then for each $j \in \{1, \ldots, m\}$ the following hypothesis is tested:
$$H_0: g_j(\theta) = 0, \quad H_1: g_j(\theta) \neq 0.$$
The test statistic is:
$$T = g_j(\hat{\theta}) / \hat{\sigma}_j,$$
where $\hat{\sigma}_j$ is a standard error of $g_j(\hat{\theta})$.
If se_type = "dm" then the delta method is used to estimate this standard error:
$$\hat{\sigma}_j = \sqrt{\nabla g_j(\hat{\theta})^T \, \widehat{\text{As.Cov}}(\hat{\theta}) \, \nabla g_j(\hat{\theta})},$$
where $\nabla g_j(\hat{\theta})$ is the gradient (as a column vector) and the estimate of the asymptotic covariance matrix of the estimates $\widehat{\text{As.Cov}}(\hat{\theta})$ is provided via the vcov argument. Numeric differentiation is used to calculate $\nabla g_j(\hat{\theta})$.
If se_type = "bootstrap" then the bootstrap is applied to estimate the standard error:
$$\hat{\sigma}_j = \sqrt{\frac{1}{B - 1} \sum_{b=1}^{B} \left(g_j(\hat{\theta}^{(b)}) - \overline{g_j(\hat{\theta}^{(b)})}\right)^2},$$
where $B$ is the number of bootstrap iterations bootstrap$iter, $\hat{\theta}^{(b)}$ is the estimate associated with the $b$-th of these iterations bootstrap$par[b, ], and $\overline{g_j(\hat{\theta}^{(b)})}$ is the sample mean of the bootstrap estimates:
$$\overline{g_j(\hat{\theta}^{(b)})} = \frac{1}{B} \sum_{b=1}^{B} g_j(\hat{\theta}^{(b)}).$$
If method = "classic" then it is assumed that, under the null hypothesis, the asymptotic distribution of the test statistic is standard normal. This distribution is used for the calculation of the p-value:
$$p\text{-value} = 2\min\left(\Phi(T), 1 - \Phi(T)\right),$$
where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.
If method = "bootstrap" then the p-value is calculated via the bootstrap as suggested by Hansen (2022):
$$p\text{-value} = \frac{1}{B} \sum_{b=1}^{B} I\left(|T_b - T| > |T|\right),$$
where $T_b = g_j(\hat{\theta}^{(b)}) / \hat{\sigma}_j$ is the value of the test statistic estimated on the $b$-th bootstrap sample and $I(q)$ is an indicator function which equals 1 when $q$ is a true statement and 0 otherwise.
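To make these formulas concrete, the sketch below reproduces the delta method t-statistic and the classic p-value by hand. The names g, theta_hat and V are hypothetical stand-ins for fn, the parameter estimates and their asymptotic covariance matrix; test_msel performs an equivalent computation internally.

# Hand-rolled delta method t-test for a scalar function g
# (g, theta_hat and V are hypothetical stand-ins, not package objects)
g         <- function(theta) theta[1] * theta[2]      # example smooth function
theta_hat <- c(0.5, -1.2)                             # pretend estimates
V         <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)  # pretend As.Cov

# Numeric gradient of g at theta_hat (central differences)
eps  <- max(1e-04, sqrt(.Machine$double.eps) * 10)
grad <- sapply(seq_along(theta_hat), function(i) {
  e    <- rep(0, length(theta_hat))
  e[i] <- eps
  (g(theta_hat + e) - g(theta_hat - e)) / (2 * eps)
})

se      <- sqrt(drop(t(grad) %*% V %*% grad))  # delta method standard error
t_stat  <- g(theta_hat) / se                   # test statistic
p_value <- 2 * min(pnorm(t_stat), 1 - pnorm(t_stat))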
Classic and bootstrap Wald test
Suppose that method = "classic" or method = "bootstrap". If test = "wald" then the null hypothesis is:
$$H_0: \begin{cases} g_1(\theta) = 0 \\ g_2(\theta) = 0 \\ \vdots \\ g_m(\theta) = 0 \end{cases}$$
The alternative hypothesis is that there exists some $j \in \{1, \ldots, m\}$ such that:
$$H_1: g_j(\theta) \neq 0.$$
The test statistic is:
$$T = g(\hat{\theta})^T \, \widehat{\text{As.Cov}}\left(g(\hat{\theta})\right)^{-1} g(\hat{\theta}),$$
where $\widehat{\text{As.Cov}}(g(\hat{\theta}))$ is the estimate of the asymptotic covariance matrix of $g(\hat{\theta})$.
If se_type = "dm" then the delta method is used to estimate this matrix:
$$\widehat{\text{As.Cov}}\left(g(\hat{\theta})\right) = g'(\hat{\theta}) \, \widehat{\text{As.Cov}}(\hat{\theta}) \, g'(\hat{\theta})^T,$$
where $g'(\hat{\theta})$ is the Jacobian matrix. Numeric differentiation is used to calculate its elements:
$$g'(\hat{\theta})_{ij} = \left.\frac{\partial g_i(\theta)}{\partial \theta_j}\right|_{\theta = \hat{\theta}}.$$
If se_type = "bootstrap" then the bootstrap is used to estimate this matrix:
$$\widehat{\text{As.Cov}}\left(g(\hat{\theta})\right) = \frac{1}{B - 1} \sum_{b=1}^{B} q_b q_b^T, \quad q_b = g(\hat{\theta}^{(b)}) - \overline{g(\hat{\theta}^{(b)})},$$
where $\overline{g(\hat{\theta}^{(b)})}$ is the sample mean of the bootstrap estimates $g(\hat{\theta}^{(b)})$.
If method = "classic" then it is assumed that, under the null hypothesis, the asymptotic distribution of the test statistic is chi-squared with $m$ degrees of freedom. This distribution is used for the calculation of the p-value:
$$p\text{-value} = 1 - F_m(T),$$
where $F_m$ is the cumulative distribution function of the chi-squared distribution with $m$ degrees of freedom.
If method = "bootstrap" then the p-value is calculated via the bootstrap as suggested by Hansen (2022):
$$p\text{-value} = \frac{1}{B} \sum_{b=1}^{B} I(T_b > T),$$
where:
$$T_b = s_b^T \, \widehat{\text{As.Cov}}\left(g(\hat{\theta})\right)^{-1} s_b, \quad s_b = g(\hat{\theta}^{(b)}) - g(\hat{\theta}).$$
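The classic Wald computation can be sketched in the same spirit. Here g_hat, J_g and V are hypothetical stand-ins for the value of fn, its Jacobian and the covariance matrix of the estimates.

# Hand-rolled classic Wald test for m = 2 restrictions
# (g_hat, J_g and V are hypothetical stand-ins, not package objects)
g_hat <- c(0.12, -0.05)                 # pretend g(theta_hat)
J_g   <- matrix(c(1, 0.3, 0, 1), 2, 2)  # pretend Jacobian g'(theta_hat)
V     <- diag(c(0.04, 0.09))            # pretend As.Cov(theta_hat)

V_g     <- J_g %*% V %*% t(J_g)         # As.Cov(g(theta_hat)) by the delta method
T_stat  <- drop(t(g_hat) %*% solve(V_g) %*% g_hat)
p_value <- 1 - pchisq(T_stat, df = length(g_hat))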
Score bootstrap Wald test
If method = "score" and test = "wald" then the score bootstrap Wald test of Kline and Santos (2012) is used.
Consider $B$ independent samples of $n$ independent identically distributed random weights with zero mean and unit variance. Let $w_{ib}$ denote the $i$-th weight of the $b$-th sample. The generator argument supplies the function which generates these weights $w_{ib}$, the iter argument represents $B$, and $n$ is the number of observations in the model object$other$n_obs.
Let $J$ denote the matrix of sample scores object$J. Further, denote by $J_b$ the matrix whose $i$-th row is the product of $w_{ib}$ and the $i$-th row of $J$. Also, denote by $H$ the matrix of mean values of the derivatives of the sample scores with respect to the estimated parameters object$H.
In addition, consider the following notation:
$$A = g'(\hat{\theta}) H^{-1}, \quad S_b = A J_b^{(c)},$$
where $J_b^{(c)}$ is a vector of the column sums of $J_b$.
The test statistic is as follows:
$$T = g(\hat{\theta})^T \left(A \, \text{Cov}(J) \, A^T\right)^{-1} g(\hat{\theta}) / n,$$
where $\text{Cov}(J)$ is the sample covariance matrix of the sample scores of the model, cov(object$J).
The test statistic on the $b$-th bootstrap sample is similar:
$$T_b = S_b^T \left(A \, \text{Cov}(J_b) \, A^T\right)^{-1} S_b / n.$$
The p-value is estimated as follows:
$$p\text{-value} = \frac{1}{B} \sum_{b=1}^{B} I(T_b \geq T).$$
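For example, Rademacher weights (a common zero-mean, unit-variance choice) could be supplied through the generator argument as sketched below; model and test_fn are placeholders for a fitted 'msel' object and a function of its parameters.

# Score bootstrap Wald test with Rademacher weights
# (model and test_fn are hypothetical placeholders)
rademacher <- function(n) sample(c(-1, 1), size = n, replace = TRUE)
out <- test_msel(object = model, fn = test_fn, test = "wald",
                 method = "score", generator = rademacher, iter = 1000)
summary(out)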
Confidence intervals
If test = "t" then the function also returns the realizations of the lower and upper bounds of the $100 \times cl$ percent symmetric asymptotic confidence interval for $g_j(\theta)$.
If ci = "classic" then the classic confidence interval is used, which assumes asymptotic normality of $g_j(\hat{\theta})$:
$$\left(g_j(\hat{\theta}) + z_{(1 - cl)/2} \, \hat{\sigma}_j, \; g_j(\hat{\theta}) + z_{1 - (1 - cl)/2} \, \hat{\sigma}_j\right),$$
where $z_q$ is the $q$-th quantile of the standard normal distribution and $cl$ is the confidence level cl. The method used to estimate $\hat{\sigma}_j$ depends on the se_type argument as described above.
If ci = "percentile" then the percentile bootstrap confidence interval is used. Therefore the sample quantiles of $g_j(\hat{\theta}^{(b)})$ are used as the realizations of the lower and upper bounds of the confidence interval.
If ci = "bc" then the bias-corrected percentile bootstrap confidence interval of Efron (1982) is used as described in Hansen (2022). The default percentile bootstrap confidence interval uses the sample quantiles of levels $(1 - cl)/2$ and $1 - (1 - cl)/2$. The bias-corrected version instead uses the sample quantiles of levels:
$$\Phi\left(2 z_0 + z_{(1 - cl)/2}\right) \text{ and } \Phi\left(2 z_0 + z_{1 - (1 - cl)/2}\right), \quad z_0 = \Phi^{-1}\left(\frac{1}{B} \sum_{b=1}^{B} I\left(g_j(\hat{\theta}^{(b)}) \leq g_j(\hat{\theta})\right)\right).$$
If se_type = "bootstrap" and trim > 0 then trimming is used as described in Hansen (2022) to estimate $\hat{\sigma}_j$ and $\widehat{\text{As.Cov}}(g(\hat{\theta}))$. The algorithm is as follows. First, nullify the $100 \times trim$ percent of the $g(\hat{\theta}^{(b)})$ with the greatest values of the L2-norm of $q_b$ (defined above). Then use this 'trimmed' sample to estimate the standard error and the asymptotic covariance matrix.
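The percentile and bias-corrected levels can be illustrated directly on a vector of bootstrap estimates. In this sketch g_hat and g_boot are hypothetical stand-ins for $g_j(\hat{\theta})$ and the bootstrap values $g_j(\hat{\theta}^{(b)})$.

# Percentile and bias-corrected percentile intervals by hand
# (g_hat and g_boot are hypothetical stand-ins, not package objects)
set.seed(1)
g_hat  <- 0.3
g_boot <- g_hat + rnorm(1000, sd = 0.1)  # pretend bootstrap estimates
cl     <- 0.95

# Percentile interval: plain sample quantiles
ci_perc <- quantile(g_boot, probs = c((1 - cl) / 2, 1 - (1 - cl) / 2))

# Bias-corrected interval: shift the quantile levels by 2 * z0
z0    <- qnorm(mean(g_boot <= g_hat))
lvl   <- pnorm(2 * z0 + qnorm(c((1 - cl) / 2, 1 - (1 - cl) / 2)))
ci_bc <- quantile(g_boot, probs = lvl)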
Examples
# -------------------------------
# CPS data example
# -------------------------------

# Set seed for reproducibility
set.seed(123)

# Upload the data
data(cps)

# Estimate the employment model
model <- msel(work ~ age + I(age ^ 2) + bachelor + master, data = cps)
summary(model)

# Use the Wald test to test the hypothesis that age has no
# effect on the conditional probability of employment:
# H0: coefficient of age     = 0
#     coefficient of age ^ 2 = 0
age_fn <- function(object)
{
  work_coef <- coef(object, type = "coef")[[1]]
  val       <- c(work_coef["age"], work_coef["I(age^2)"])
  return(val)
}
age_test <- test_msel(object = model, fn = age_fn, test = "wald")
summary(age_test)

# Use the t-test to test for each individual the hypothesis:
# H0: P(work = 1 | x) = 0.8
prob_fn <- function(object)
{
  prob <- predict(object, group = 1, type = "prob")
  val  <- prob - 0.8
  return(val)
}
prob_test <- test_msel(object = model, fn = prob_fn, test = "t")
summary(prob_test)

# -------------------------------
# Simulated data example
# Model with continuous outcome
# and ordinal selection
# -------------------------------

# ---
# Step 1
# Simulation of the data
# ---

# Set seed for reproducibility
set.seed(123)

# Load the required package
library("mnorm")

# The number of observations
n <- 10000

# Regressors (covariates)
s1 <- runif(n = n, min = -1, max = 1)
s2 <- runif(n = n, min = -1, max = 1)
s3 <- runif(n = n, min = -1, max = 1)
s4 <- runif(n = n, min = -1, max = 1)

# Random errors
sigma <- matrix(c(1,    0.4,  0.45, 0.7,
                  0.4,  1,    0.54, 0.8,
                  0.45, 0.54, 0.81, 0.81,
                  0.7,  0.8,  0.81, 1),
                nrow = 4)
errors <- mnorm::rmnorm(n = n, mean = c(0, 0, 0, 0), sigma = sigma)
u1   <- errors[, 1]
u2   <- errors[, 2]
eps0 <- errors[, 3]
eps1 <- errors[, 4]

# Coefficients
gamma1     <- c(-1, 2)
gamma2     <- c(1, 1)
gamma1_het <- c(0.5, -1)
beta0      <- c(1, -1, 1, -1.2)
beta1      <- c(2, -1.5, 0.5, 1.2)

# Linear indexes of the ordinal equations
# mean part
li1 <- gamma1[1] * s1 + gamma1[2] * s2
li2 <- gamma2[1] * s1 + gamma2[2] * s3
# variance part
li1_het <- gamma1_het[1] * s2 + gamma1_het[2] * s3

# Linear indexes of the continuous equation
# regime 0
li_y0 <- beta0[1] + beta0[2] * s1 + beta0[3] * s3 + beta0[4] * s4
# regime 1
li_y1 <- beta1[1] + beta1[2] * s1 + beta1[3] * s3 + beta1[4] * s4

# Latent variables
z1_star <- li1 + u1 * exp(li1_het)
z2_star <- li2 + u2
y0_star <- li_y0 + eps0
y1_star <- li_y1 + eps1

# Cuts
cuts1 <- c(-1)
cuts2 <- c(0, 1)

# Observable ordinal outcomes
# first
z1 <- rep(0, n)
z1[z1_star > cuts1[1]] <- 1
# second
z2 <- rep(0, n)
z2[(z2_star > cuts2[1]) & (z2_star <= cuts2[2])] <- 1
z2[z2_star > cuts2[2]] <- 2
z2[z1 == 0] <- NA

# Observable continuous outcome
y <- rep(NA, n)
y[which(z2 == 0)] <- y0_star[which(z2 == 0)]
y[which(z2 != 0)] <- y1_star[which(z2 != 0)]
y[which(z1 == 0)] <- NA

# Data
data <- data.frame(s1 = s1, s2 = s2, s3 = s3, s4 = s4,
                   z1 = z1, z2 = z2, y = y)

# ---
# Step 2
# Estimation of the parameters
# ---

# Assign the groups
groups  <- matrix(c(1, 2,
                    1, 1,
                    1, 0,
                    0, -1),
                  byrow = TRUE, ncol = 2)
groups2 <- matrix(c(1, 1, 0, -1), ncol = 1)

# Estimate the model
model <- msel(list(z1 ~ s1 + s2 | s2 + s3,
                   z2 ~ s1 + s3),
              list(y ~ s1 + s3 + s4),
              groups = groups, groups2 = groups2, data = data)

# ---
# Step 3
# Hypotheses testing
# ---

# Use the t-test to test for each observation the hypothesis
# H0: P(z1 = 0, z2 = 2 | Xi) = 0
prob02_fn <- function(object)
{
  val <- predict(object, group = c(0, 2))
  return(val)
}
prob02_test <- test_msel(object = model, fn = prob02_fn, test = "t")
summary(prob02_test)

# Use the t-test to test the hypothesis
# H0: E(y1 | z1 = 0, z2 = 2) - E(y0 | z1 = 0, z2 = 2) = 0
ATE_fn <- function(object)
{
  val1 <- predict(object, group = c(0, 2), group2 = 1)
  val0 <- predict(object, group = c(0, 2), group2 = 0)
  val  <- mean(val1 - val0)
  return(val)
}
ATE_test <- test_msel(object = model, fn = ATE_fn)
summary(ATE_test)

# Use the Wald test to test the hypothesis
# H0: beta1 = beta0
coef_fn <- function(object)
{
  coef1 <- coef(object, regime = 1, type = "coef2")
  coef0 <- coef(object, regime = 0, type = "coef2")
  coef_difference <- coef1 - coef0
  return(coef_difference)
}
coef_test <- test_msel(object = model, fn = coef_fn, test = "wald")
summary(coef_test)

# Use the t-test to test for each 'k' the hypothesis
# H0: beta1k = beta0k
coef_test2 <- test_msel(object = model, fn = coef_fn, test = "t")
summary(coef_test2)

# Use the Wald test to test the hypothesis
# H0: beta11 + beta12 - 0.5    = 0
#     beta11 * beta13 - beta03 = 0
test_fn <- function(object)
{
  coef1 <- coef(object, regime = 1, type = "coef2")
  coef0 <- coef(object, regime = 0, type = "coef2")
  val   <- c(coef1[1] + coef1[2] - 0.5,
             coef1[1] * coef1[3] - coef0[3])
  return(val)
}
# classic Wald test
wald1 <- test_msel(object = model, fn = test_fn,
                   test = "wald", method = "classic")
summary(wald1)
# score bootstrap Wald test
wald2 <- test_msel(object = model, fn = test_fn,
                   test = "wald", method = "score")
summary(wald2)

# Replicate the latter test with the 2-step estimator
model2 <- msel(list(z1 ~ s1 + s2 | s2 + s3,
                    z2 ~ s1 + s3),
               list(y ~ s1 + s3 + s4),
               groups = groups, groups2 = groups2, data = data,
               estimator = "2step")
# classic Wald test
wald1_2step <- test_msel(object = model2, fn = test_fn,
                         test = "wald", method = "classic")
summary(wald1_2step)
# score bootstrap Wald test
wald2_2step <- test_msel(object = model2, fn = test_fn,
                         test = "wald", method = "score")
summary(wald2_2step)
References
B. Efron (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. Society for Industrial and Applied Mathematics.
B. Hansen (2022). Econometrics. Princeton University Press.
P. Kline, A. Santos (2012). A Score Based Approach to Wild Bootstrap Inference. Journal of Econometric Methods, vol. 1, no. 1, pages 23-41.