(Restricted) Maximum Likelihood Estimation with Prior Distributions and Penalty Functions under Multivariate Normality
(Restricted) Maximum Likelihood Estimation with Prior Distributions and Penalty Functions under Multivariate Normality
The mlnormal estimates statistical model for multivariate normally distributed outcomes with specified mean structure and covariance structure (see Details and Examples). Model classes include multilevel models, factor analysis, structural equation models, multilevel structural equation models, social relations model and perhaps more.
The estimation can be conducted under maximum likelihood, restricted maximum likelihood and maximum posterior estimation with prior distribution. Regularization (i.e. LASSO penalties) is also accomodated.
mlnormal(y, X, id, Z_list, Z_index, beta=NULL, theta, method="ML", prior=NULL, lambda_beta=NULL, weights_beta=NULL, lambda_theta=NULL, weights_theta=NULL, beta_lower=NULL, beta_upper=NULL, theta_lower=NULL, theta_upper=NULL, maxit=800, globconv=1e-05, conv=1e-06, verbose=TRUE, REML_shortcut=NULL, use_ginverse=FALSE, vcov=TRUE, variance_shortcut=TRUE, use_Rcpp=TRUE, level=0.95, numdiff.parm=1e-04, control_beta=NULL, control_theta=NULL)## S3 method for class 'mlnormal'summary(object, digits=4, file=NULL,...)## S3 method for class 'mlnormal'print(x, digits=4,...)## S3 method for class 'mlnormal'coef(object,...)## S3 method for class 'mlnormal'logLik(object,...)## S3 method for class 'mlnormal'vcov(object,...)## S3 method for class 'mlnormal'confint(object, parm, level=.95,...)
Arguments
y: Vector of outcomes
X: Matrix of covariates
id: Vector of identifiers (subjects or clusters, see Details)
Z_list: List of design matrices for covariance matrix (see Details)
Z_index: Array containing loadings of design matrices (see Details). The dimensions are units × matrices × parameters.
beta: Initial vector for β
theta: Initial vector for θ
method: Estimation method. Can be either "ML" or "REML".
prior: Prior distributions. Can be conveniently specified in a string which is processed by the function prior_model_parse. Only univariate prior distributions can be specified.
lambda_beta: Parameter λβ for penalty function P(β)=λβ∑hwβh∣βh∣
weights_beta: Parameter vector wβ for penalty function P(β)=λβ∑hwβh∣βh∣
lambda_theta: Parameter λθ for penalty function c("P(boldtheta)=lambdaboldtheta\n", "sumhwboldthetah∣thetah∣")
weights_theta: Parameter vector wθ for penalty function c("P(boldtheta)=lambdaboldtheta\n", "sumhwboldthetah∣thetah∣")
beta_lower: Vector containing lower bounds for β parameter
beta_upper: Vector containing upper bounds for β parameter
theta_lower: Vector containing lower bounds for θ parameter
theta_upper: Vector containing upper bounds for θ parameter
maxit: Maximum number of iterations
globconv: Convergence criterion deviance
conv: Maximum parameter change
verbose: Print progress?
REML_shortcut: Logical indicating whether computational shortcuts should be used for REML estimation
use_ginverse: Logical indicating whether a generalized inverse should be used
vcov: Logical indicating whether a covariance matrix of θ parameter estimates should be computed in case of REML (which is computationally demanding)
variance_shortcut: Logical indicating whether computational shortcuts for calculating covariance matrices should be used
use_Rcpp: Logical indicating whether the Rcpp package should be used
level: Confidence level
numdiff.parm: Numerical differentiation parameter
control_beta: List with control arguments for β estimation. The default is
list( maxiter=10, conv=1E-4, ridge=1E-6).
control_theta: List with control arguments for θ estimation. The default is
list( maxiter=10, conv=1E-4, ridge=1E-6).
object: Object of class mlnormal
digits: Number of digits used for rounding
file: File name
parm: Parameter to be selected for confint method
...: Further arguments to be passed
x: Object of class mlnormal
Details
The data consists of outcomes yi and covariates Xi
for unit i. The unit can be subjects, clusters (like schools) or the full outcome vector. It is assumed that yi is normally distributed as N(μi,Vi) where the mean structure is modelled as
μi=Xiβ
and the covariance structure Vi depends on a parameter vector θ. More specifically, the covariance matrix Vi is modelled as a sum of functions of the parameter θ and known design matrices Zim for unit i (m=1,…,M). The model is
Vi=m=1∑MZimγimwithγim=h=1∏Hθhqimh
where qimh are non-negative known integers specified in Z_index and Zim are design matrices specified in Z_list.
The estimation follows Fisher scoring (Jiang, 2007; for applications see also Longford, 1987; Lee, 1990; Gill & Swartz, 2001) and the regularization approach is as described in Lin, Pang and Jiang (2013) (see also Krishnapuram, Carin, Figueiredo, & Hartemink, 2005).
Returns
List with entries
theta: Estimated θ parameter
beta: Estimated β parameter
theta_summary: Summary of θ parameters
beta_summary: Summary of β parameters
coef: Estimated parameters
vcov: Covariance matrix of estimated parameters
ic: Information criteria
V_list: List with fitted covariance matrices Vi
V1_list: List with inverses of fitted covariance matrices Vi
prior_args: Some arguments in case of prior distributions
...: More values
References
Gill, P. S., & Swartz, T. B. (2001). Statistical analyses for round robin interaction data. Canadian Journal of Statistics, 29, 321-331. tools:::Rd_expr_doi("10.2307/3316080")
Jiang, J. (2007). Linear and generalized linear mixed models and their applications. New York: Springer.
Krishnapuram, B., Carin, L., Figueiredo, M. A., & Hartemink, A. J. (2005). Sparse multinomial logistic regression: Fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 957-968. tools:::Rd_expr_doi("10.1109/TPAMI.2005.127")
Lee, S. Y. (1990). Multilevel analysis of structural equation models. Biometrika, 77, 763-772. tools:::Rd_expr_doi("10.1093/biomet/77.4.763")
Lin, B., Pang, Z., & Jiang, J. (2013). Fixed and random effects selection by REML and pathwise coordinate optimization. Journal of Computational and Graphical Statistics, 22, 341-355. tools:::Rd_expr_doi("10.1080/10618600.2012.681219")
Longford, N. T. (1987). A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika, 74, 817-827. tools:::Rd_expr_doi("10.1093/biomet/74.4.817")
See Also
See lavaan, sem, lava, OpenMx or nlsem
packages for estimation of (single level) structural equation models.
See the regsem
and lsl packages for regularized structural equation models.
See lme4 or nlme package for estimation of multilevel models.
See the lmmlasso and glmmLasso packages for regularized mixed effects models.
See OpenMx and xxM packages (http://xxm.times.uh.edu/) for estimation of multilevel structural equation models.
Examples
## Not run:############################################################################## EXAMPLE 1: Two-level random intercept model##############################################################################--------------------------------------------------------------# Simulate data#--------------------------------------------------------------set.seed(976)G <-150; rg <- c(10,20)# 150 groups with group sizes ranging from 10 to 20#* simulate group sizesng <- round( stats::runif( G, min=rg[1], max=rg[2]))idcluster <- rep(1:G, ng )#* simulate covariateiccx <-.3x <- rep( stats::rnorm( G, sd=sqrt( iccx)), ng )+ stats::rnorm( sum(ng), sd=sqrt(1- iccx))#* simulate outcomeb0 <-1.5; b1 <-.4; iccy <-.2y <- b0 + b1*x + rep( stats::rnorm( G, sd=sqrt( iccy)), ng )+ stats::rnorm( sum(ng), sd=sqrt(1- iccy))#-----------------------#--- arrange input for mlnormal functionid <- idcluster # cluster is identifierX <- cbind(1, x )# matrix of covariatesN <- length(id)# number of units (clusters), which is GMD <- max(ng)# maximum number of persons in a groupNP <-2# number of covariance parameters theta#* list of design matrix for covariance matrix# In the case of the random intercept model, the covariance structure is# tau^2 * J + sigma^2 * I, where J is a matrix of ones and I is the# identity matrixZ <- as.list(1:G)for(gg in1:G){ Ngg <- ng[gg] Z[[gg]]<- as.list(1:2) Z[[gg]][[1]]<- matrix(1, nrow=Ngg, ncol=Ngg )# level 2 variance Z[[gg]][[2]]<- diag(1,Ngg)# level 1 variance}Z_list <- Z
#* parameter list containing the powers of parametersZ_index <- array(0, dim=c(G,2,2))Z_index[1:G,1,1]<- Z_index[1:G,2,2]<-1#** starting values and parameter namesbeta <- c(1,0)names(beta)<- c("int","x")theta <- c(.05,1)names(theta)<- c("tau2","sig2")#** create dataset for lme4 for comparisondat <- data.frame(y=y, x=x, id=id )#--------------------------------------------------------------# Model 1: Maximum likelihood estimation#--------------------------------------------------------------#** mlnormal functionmod1a <- LAM::mlnormal( y=y, X=X, id=id, Z_list=Z_list, Z_index=Z_index, beta=beta, theta=theta, method="ML")summary(mod1a)# lme4::lmer functionlibrary(lme4)mod1b <- lme4::lmer( y ~ x +(1| id ), data=dat, REML=FALSE)summary(mod1b)#--------------------------------------------------------------# Model 2: Restricted maximum likelihood estimation#--------------------------------------------------------------#** mlnormal functionmod2a <- LAM::mlnormal( y=y, X=X, id=id, Z_list=Z_list, Z_index=Z_index, beta=beta, theta=theta, method="REML")summary(mod2a)# lme4::lmer functionmod2b <- lme4::lmer( y ~ x +(1| id ), data=dat, REML=TRUE)summary(mod2b)#--------------------------------------------------------------# Model 3: Estimation of standard deviation instead of variances#--------------------------------------------------------------# The model is now parametrized in standard deviations# Variances are then modeled as tau^2 and sigma^2, respectively.Z_index2 <-2*Z_index # change loading matrix# estimate modelmod3 <- LAM::mlnormal( y=y, X=X, id=id, Z_list=Z_list, Z_index=Z_index2, beta=beta, theta=theta )summary(mod3)#--------------------------------------------------------------# Model 4: Maximum posterior estimation#--------------------------------------------------------------# specify prior distributions for parametersprior <- "
tau2 ~ dgamma(NA,2,.5) sig2 ~ dinvgamma(NA,.1,.1) x ~ dnorm(NA,.2,1000) "
# estimate model in mlnormalmod4 <- LAM::mlnormal( y=y, X=X, id=id, Z_list=Z_list, Z_index=Z_index, beta=beta, theta=theta, method="REML", prior=prior, vcov=FALSE)summary(mod4)#--------------------------------------------------------------# Model 5: Estimation with regularization on beta and theta parameters#--------------------------------------------------------------#*** penalty on theta parameterlambda_theta <-10weights_theta <-1+0* theta
#*** penalty on beta parameterlambda_beta <-3weights_beta <- c(0,1.8)# estimate modelmod5 <- LAM::mlnormal( y=y, X=X, id=id, Z_list=Z_list, Z_index=Z_index, beta=beta, theta=theta, method="ML", maxit=maxit, lambda_theta=lambda_theta, weights_theta=weights_theta, lambda_beta=lambda_beta, weights_beta=weights_beta )summary(mod5)############################################################################## EXAMPLE 2: Latent covariate model, two-level regression############################################################################## Yb=beta_0 + beta_b*Xb + eb (between level) and# Yw=beta_w*Xw + ew (within level)#--------------------------------------------------------------# Simulate data from latent covariate model#--------------------------------------------------------------set.seed(865)# regression parametersbeta_0 <-1; beta_b <-.7; beta_w <-.3G <-200# number of groupsn <-15# group sizeiccx <-.2# intra class correlation xiccy <-.35# (conditional) intra class correlation y# simulate latent variablesxb <- stats::rnorm(G, sd=sqrt( iccx ))yb <- beta_0 + beta_b * xb + stats::rnorm(G, sd=sqrt( iccy ))xw <- stats::rnorm(G*n, sd=sqrt(1-iccx ))yw <- beta_w * xw + stats::rnorm(G*n, sd=sqrt(1-iccy ))group <- rep(1:G, each=n )x <- xw + xb[ group ]y <- yw + yb[ group ]# test results on true datalm( yb ~ xb )lm( yw ~ xw )# create vector of outcomes in the form# ( y_11, x_11, y_21, x_21, ... )dat <- cbind( y, x )dat
Y <- as.vector( t(dat))# outcome vectorny <- length(Y)X <- matrix(0, nrow=ny, ncol=2)X[ seq(1,ny,2),1]<-1# design vector for mean yX[ seq(2,ny,2),2]<-1# design vector for mean xid <- rep( group, each=2)#--------------------------------------------------------------# Model 1: Linear regression ignoring multilevel structure#--------------------------------------------------------------# y=beta_0 + beta_t *x + e# Var(y)=beta_t^2 * var_x + var_e# Cov(y,x)=beta_t * var_x# Var(x)=var_x#** initial parameter valuestheta <- c(0,1,.5)names(theta)<- c("beta_t","var_x","var_e")beta <- c(0,0)names(beta)<- c("mu_y","mu_x")# The unit i is a cluster in this example.#--- define design matrices | list Z_listHlist <- list( matrix( c(1,0,0,0),2,2),# var(y) matrix( c(1,0,0,0),2,2),# var(y) (two terms) matrix( c(0,1,1,0),2,2),# cov(x,y) matrix( c(0,0,0,1),2,2))# var(x)U0 <- matrix(0, nrow=2*n,ncol=2*n )Ulist <- list( U0, U0, U0, U0 )M <- length(Hlist)for(mm in1:M){# mm <- 1for(nn in1:n){# nn <- 0 Ulist[[ mm ]][2*(nn-1)+1:2,2*(nn-1)+1:2]<- Hlist[[ mm ]]}}Z_list <- as.list(1:G)for(gg in1:G){ Z_list[[gg]]<- Ulist
}#--- define index vectorsZ_index <- array(0, dim=c(G,4,3))K0 <- matrix(0, nrow=4, ncol=3)colnames(K0)<- names(theta)# Var(y)=beta_t^2 * var_x + var_e (matrices withn indices 1 and 2)K0[1, c("beta_t","var_x")]<- c(2,1)# beta_t^2 * var_xK0[2, c("var_e")]<- c(1)# var_e# Cov(y,x)=beta_t * var_xK0[3, c("beta_t","var_x")]<- c(1,1)# Var(x)=var_xK0[4, c("var_x")]<- c(1)for(gg in1:G){ Z_index[gg,,]<- K0
}#*** estimate model with mlnormalmod1a <- LAM::mlnormal( y=Y, X=X, id=id, Z_list=Z_list, Z_index=Z_index, beta=beta, theta=theta, method="REML", vcov=FALSE)summary(mod1a)#*** estimate linear regression with stats::lmmod1b <- stats::lm( y ~ x )summary(mod1b)#--------------------------------------------------------------# Model 2: Latent covariate model#--------------------------------------------------------------#** initial parameterstheta <- c(0.12,.6,.5,0,.2,.2)names(theta)<- c("beta_w","var_xw","var_ew","beta_b","var_xb","var_eb")#--- define design matrices | list Z_listHlist <- list( matrix( c(1,0,0,0),2,2),# var(y) matrix( c(1,0,0,0),2,2),# var(y) (two terms) matrix( c(0,1,1,0),2,2),# cov(x,y) matrix( c(0,0,0,1),2,2))# var(x)U0 <- matrix(0, nrow=2*n,ncol=2*n )Ulist <- list( U0, U0, U0, U0,# within structure U0, U0, U0, U0 )# between structureM <- length(Hlist)#*** within structuredesign_within <- diag(n)# design matrix within structurefor(mm in1:M){# mm <- 1 Ulist[[ mm ]]<- base::kronecker( design_within, Hlist[[mm]])}#*** between structuredesign_between <- matrix(1, nrow=n, ncol=n)# matrix of ones corresponding to group sizefor(mm in1:M){# mm <- 1 Ulist[[ mm + M ]]<- base::kronecker( design_between, Hlist[[ mm ]])}Z_list <- as.list(1:G)for(gg in1:G){ Z_list[[gg]]<- Ulist
}#--- define index vectors Z_indexZ_index <- array(0, dim=c(G,8,6))K0 <- matrix(0, nrow=8, ncol=6)colnames(K0)<- names(theta)# Var(y)=beta^2 * var_x + var_e (matrices withn indices 1 and 2)K0[1, c("beta_w","var_xw")]<- c(2,1)# beta_t^2 * var_xK0[2, c("var_ew")]<- c(1)# var_eK0[5, c("beta_b","var_xb")]<- c(2,1)# beta_t^2 * var_xK0[6, c("var_eb")]<- c(1)# var_e# Cov(y,x)=beta * var_xK0[3, c("beta_w","var_xw")]<- c(1,1)K0[7, c("beta_b","var_xb")]<- c(1,1)# Var(x)=var_xK0[4, c("var_xw")]<- c(1)K0[8, c("var_xb")]<- c(1)for(gg in1:G){ Z_index[gg,,]<- K0
}#--- estimate model with mlnormalmod2a <- LAM::mlnormal( y=Y, X=X, id=id, Z_list=Z_list, Z_index=Z_index, beta=beta, theta=theta, method="ML")summary(mod2a)############################################################################## EXAMPLE 3: Simple linear regression, single level##############################################################################--------------------------------------------------------------# Simulate data#--------------------------------------------------------------set.seed(875)N <-300x <- stats::rnorm( N, sd=1.3)y <-.4+.7* x + stats::rnorm( N, sd=.5)dat <- data.frame( x, y )#--------------------------------------------------------------# Model 1: Linear regression modelled with residual covariance structure#--------------------------------------------------------------# matrix of predictrosX <- cbind(1, x )# list with design matricesZ <- as.list(1:N)for(nn in1:N){ Z[[nn]]<- as.list(1) Z[[nn]][[1]]<- matrix(1, nrow=1, ncol=1)# residual variance}#* loading matrixZ_index <- array(0, dim=c(N,1,1))Z_index[1:N,1,1]<-2# parametrize residual standard deviation#** starting values and parameter namesbeta <- c(0,0)names(beta)<- c("int","x")theta <- c(1)names(theta)<- c("sig2")# id vectorid <-1:N
#** mlnormal functionmod1a <- LAM::mlnormal( y=y, X=X, id=id, Z_list=Z, Z_index=Z_index, beta=beta, theta=theta, method="ML")summary(mod1a)# estimate linear regression with stats::lmmod1b <- stats::lm( y ~ x )summary(mod1b)#--------------------------------------------------------------# Model 2: Linear regression modelled with bivariate covariance structure#--------------------------------------------------------------#** define design matrix referring to mean structureX <- matrix(0, nrow=2*N, ncol=2)X[ seq(1,2*N,2),1]<- X[ seq(2,2*N,2),2]<-1#** create outcome vectory1 <- dat[ cbind( rep(1:N, each=2), rep(1:2, N ))]#** list with design matricesZ <- as.list(1:N)Z0 <-0*matrix(0, nrow=2,ncol=2)ZXY <- ZY <- ZX <- Z0
# design matrix Var(X)ZX[1,1]<-1# design matrix Var(Y)ZY[2,2]<-1# design matrix covarianceZXY[1,2]<- ZXY[2,1]<-1# Var(X)=sigx^2# Cov(X,Y)=beta * sigx^2# Var(Y)=beta^2 * sigx^2 + sige^2Z_list0 <- list( ZY, ZY, ZXY, ZX )for(nn in1:N){ Z[[nn]]<- Z_list0
}#* parameter list containing the powers of parameterstheta <- c(1,0.3,1)names(theta)<- c("sigx","beta","sige")Z_index <- array(0, dim=c(N,4,3))for(nn in1:N){# Var(X) Z_index[nn,4,]<- c(2,0,0)# Cov(X,Y) Z_index[nn,3,]<- c(2,1,0)# Var(Y) Z_index[nn,1,]<- c(2,2,0) Z_index[nn,2,]<- c(0,0,2)}#** starting values and parameter namesbeta <- c(0,0)names(beta)<- c("Mx","My")# id vectorid <- rep(1:N, each=2)#** mlnormal functionmod2a <- LAM::mlnormal( y=y1, X=X, id=id, Z_list=Z, Z_index=Z_index, beta=beta, theta=theta, method="ML")summary(mod2a)#--------------------------------------------------------------# Model 3: Bivariate normal distribution in (sigma_X, sigma_Y, sigma_XY) parameters#--------------------------------------------------------------# list with design matricesZ <- as.list(1:N)Z0 <-0*matrix(0, nrow=2,ncol=2)ZXY <- ZY <- ZX <- Z0
# design matrix Var(X)ZX[1,1]<-1# design matrix Var(Y)ZY[2,2]<-1# design matrix covarianceZXY[1,2]<- ZXY[2,1]<-1Z_list0 <- list( ZX, ZY, ZXY )for(nn in1:N){ Z[[nn]]<- Z_list0
}#* parameter listtheta <- c(1,1,.3)names(theta)<- c("sigx","sigy","sigxy")Z_index <- array(0, dim=c(N,3,3))for(nn in1:N){# Var(X) Z_index[nn,1,]<- c(2,0,0)# Var(Y) Z_index[nn,2,]<- c(0,2,0)# Cov(X,Y) Z_index[nn,3,]<- c(0,0,1)}#** starting values and parameter namesbeta <- c(0,0)names(beta)<- c("Mx","My")#** mlnormal functionmod3a <- LAM::mlnormal( y=y1, X=X, id=id, Z_list=Z, Z_index=Z_index, beta=beta, theta=theta, method="ML")summary(mod3a)#--------------------------------------------------------------# Model 4: Bivariate normal distribution in parameters of Cholesky decomposition#--------------------------------------------------------------# list with design matricesZ <- as.list(1:N)Z0 <-0*matrix(0, nrow=2,ncol=2)ZXY <- ZY <- ZX <- Z0
# design matrix Var(X)ZX[1,1]<-1# design matrix Var(Y)ZY[2,2]<-1# design matrix covarianceZXY[1,2]<- ZXY[2,1]<-1Z_list0 <- list( ZX, ZXY, ZY, ZY )for(nn in1:N){ Z[[nn]]<- Z_list0
}#* parameter list containing the powers of parameterstheta <- c(1,0.3,1)names(theta)<- c("L11","L21","L22")Z_index <- array(0, dim=c(N,4,3))for(nn in1:N){ Z_index[nn,1,]<- c(2,0,0) Z_index[nn,2,]<- c(1,1,0) Z_index[nn,3,]<- c(0,2,0) Z_index[nn,4,]<- c(0,0,2)}#** starting values and parameter namesbeta <- c(0,0)names(beta)<- c("Mx","My")# id vectorid <- rep(1:N, each=2)#** mlnormal functionmod4a <- LAM::mlnormal( y=y1, X=X, id=id, Z_list=Z, Z_index=Z_index, beta=beta, theta=theta, method="ML")# parameter with lower diagonal entries of Cholesky matrixmod4a$theta
# fill-in parameters for Cholesky matrixL <- matrix(0,2,2)L[! upper.tri(L)]<- mod4a$theta
#** reconstruct covariance matrixL
stats::cov.wt(dat, method="ML")$cov
## End(Not run)