Functionalities for fitting multivariate normal variance mixtures (in particular also Multivariate t distributions) via an ECME algorithm.
fitnvmix(x, qmix, mix.param.bounds, nu.init =NA, loc =NULL, scale =NULL, init.size.subsample = min(n,100), size.subsample = n, control = list(), verbose =TRUE)fitStudent(x, loc =NULL, scale =NULL, mix.param.bounds = c(1e-3,1e2),...)fitNorm(x)
Arguments
x: (n,d)-data matrix.
qmix: specification of the mixing variable W; see McNeil et al. (2015, Chapter 6). Supported are the following types of specification (see also the examples below):
character:: character string specifying a supported distribution; currently available are "constant" (in which case W=1 and thus the multivariate normal distribution with mean vector loc and covariance matrix scale results), "inverse.gamma" (in which case W is inverse gamma distributed with shape and rate parameters df/2 and thus the multivariate Student t distribution with df degrees of freedom results) and "pareto" (in which case W is Pareto distributed with scale equal to unity and shape equal to alpha).
function:: function
interpreted as the quantile function of the mixing variable $W$. In this case, `qmix` **must** have the form `qmix = function(u, nu)`, where the argument `nu`
corresponds to the parameter (vector) specifying the distribution of the mixing variable.
mix.param.bounds: either numeric(2) or a matrix with two columns. The first/second column corresponds to the lower/upper bound of nui, the ith component of the parameter vector nu of the mixing variable W. All elements need to be finite, numeric values. Note: The algorithm tends to converge quicker if the parameter ranges supplied are smaller.
nu.init: either NA or an initial guess for the parameter (vector) nu. In the former case an initial estimate is calculated by the algorithm. If nu.init is provided, the algorithm often converges faster; the better the starting value, the faster convergence.
loc: d-vector; if provided, taken as the 'true' location vector in which case loc is not estimated.
scale: positive definite (d,d)-matrix; if provided, taken as the 'true' scale matrix in which case scale is not estimated.
init.size.subsample: numeric, non-negative, giving the sub-samplesize used to get an initial estimate for nu. Only used if is.na(nu.init), otherwise ignored.
size.subsample: numeric, non-negative, specifying the size of the subsample that is being used in the ECME iterations to optimize the log-likelihood over nu. Defaults to n, so that the full sample is being used. Decreasing this number can lead to faster run-times (as fewer log-densities need to be estimated) but also to an increase in bias and variance.
control: list specifying algorithm specific parameters; see below under 'Details' and get_set_param().
verbose: numeric or logical (in which case it is converted to numeric) specifying the amount of tracing to be done. If 0 or FALSE, neither tracing nor warnigns are communicated; if 1, only warnigns are communicated, if 2 or 3, warnings and (shorter or longer) tracing information is provided.
...: additional arguments passed to the underlying fitnvmix().
Returns
The function fitnvmix() returns an S3 object of class"fitnvmix", basically a list
which contains, among others, the components
nu: Estimated mixing parameter (vector) nu.
loc: Estimated or provided loc vector.
scale: Estimated or provided scale matrix.
max.ll: Estimated log-likelihood at reported estimates.
x: Input data matrix x.
The methods print(), summary() and plot() are defined for the class "fitnvmix".
fitStudent() is a wrapper to fitnvmix() for parameter estimation of multivariate Student t distributions; it also returns an S3 object of class"fitnvmix" where the fitted degrees of freedom are called "df" instead of "nu" (to be consistent with the other wrappers for the Student t distributions).
fitNorm() just returns a list with components loc (columnwise sample means) and scale (sample covariance matrix).
Details
The function fitnvmix() uses an ECME algorithm to approximate the MLEs of the parameters nu, loc and scale of a normal variance mixture specified by qmix. The underlying procedure successively estimates nu (with given loc and scale) by maximizing the likelihood which is approximated by dnvmix() (unless qmix is a character
string, in which case analytical formulas for the log-densities are used) and scale and loc (given nu) using weights (which again need to be approximated) related to the posterior distribution, details can be found in the first reference below.
It should be highlighted that (unless unless qmix is a character string), every log-likelihood and every weight needed in the estimation is numerically approximated via RQMC methods. For large dimensions and sample sizes this procedure can therefore be slow.
Various tolerances and convergence criteria can be changed by the user via the control argument. For more details, see get_set_param().
Author(s)
Erik Hintz, Marius Hofert and Christiane Lemieux
References
Hintz, E., Hofert, M. and Lemieux, C. (2021), Normal variance mixtures: Distribution, density and parameter estimation. Computational Statistics and Data Analysis 157C, 107175.
Hintz, E., Hofert, M. and Lemieux, C. (2022), Multivariate Normal Variance Mixtures in : The Package nvmix. Journal of Statistical Software, tools:::Rd_expr_doi("10.18637/jss.v102.i02") .
McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press.
Liu, C. and Rubin, D. (1994). The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633--648.
## Sampling parametersset.seed(274)# for reproducibilitynu <-2.8# parameter used to sample datad <-4# dimensionn <-75# small sample size to have examples run fastloc <- rep(0, d)# location vectorA <- matrix(runif(d * d), ncol = d)diag_vars <- diag(runif(d, min =2, max =5))scale <- diag_vars %*% cov2cor(A %*% t(A))%*% diag_vars # scale matrixmix.param.bounds <- c(1,5)# nu in [1, 5]### Example 1: Fitting a multivariate t distribution ###########################if(FALSE){## Define 'qmix' as the quantile function of an IG(nu/2, nu/2) distribution qmix <-function(u, nu)1/ qgamma(1- u, shape = nu/2, rate = nu/2)## Sample data using 'rnvmix' x <- rnvmix(n, qmix = qmix, nu = nu, loc = loc, scale = scale)## Call 'fitvnmix' with 'qmix' as a function (so all densities/weights are estimated)(MyFit11 <- fitnvmix(x, qmix = qmix, mix.param.bounds = mix.param.bounds))## Call 'fitnvmix' with 'qmix = "inverse.gamma"' in which case analytical formulas## for weights and densities are used:(MyFit12 <- fitnvmix(x, qmix ="inverse.gamma", mix.param.bounds = mix.param.bounds))## Alternatively, use the wrapper 'fitStudent()'(MyFit13 <- fitStudent(x))## Check stopifnot(all.equal(MyFit11$nu, MyFit12$nu, tol =5e-2), all.equal(MyFit11$nu, MyFit13$nu, tol =5e-2))## Can also provide 'loc' and 'scale' in which case only 'nu' is estimated(MyFit13 <- fitnvmix(x, qmix ="inverse.gamma", mix.param.bounds = mix.param.bounds, loc = loc, scale = scale))(MyFit14 <- fitStudent(x, loc = loc, scale = scale)) stopifnot(all.equal(MyFit13$nu, MyFit14$df, tol =1e-6))}### Example 2: Fitting a Pareto mixture ########################################## Define 'qmix' as the quantile function of a Par(nu, 1) distributionqmix <-function(u, nu)(1-u)^(-1/nu)## Sample data using 'rnvmix':x <- rnvmix(n, qmix = qmix, nu = nu, loc = loc, scale = scale)## Call 'fitvnmix' with 'qmix' as function (=> densities/weights estimated)(MyFit21 <- fitnvmix(x, qmix = qmix, mix.param.bounds = mix.param.bounds))## Call 'fitnvmix' with 'qmix = "pareto"' in which case an analytical formula## for the density is used(MyFit22 <- fitnvmix(x, qmix ="pareto", mix.param.bounds = mix.param.bounds))stopifnot(all.equal(MyFit21$nu, MyFit22$nu, tol =5e-2))