dmnorm function

Density of (conditional) multivariate normal distribution

Density of (conditional) multivariate normal distribution

This function calculates and differentiates density of (conditional) multivariate normal distribution.

dmnorm( x, mean, sigma, given_ind = numeric(), log = FALSE, grad_x = FALSE, grad_sigma = FALSE, is_validation = TRUE, control = NULL, n_cores = 1L )

Arguments

  • x: numeric vector representing the point at which density should be calculated. If x is a matrix then each row determines a new point.
  • mean: numeric vector representing expectation of multivariate normal vector (distribution).
  • sigma: positively defined numeric matrix representing covariance matrix of multivariate normal vector (distribution).
  • given_ind: numeric vector representing indexes of multivariate normal vector which are conditioned at values of x with corresponding indexes i.e. x[given_x] or x[, given_x] if x is a matrix.
  • log: logical; if TRUE then probabilities (or densities) p are given as log(p) and derivatives will be given respect to log(p).
  • grad_x: logical; if TRUE then the vector of partial derivatives of the density function will be calculated respect to each element of x. If x is a matrix then gradients will be estimated for each row of x.
  • grad_sigma: logical; if TRUE then the vector of partial derivatives (gradient) of the density function will be calculated respect to each element of sigma. If x is a matrix then gradients will be estimated for each row of x.
  • is_validation: logical value indicating whether input arguments should be validated. Set it to FALSE to get performance boost (default value is TRUE).
  • control: a list of control parameters. See Details.
  • n_cores: positive integer representing the number of CPU cores used for parallel computing. Currently it is not recommended to set n_cores > 1 if vectorized arguments include less then 100000 elements.

Returns

This function returns an object of class "mnorm_dmnorm".

An object of class "mnorm_dmnorm" is a list containing the following components:

  • den - density function value at x.

  • grad_x - gradient of density respect to x if grad_x or grad_sigma input argument is set to TRUE.

  • grad_sigma - gradient respect to the elements of sigma

    if grad_sigma input argument is set to TRUE.

If log is TRUE then den is a log-density so output grad_x and grad_sigma are calculated respect to the log-density.

Output grad_x is a Jacobian matrix which rows are gradients of the density function calculated for each row of x. Therefore grad_x[i, j] is a derivative of the density function respect to the j-th argument at point x[i, ].

Output grad_sigma is a 3D array such that grad_sigma[i, j, k]

is a partial derivative of the density function respect to the sigma[i, j] estimated for the observation x[k, ].

Details

Consider notations from the Details section of cmnorm. The function calculates density f(x(d)x(g))f(x^{(d)}|x^{(g)}) of conditioned multivariate normal vector XIdXIg=x(g)X_{I_{d}} | X_{I_{g}} = x^{(g)}. Where x(d)x^{(d)} is a subvector of xx associated with XIdX_{I_{d}} i.e. unconditioned components. Therefore x[given_ind] represents x(g)x^{(g)} while x[-given_ind] is x(d)x^{(d)}.

If grad_x is TRUE then function additionally estimates the gradient respect to both unconditioned and conditioned components:

f(x(d)x(g))=(f(x(d)x(g))x1,...,f(x(d)x(g))xm), \nabla f(x^{(d)}|x^{(g)})=\left(\frac{\partial f(x^{(d)}|x^{(g)})}{\partial x_{1}},...,\frac{\partial f(x^{(d)}|x^{(g)})}{\partial x_{m}}\right),

where each xix_{i} belongs either to x(d)x^{(d)} or x(g)x^{(g)}

depending on whether iIdi\in I_{d} or iIgi\in I_{g} correspondingly. In particular subgradients of density function respect to x(d)x^{(d)}

and x(g)x^{(g)} are of the form:

x(d)lnf(x(d)x(g))=(x(d)μc)Σc1 \nabla_{x^{(d)}}\ln f(x^{(d)}|x^{(g)}) =-\left(x^{(d)}-\mu_{c}\right)\Sigma_{c}^{-1} x(g)lnf(x(d)x(g))=x(d)f(x(d)x(g))Σd,gΣg,g1 \nabla_{x^{(g)}}\ln f(x^{(d)}|x^{(g)}) =-\nabla_{x^{(d)}}f(x^{(d)}|x^{(g)})\Sigma_{d,g}\Sigma_{g,g}^{-1}

If grad_sigma is TRUE then function additionally estimates the gradient respect to the elements of covariance matrix Σ\Sigma. For iIdi\in I_{d} and jIdj\in I_{d} the function calculates:

lnf(x(d)x(g))Σi,j=(lnf(x(d)x(g))xi×lnf(x(d)x(g))xjΣc,(i,j)1)/(1+I(i=j)), \frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial \Sigma_{i, j}} =\left(\frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial x_{i}} \times\frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial x_{j}} -\Sigma_{c,(i, j)}^{-1}\right) /\left(1 + I(i=j)\right),

where I(i=j)I(i=j) is an indicator function which equals 11 when the condition i=ji=j is satisfied and 00 otherwise.

For iIdi\in I_{d} and jIgj\in I_{g} the following formula is used:

lnf(x(d)x(g))Σi,j=lnf(x(d)x(g))xi×((x(g)μg)Σg,g1)qg(j) \frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial \Sigma_{i, j}} =-\frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial x_{i}} \times\left(\left(x^{(g)}-\mu_{g}\right)\Sigma_{g,g}^{-1}\right)_{q_{g}(j)}- k=1nd(1+I(qd(i)=k))×(Σd,gΣg,g1)k,qg(j)×lnf(x(d)x(g))Σi,qd1(k), -\sum\limits_{k=1}^{n_{d}}(1+I(q_{d}(i)=k))\times(\Sigma_{d,g}\Sigma_{g,g}^{-1})_{k,q_{g}(j)}\times\frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial \Sigma_{i, q^{-1}_{d}(k)}},

where qg(j)=k=1mI(Ig,kj)q_{g}(j)=\sum\limits_{k=1}^{m} I\left(I_{g,k} \leq j\right)

and qd(i)=k=1mI(Id,ki)q_{d}(i)=\sum\limits_{k=1}^{m} I\left(I_{d,k} \leq i\right)

represent the order of the ii-th and jj-th elements in IgI_{g} and IdI_{d} correspondingly i.e. xi=xqd(i)(d)=xId,qd(i)x_{i}=x^{(d)}_{q_{d}(i)}=x_{I_{d, q_{d}(i)}} and xj=xqg(j)(g)=xIg,qg(j)x_{j}=x^{(g)}_{q_{g}(j)}=x_{I_{g, q_{g}(j)}}. Note that qg(j)1q_{g}(j)^{-1} and qd(i)1q_{d}(i)^{-1} are inverse functions. Number of conditioned and unconditioned components are denoted by ng=k=1mI(kIg)n_{g}=\sum\limits_{k=1}^{m}I(k\in I_{g}) and nd=k=1mI(kId)n_{d}=\sum\limits_{k=1}^{m}I(k\in I_{d}) respectively. For the case iIgi\in I_{g} and jIdj\in I_{d} the formula is similar.

For iIgi\in I_{g} and jIgj\in I_{g} the following formula is used:

lnf(x(d)x(g))Σi,j=x(d)lnf(x(d)x(g))×(x(g)×(Σd,g×Σg,g1×Ig×Σg,g1)T)T \frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial \Sigma_{i, j}} =-\nabla_{x^{(d)}}\ln f(x^{(d)}|x^{(g)})\times\left(x^{(g)}\times(\Sigma_{d,g} \times \Sigma_{g,g}^{-1} \times I_{g}^{*} \times\Sigma_{g,g}^{-1})^{T}\right)^T - k1=1ndk2=k1ndlnf(x(d)x(g))Σqd(k1)1,qd(k2)1(Σd,g×Σg,g1×Ig×Σg,g1×Σd,gT)qd(k1)1,qd(k2)1, -\sum\limits_{k_{1}=1}^{n_{d}}\sum\limits_{k_{2}=k_{1}}^{n_{d}}\frac{\partial \ln f(x^{(d)}|x^{(g)})}{\partial \Sigma_{q_{d}(k_{1})^{-1}, q_{d}(k_{2})^{-1}}}\left(\Sigma_{d,g} \times \Sigma_{g,g}^{-1} \times I_{g}^{*} \times\Sigma_{g,g}^{-1}\times\Sigma_{d,g}^T\right)_{q_{d}(k_{1})^{-1},q_{d}(k_{2})^{-1}},

where IgI_{g}^{*} is a square ngn_{g}-dimensional matrix of zeros except Ig,(i,j)=Ig,(j,i)=1I_{g,(i,j)}^{*}=I_{g,(j,i)}^{*}=1.

Argument given_ind represents IgI_{g} and it should not contain any duplicates. The order of given_ind elements does not matter so it has no impact on the output.

More details on abovementioned differentiation formulas could be found in the appendix of E. Kossova and B. Potanin (2018).

Currently control has no input arguments intended for the users. This argument is used for some internal purposes of the package.

Examples

# Consider multivariate normal vector: # X = (X1, X2, X3, X4, X5) ~ N(mean, sigma) # Prepare multivariate normal vector parameters # expected value mean <- c(-2, -1, 0, 1, 2) n_dim <- length(mean) # correlation matrix cor <- c( 1, 0.1, 0.2, 0.3, 0.4, 0.1, 1, -0.1, -0.2, -0.3, 0.2, -0.1, 1, 0.3, 0.2, 0.3, -0.2, 0.3, 1, -0.05, 0.4, -0.3, 0.2, -0.05, 1) cor <- matrix(cor, ncol = n_dim, nrow = n_dim, byrow = TRUE) # covariance matrix sd_mat <- diag(c(1, 1.5, 2, 2.5, 3)) sigma <- sd_mat %*% cor %*% t(sd_mat) # Estimate the density of X at point (-1, 0, 1, 2, 3) x <- c(-1, 0, 1, 2, 3) d.list <- dmnorm(x = x, mean = mean, sigma = sigma) d <- d.list$den print(d) # Estimate the density of X at points # x=(-1, 0, 1, 2, 3) and y=(-1.2, -1.5, 0, 1.2, 1.5) y <- c(-1.5, -1.2, 0, 1.2, 1.5) xy <- rbind(x, y) d.list.1 <- dmnorm(x = xy, mean = mean, sigma = sigma) d.1 <- d.list.1$den print(d.1) # Estimate the density of Xc=(X2, X4, X5 | X1 = -1, X3 = 1) at # point xd=(0, 2, 3) given conditioning values xg=(-1, 1) given_ind <- c(1, 3) d.list.2 <- dmnorm(x = x, mean = mean, sigma = sigma, given_ind = given_ind) d.2 <- d.list.2$den print(d.2) # Estimate the gradient of density respect to the argument and # covariance matrix at points 'x' and 'y' d.list.3 <- dmnorm(x = xy, mean = mean, sigma = sigma, grad_x = TRUE, grad_sigma = TRUE) # Gradient respect to the argument grad_x.3 <- d.list.3$grad_x # at point 'x' print(grad_x.3[1, ]) # at point 'y' print(grad_x.3[2, ]) # Partial derivative at point 'y' respect # to the 3-rd argument print(grad_x.3[2, 3]) # Gradient respect to the covariance matrix grad_sigma.3 <- d.list.3$grad_sigma # Partial derivative respect to sigma(3, 5) at # point 'y' print(grad_sigma.3[3, 5, 2]) # Estimate the gradient of the log-density function of # Xc=(X2, X4, X5 | X1 = -1, X3 = 1) and Yc=(X2, X4, X5 | X1 = -1.5, X3 = 0) # respect to the argument and covariance matrix at # points xd=(0, 2, 3) and yd=(-1.2, 0, 1.5) respectively given # conditioning values xg=(-1, 1) and yg=(-1.5, 0) correspondingly d.list.4 <- dmnorm(x = xy, mean = mean, sigma = sigma, grad_x = TRUE, grad_sigma = TRUE, given_ind = given_ind, log = TRUE) # Gradient respect to the argument grad_x.4 <- d.list.4$grad_x # at point 'xd' print(grad_x.4[1, ]) # at point 'yd' print(grad_x.4[2, ]) # Partial derivative at point 'xd' respect to 'xg[2]' print(grad_x.4[1, 3]) # Partial derivative at point 'yd' respect to 'yd[5]' print(grad_x.4[2, 5]) # Gradient respect to the covariance matrix grad_sigma.4 <- d.list.4$grad_sigma # Partial derivative respect to sigma(3, 5) at # point 'yd' print(grad_sigma.4[3, 5, 2]) # Compare analytical gradients from the previous example with # their numeric (forward difference) analogues at point 'xd' # given conditioning 'xg' delta <- 1e-6 grad_x.num <- rep(NA, 5) grad_sigma.num <- matrix(NA, nrow = 5, ncol = 5) for (i in 1:5) { x.delta <- x x.delta[i] <- x[i] + delta d.list.delta <- dmnorm(x = x.delta, mean = mean, sigma = sigma, grad_x = TRUE, grad_sigma = TRUE, given_ind = given_ind, log = TRUE) grad_x.num[i] <- (d.list.delta$den - d.list.4$den[1]) / delta for(j in 1:5) { sigma.delta <- sigma sigma.delta[i, j] <- sigma[i, j] + delta sigma.delta[j, i] <- sigma[j, i] + delta d.list.delta <- dmnorm(x = x, mean = mean, sigma = sigma.delta, grad_x = TRUE, grad_sigma = TRUE, given_ind = given_ind, log = TRUE) grad_sigma.num[i, j] <- (d.list.delta$den - d.list.4$den[1]) / delta } } # Comparison of gradients respect to the argument h.x <- cbind(analytical = grad_x.4[1, ], numeric = grad_x.num) rownames(h.x) <- c("xg[1]", "xd[1]", "xg[2]", "xd[3]", "xd[4]") print(h.x) # Comparison of gradients respect to the covariance matrix h.sigma <- list(analytical = grad_sigma.4[, , 1], numeric = grad_sigma.num) print(h.sigma)

References

E. Kossova., B. Potanin (2018). Heckman method and switching regression model multivariate generalization. Applied Econometrics, vol. 50, pages 114-143.

  • Maintainer: Bogdan Potanin
  • License: GPL (>= 2)
  • Last published: 2023-11-29

Useful links