kdeem function

Kernel Density-based EM-type algorithm for Semiparametric Mixture Regression with Unspecified Error Distributions

`kdeem' fits a semiparametric mixture of regressions using a kernel density-based expectation-maximization (EM)-type algorithm with unspecified homogeneous or heterogeneous error distributions (Ma et al., 2021).

kdeem(x, y, C = 2, ini = NULL, maxiter = 200)

Arguments

  • x: an n by p data matrix where n is the number of observations and p is the number of explanatory variables (including the intercept).
  • y: an n-dimensional vector of response values.
  • C: number of mixture components. Default is 2.
  • ini: initial values for the parameters. Default is NULL, in which case the initial values are obtained with the kdeem.lse function. If specified, it should be a list of the form list(beta, prop, tau, pi, h), where beta is a p by C matrix of regression coefficients for the C components; prop is an n by C matrix of probabilities that each observation belongs to each component, calculated from the initial beta and h; tau is a vector of C precision parameters (inverse of the standard deviation); pi is a vector of C mixing proportions; and h is the bandwidth for kernel estimation.
  • maxiter: maximum number of iterations for the algorithm. Default is 200.
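
The shapes required for a user-supplied ini can be illustrated with a minimal base-R sketch. Everything below is an assumption made for illustration: the simulated data, the crude residual-sign split used to form two groups, the hard 0/1 entries in prop (the description above says prop is calculated from the initial beta and h), and the use of bw.nrd0 for the bandwidth. It is not the procedure used by kdeem.lse.

```r
set.seed(1)
n <- 200
C <- 2

# Simulated data (illustration only): two well-separated regression lines
x <- cbind(1, runif(n))                      # n by p design matrix, intercept included
z <- rbinom(n, 1, 0.5)                       # latent labels, used only to simulate y
mu <- ifelse(z == 1, drop(x %*% c(-3, 1)), drop(x %*% c(3, -1)))
y <- mu + rnorm(n, sd = 0.5)

# Crude grouping (assumption): sign of the residuals from one least-squares fit
res0 <- lm.fit(x, y)$residuals
grp <- ifelse(res0 > 0, 1L, 2L)

# beta: p by C matrix, one least-squares fit per group
beta <- sapply(1:C, function(j) {
  lm.fit(x[grp == j, , drop = FALSE], y[grp == j])$coefficients
})

# Residuals of each observation from its own group's fitted line
res <- y - rowSums(x * t(beta[, grp]))

tau <- sapply(1:C, function(j) 1 / sd(res[grp == j]))       # precisions, length C
pi0 <- as.numeric(table(factor(grp, levels = 1:C))) / n     # mixing proportions, length C
prop <- outer(grp, 1:C, "==") * 1                           # n by C hard assignments
h <- bw.nrd0(res * tau[grp])                                # bandwidth from standardized residuals

ini <- list(beta = beta, prop = prop, tau = tau, pi = pi0, h = h)
str(ini)
```

A list built this way could then be passed as the ini argument, for example kdeem(x, y, C = 2, ini = ini). Whether hard assignments are an acceptable stand-in for probabilities in prop is not confirmed by this page.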

Returns

A list containing the following elements:

  • posterior: posterior probabilities of each observation belonging to each component.

  • beta: estimated regression coefficients.

  • tau: estimated precision parameters, the inverse of standard deviation.

  • pi: estimated mixing proportions.

  • h: bandwidth used for the kernel estimation.
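
As a small illustration of how the posterior element might be used, the sketch below turns a posterior probability matrix into hard component labels. The matrix here is a made-up stand-in with one row per observation and one column per component; with a fitted model, the posterior element of the returned list would presumably take its place.

```r
# Hypothetical posterior matrix: 3 observations, 2 components (rows sum to 1)
posterior <- matrix(c(0.90, 0.10,
                      0.20, 0.80,
                      0.55, 0.45), ncol = 2, byrow = TRUE)

# Assign each observation to its most probable component
labels <- max.col(posterior, ties.method = "first")

# Posterior probability of the assigned component, a simple measure of certainty
certainty <- posterior[cbind(seq_len(nrow(posterior)), labels)]

data.frame(labels, certainty)
```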

Details

`kdeem' fits a semiparametric mixture of linear regression models with unspecified component error distributions. The errors can be either homogeneous or heterogeneous. The model is as follows:

$$f_{Y|\boldsymbol{X}}(y,\boldsymbol{x},\boldsymbol{\theta},g) = \sum_{j=1}^C \pi_j\tau_j g\{(y-\boldsymbol{x}^{\top}\boldsymbol{\beta}_j)\tau_j\}.$$

Here, $\boldsymbol{\theta}=(\pi_1,\ldots,\pi_{C-1},\boldsymbol{\beta}_1^{\top},\ldots,\boldsymbol{\beta}_C^{\top},\tau_1,\ldots,\tau_C)^{\top}$, $g(\cdot)$ is an unspecified density function with mean 0 and variance 1, and $\tau_j$ is a precision parameter. For the calculation of $\beta$ in the M-step, this function employs the universal optimizer function ucminf from the `ucminf' package.
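
To make the model formula concrete, the following sketch evaluates the mixture density for one covariate vector. It assumes g is the standard normal density (dnorm), which is only one admissible choice since the model leaves g unspecified. The function name mix_density and the parameter values are hypothetical and not part of the package.

```r
# Mixture density f(y | x) = sum_j pi_j * tau_j * g((y - x' beta_j) * tau_j)
# Assumption: g defaults to dnorm (mean 0, variance 1); the model itself leaves g unspecified.
mix_density <- function(y, x, beta, tau, pi_mix, g = dnorm) {
  mu <- as.vector(crossprod(beta, x))          # x' beta_j for j = 1, ..., C
  sapply(y, function(yy) sum(pi_mix * tau * g((yy - mu) * tau)))
}

beta <- matrix(c(-3, 3, 3, -3), 2, 2)          # p by C coefficient matrix
tau <- c(1, 2)                                 # precisions: standard deviations 1 and 0.5
pi_mix <- c(0.5, 0.5)                          # mixing proportions
x0 <- c(1, 0.25)                               # intercept and one covariate value

# Density at a few response values
mix_density(c(-2, 0, 2), x0, beta, tau, pi_mix)

# Sanity check: the density should integrate to approximately 1 over y
total <- integrate(mix_density, -Inf, Inf, x = x0, beta = beta,
                   tau = tau, pi_mix = pi_mix)$value
total
```

The factor tau_j in front of g is what keeps each component a proper density after the residual is rescaled by tau_j.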

Examples

# Simulate data from a two-component mixture of linear regressions
n = 300
C = 2
Dimen = 2
Beta.true.matrix = matrix(c(-3, 3, 3, -3), Dimen, C)
PI.true = c(0.5, 0.5)
x = runif(n)
X = cbind(1, x)
Group.ID = Rlab::rbern(n, prob = 0.5)
Error = rnorm(n, 0, 1)
n1 = sum(Group.ID)
n2 = n - n1
y = rep(0, n)
err = rep(0, n)

# Component 1 has error standard deviation 1; component 2 has 0.5
for(i in 1:n){
  if(Group.ID[i] == 1){
    err[i] = Error[i]
    y[i] = X[i, ] %*% Beta.true.matrix[, 1] + err[i]
  } else {
    err[i] = 0.5 * Error[i]
    y[i] = X[i, ] %*% Beta.true.matrix[, 2] + err[i]
  }
}

# Obtain initial values, then fit the models starting from them
Result.kdeem.lse = kdeem.lse(x, y)
Result.kdeem.h = kdeem.h(x, y, 2, Result.kdeem.lse, maxiter = 200)
Result.kdeem = kdeem(x, y, 2, Result.kdeem.lse, maxiter = 200)

References

Ma, Y., Wang, S., Xu, L., & Yao, W. (2021). Semiparametric mixture regression with unspecified error distributions. Test, 30, 429-444.

See Also

kdeem.h, kdeem.lse, and ucminf for beta calculation.

  • Maintainer: Suyeon Kang
  • License: GPL (>= 2)
  • Last published: 2023-09-20
