mixregRM2() R function from [MixSemiRob]

Robust Mixture Regression with Thresholding-Embedded EM Algorithm for Penalized Estimation

A robust mixture regression model that simultaneously conducts outlier detection and robust parameter estimation. It uses a sparse, case-specific, and scale-dependent mean-shift mixture model parameterization (Yu et al., 2017):

$f(y_i|\boldsymbol{x}_i,\boldsymbol{\theta},\boldsymbol{\gamma}_i) = \sum_{j=1}^C\pi_j\phi(y_i;\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j+\gamma_{ij}\sigma_j,\sigma_j^2),$

$i=1,\cdots,n$, where $C$ is the number of components in the model, $\boldsymbol{\theta}=(\pi_1,\boldsymbol{\beta}_1,\sigma_1,\ldots,\pi_{C},\boldsymbol{\beta}_C,\sigma_C)^{\top}$ is the parameter to estimate, and $\boldsymbol{\gamma}_i=(\gamma_{i1},\ldots,\gamma_{iC})^{\top}$ is a vector of mean-shift parameters for the $i$th observation.
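The density above can be evaluated directly in base R. The following is a minimal sketch, not code from the package: the helper name `meanshift_density` and the parameter values are made up for illustration, and $\phi$ is taken to be the normal density with the stated mean and variance.

```r
# Mean-shift mixture density for a single observation (illustrative sketch).
#   y_i     : scalar response
#   x_i     : numeric vector of p covariates (the intercept is added below)
#   prop    : length-C vector of mixing proportions (pi_1, ..., pi_C)
#   beta    : C by (p + 1) matrix of regression coefficients
#   sigma   : length-C vector of component standard deviations
#   gamma_i : length-C vector of mean-shift parameters for this observation
meanshift_density <- function(y_i, x_i, prop, beta, sigma, gamma_i) {
  design <- c(1, x_i)                                  # prepend the intercept
  mu <- as.vector(beta %*% design) + gamma_i * sigma   # shifted component means
  sum(prop * dnorm(y_i, mean = mu, sd = sigma))
}

# Hypothetical parameter values for a two-component model with one covariate.
prop  <- c(0.5, 0.5)
beta  <- rbind(c(0, 1), c(2, -1))
sigma <- c(1, 0.5)

# With gamma_i = 0 the density is that of an ordinary normal mixture regression.
d0 <- meanshift_density(1.2, 1, prop, beta, sigma, gamma_i = c(0, 0))

# A nonzero, scale-dependent shift moves each component mean by gamma_ij * sigma_j.
d1 <- meanshift_density(6, 1, prop, beta, sigma, gamma_i = c(5, 10))
```

Because the shift enters as $\gamma_{ij}\sigma_j$, the same value of $\gamma_{ij}$ moves a high-variance component further than a low-variance one, which is what makes the parameterization scale-dependent.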


Usage

mixregRM2(x, y, C = 2, ini = NULL, nstart = 20, tol = 1e-02, maxiter = 50,
          method = c("HARD", "SOFT"), sigma.const = 0.001, lambda = 0.001)

Arguments

x: an n by p data matrix where n is the number of observations and p is the number of explanatory variables. The intercept term will automatically be added to the data.
y: an n-dimensional vector of response values.
C: number of mixture components. Default is 2.
ini: initial values for the parameters. Default is NULL, which obtains the initial values using the mixreg function. It can be a list with the form of list(pi, beta, sigma, gamma), where pi is a vector of C mixing proportions, beta is a C by (p + 1) matrix for regression coefficients of C components, sigma is a vector of C standard deviations, and gamma is a vector of C mean shift values.
nstart: number of initializations to try. Default is 20.
tol: stopping criterion (convergence threshold) for the EM algorithm. Default is 1e-02.
maxiter: maximum number of iterations for the EM algorithm. Default is 50.
method: character string determining which thresholding method to use: "HARD" or "SOFT". Default is "HARD". See Details.
sigma.const: constraint on the ratio of the minimum to the maximum value of sigma. Default is 0.001.
lambda: tuning parameter in the penalty term. Default is 0.001. It can be chosen based on BIC; see Yu et al. (2017) for more details.

Returns

A list containing the following elements:

pi: C-dimensional vector of estimated mixing proportions.
beta: C by (p + 1) matrix of estimated regression coefficients.
sigma: C-dimensional vector of estimated standard deviations.
gamma: n-dimensional vector of estimated mean shift values.
posterior: n by C matrix of posterior probabilities of each observation belonging to each component.
run: total number of iterations after convergence.
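
As a rough illustration of how the returned elements might be used, the sketch below works on a hand-built list that mimics the documented structure. All values are invented and `fit` is not real output from mixregRM2(). Treating observations with a nonzero estimated mean shift as candidate outliers is an assumption based on the sparse mean-shift formulation, not a rule stated on this page.

```r
# Hypothetical stand-in for a fitted object with n = 5 and C = 2.
fit <- list(
  pi        = c(0.6, 0.4),
  beta      = rbind(c(1.9, 0.05), c(0, 1)),
  sigma     = c(0.05, 0.1),
  gamma     = c(0, 0, 0, 12.5, 0),
  posterior = rbind(c(0.99, 0.01),
                    c(0.02, 0.98),
                    c(0.90, 0.10),
                    c(0.60, 0.40),
                    c(0.30, 0.70)),
  run       = 17
)

# Observations whose estimated mean shift is nonzero are candidate outliers.
outliers <- which(fit$gamma != 0)

# Assign each observation to the component with the largest posterior probability.
component <- apply(fit$posterior, 1, which.max)

# Each row of the posterior matrix should sum to 1.
row_totals <- rowSums(fit$posterior)
```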

Details

The parameters are estimated by maximizing the corresponding penalized log-likelihood function using an EM algorithm. The thresholding rule used to estimate $\gamma_{ij}$ depends on the choice of penalty:

Soft threshold: $\hat{\gamma}_{ij} = \mathrm{sgn}(\epsilon_{ij})(|\epsilon_{ij}|-\lambda_{ij}^*)_{+}$, corresponding to the $l_1$ penalty.
Hard threshold: $\hat{\gamma}_{ij} = \epsilon_{ij}I(|\epsilon_{ij}|>\lambda_{ij}^*)$, corresponding to the $l_0$ penalty.

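Here, $\epsilon_{ij} = (y_i-\boldsymbol{x}_i^{\top}\boldsymbol{\beta}_j)/\sigma_j$, $(\cdot)_{+}=\max(\cdot,0)$, and $I(\cdot)$ is the indicator function. Also, $\lambda_{ij}^*$ is taken as $\lambda/p_{ij}^{(k+1)}$ for the soft threshold and $\lambda/\sqrt{p_{ij}^{(k+1)}}$ for the hard threshold, where $p_{ij}^{(k+1)}$ is the posterior probability that the $i$th observation belongs to the $j$th component, computed at iteration $k+1$ of the EM algorithm.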
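
The two thresholding rules can be written in a few lines of base R. This is a minimal sketch rather than the package's internal code: the function names, the standardized residuals `eps`, and the weights `p` are hypothetical, with `p` standing in for $p_{ij}^{(k+1)}$.

```r
# Soft threshold: shrink |eps| toward zero by lambda_star, keeping the sign.
soft_threshold <- function(eps, lambda_star) {
  sign(eps) * pmax(abs(eps) - lambda_star, 0)
}

# Hard threshold: keep eps unchanged when |eps| exceeds lambda_star, else zero.
hard_threshold <- function(eps, lambda_star) {
  eps * (abs(eps) > lambda_star)
}

# Hypothetical standardized residuals and weights for one component.
eps    <- c(-3, -0.5, 0.2, 4)
p      <- c(1, 1, 0.25, 0.25)
lambda <- 1

gamma_soft <- soft_threshold(eps, lambda / p)        # lambda* = 1, 1, 4, 4
gamma_hard <- hard_threshold(eps, lambda / sqrt(p))  # lambda* = 1, 1, 2, 2

gamma_soft  # -2  0  0  0
gamma_hard  # -3  0  0  4
```

In this toy input the last residual survives the hard rule but not the soft rule, because the soft rule divides $\lambda$ by $p$ rather than $\sqrt{p}$ and so applies a larger threshold when the weight is small.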

Examples


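library(MixSemiRob)

data(tone)
y <- tone$tuned
x <- tone$stretchratio
# Place contaminating points at (x, y) = (0, 5) in positions 151 to 160.
k <- 160
x[151:k] <- 0
y[151:k] <- 5
# Fit a two-component model with hard thresholding (the default method).
est_RM2 <- mixregRM2(x, y, lambda = 1)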

References

Yu, C., Yao, W., and Chen, K. (2017). A new method for robust mixture regression. Canadian Journal of Statistics, 45(1), 77-94.
