Complete Likelihood Frequency Method for Label Switching
`complh' addresses the label switching problem by maximizing the complete likelihood (Yao, 2015). The method leverages information from the latent component labels, i.e., the labels used to generate the sample. The function supports both one-dimensional data (with equal or unequal variances) and multi-dimensional data (with equal variances).
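Conceptually, the method evaluates the complete log-likelihood under each permutation of the component labels and keeps the permutation with the largest value. The following base-R sketch illustrates this idea for a two-component univariate normal mixture whose fitted labels came back switched; the names (`complete_loglik`, `best`, etc.) are illustrative and not part of the package:

```r
# Illustrative sketch: choose the label permutation that maximizes the
# complete log-likelihood, given parameter estimates and latent labels.
set.seed(1)
n <- 100
z <- rbinom(n, 1, 0.4)                          # latent label: 1 = second component
x <- rnorm(n, mean = ifelse(z == 1, 3, 0), sd = 1)

# Suppose the fitted estimates came back with the labels switched:
mu <- c(3, 0); sigma <- c(1, 1); pi <- c(0.4, 0.6)

# Complete log-likelihood for a given ordering of the components
complete_loglik <- function(ord) {
  m <- mu[ord]; s <- sigma[ord]; p <- pi[ord]
  comp <- z + 1                                 # component index 1 or 2
  sum(log(p[comp]) + dnorm(x, m[comp], s[comp], log = TRUE))
}

perms <- list(c(1, 2), c(2, 1))
best <- perms[[which.max(sapply(perms, complete_loglik))]]
best                                            # the relabeling that undoes the switch
```

Here `best` recovers the permutation c(2, 1), i.e., the identity labeling is rejected in favor of the relabeling that matches the latent labels.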
complh(est, lat)
Arguments
est: a list with four elements representing the estimated mixture model, which can be obtained using the mixnorm function. It has the form list(mu, sigma, pi, p), where:
- mu: C by p matrix of estimated component means, where p is the dimension of the data and C is the number of mixture components.
- sigma: p by p matrix of the estimated common standard deviation for all components (when the data is multi-dimensional) or a C-dimensional vector of estimated component standard deviations (when the data is one-dimensional).
- pi: C-dimensional vector of mixing proportions.
- p: C by n matrix of classification probabilities, where the (i, j)th element is the probability that the jth observation belongs to the ith component.
lat: a C by n zero-one matrix representing the latent component labels for all observations, where C is the number of components in the mixture model and n is the number of observations. If the (i, j)th cell is 1, it indicates that the jth observation belongs to the ith component.
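If the latent labels are held as a plain vector of component indices, they can be converted to the required zero-one matrix with base R; this is an illustrative snippet, not a package helper:

```r
# Turn a vector of component labels (1..C) into the C by n zero-one matrix
labels <- c(1, 1, 2, 3, 2)   # n = 5 observations, C = 3 components
C <- 3
lat <- t(sapply(1:C, function(k) as.numeric(labels == k)))
lat
# Row i, column j is 1 exactly when observation j belongs to component i.
```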
Returns
The estimation results, adjusted to account for potential label switching, are returned as a list containing the following elements:
- mu: C by p matrix of estimated component means.
- sigma: C-dimensional vector of estimated component standard deviations (for univariate data) or p by p matrix of the estimated component variance (for multivariate data).
- pi: C-dimensional vector of estimated mixing proportions.
Examples
#-----------------------------------------------------------------------------------------#
# Example 1: Two-component Univariate Normal Mixture
#-----------------------------------------------------------------------------------------#
# Simulate the data
set.seed(827)
n = 200
prop = 0.3
n1 = rbinom(1, n, prop)
mudif = 1.5
x1 = rnorm(n1, 0, 1)
x2 = rnorm(n - n1, mudif, 1)
x = c(x1, x2)
pm = c(2, 1, 3, 5, 4)
# Use the `mixnorm' function to get the MLE and the estimated classification probabilities
out = mixnorm(x, 2)
# Prepare latent component label
lat = rbind(rep(c(1, 0), times = c(n1, n - n1)),
            rep(c(0, 1), times = c(n1, n - n1)))
# Fit the complh/distlat function
clhest = complh(out, lat)
clhest
# Result:
# mean of the first component: -0.1037359,
# mean of the second component: 1.6622397,
# sigma is 0.8137515 for both components, and
# the proportions for the two components are
# 0.3945660 and 0.6054340, respectively.
distlatest = distlat(out, lat)

#-----------------------------------------------------------------------------------------#
# Example 2: Two-component Multivariate Normal Mixture
#-----------------------------------------------------------------------------------------#
# Simulate the data
n = 400
prop = 0.3
n1 = rbinom(1, n, prop)
pi = c(prop, 1 - prop)
mu1 = 0.5
mu2 = 0.5
mu = matrix(c(0, mu1, 0, mu2), ncol = 2)
pm = c(2, 1, 4, 3, 6, 5)
sigma = diag(c(1, 1))
ini = list(sigma = sigma, mu = mu, pi = pi)
x1 = mvtnorm::rmvnorm(n1, c(0, 0), ini$sigma)
x2 = mvtnorm::rmvnorm(n - n1, c(mu1, mu2), ini$sigma)
x = rbind(x1, x2)
# Use the `mixnorm' function to get the MLE and the estimated classification probabilities
out = mixnorm(x, 2)
# Prepare latent component label
lat = rbind(rep(c(1, 0), times = c(n1, n - n1)),
            rep(c(0, 1), times = c(n1, n - n1)))
# Fit the complh/distlat function
clhest = complh(out, lat)
distlatest = distlat(out, lat)

#-----------------------------------------------------------------------------------------#
# Example 3: Three-component Multivariate Normal Mixture
#-----------------------------------------------------------------------------------------#
# Simulate the data
n = 100
pi = c(0.2, 0.3, 0.5)
ns = stats::rmultinom(1, n, pi)
n1 = ns[1]; n2 = ns[2]; n3 = ns[3]
mu1 = 1
mu2 = 1
mu = matrix(c(0, mu1, 2 * mu1, 0, mu2, 2 * mu2), ncol = 2)
sigma = diag(c(1, 1))
ini = list(sigma = sigma, mu = mu, pi = pi)
x1 = mvtnorm::rmvnorm(n1, c(0, 0), ini$sigma)
x2 = mvtnorm::rmvnorm(n2, c(mu1, mu2), ini$sigma)
x3 = mvtnorm::rmvnorm(n3, c(2 * mu1, 2 * mu2), ini$sigma)
x = rbind(x1, x2, x3)
# Use the `mixnorm' function to get the MLE and the estimated classification probabilities
out = mixnorm(x, 3)
# Prepare latent component label
lat = rbind(rep(c(1, 0), times = c(n1, n - n1)),
            rep(c(0, 1, 0), times = c(n1, n2, n3)),
            rep(c(0, 1), times = c(n - n3, n3)))
# Fit the complh/distlat function
clhest = complh(out, lat)
distlatest = distlat(out, lat)
References
Yao, W. (2015). Label switching and its solutions for frequentist mixture models. Journal of Statistical Computation and Simulation, 85(5), 1000-1012.