stepmix function

R interface to stepmix in StepMix python.

This function creates a basic R list that is used to initialize the stepmix object in Python, so that the fit and predict functions can be called on it.

stepmix(n_components = 2, n_steps = 1, measurement = "bernoulli", structural = "gaussian_unit", assignment = "modal", correction = NULL, abs_tol = 1e-10, rel_tol = 0, max_iter = 1000, n_init = 1, init_params = "random", random_state = NULL, verbose = 0, progress_bar = 1, measurement_params = NULL, structural_params = NULL)

Arguments

  • n_components: The number of latent classes. 2 by default.

  • n_steps: 1, 2, or 3, 1 by default. Number of steps in the estimation. Must be one of:

    1: run EM on both the measurement and structural models.

    2: first run EM on the measurement model, then on the complete model, but keep the measurement parameters fixed for the second step. See Bakk & Kuha, 2018.

    3: first run EM on the measurement model, assign class probabilities, then fit the structural model via maximum likelihood. See the correction parameter for bias correction.

    See Bakk & Kuha (2018) for more details.

  • measurement: String describing the measurement model. See Details for the available models. The default model is "bernoulli".

  • structural: String describing the structural model. See Details for the available models. The default model is "gaussian_unit".

  • assignment: String indicating the type of class assignments for 3-step estimation, "modal" by default. Must be one of:

    soft: keep class responsibilities (posterior probabilities) as is.

    modal: assign 1 to the class with max probability, 0 otherwise (one-hot encoding).

  • correction: Bias correction for 3-step estimation, NULL by default. Must be one of:

    NULL: No correction. Run naive 3-step.

    BCH: Apply the empirical BCH correction from Bolck et al., 2004.

    ML: Apply the ML correction from Vermunt, 2010 and Bakk et al., 2013.

  • abs_tol: The absolute convergence threshold. EM iterations will stop when the lower bound average gain is below this threshold. The default value is 1e-10.

  • rel_tol: The relative convergence threshold. EM iterations will stop when the relative lower bound average gain is below this threshold. The default value is 0.

  • max_iter: The maximum number of EM iterations to perform. The default value is 1000.

  • n_init: The number of initializations to perform. The best results are kept.

  • init_params: "kmeans", or "random", default="random". The method used to initialize the weights, the means and the precisions. Must be one of:

    kmeans : responsibilities are initialized using kmeans.

    random : responsibilities are initialized randomly.

  • random_state: State instance or NULL, default=NULL. Controls the random seed given to the method chosen to initialize the parameters. Pass an int for reproducible output across multiple function calls.

  • verbose: Default=0. Enable verbose output. If 1, prints a detailed report of the model and the performance metrics after fitting.

  • progress_bar: Default=1. Display a tqdm progress bar during fitting.

  • measurement_params: Default=NULL. Additional parameters passed to the measurement model class. Particularly useful to specify optimization parameters for stepmix.emission.covariate.Covariate. Ignored if the measurement descriptor is a nested object (see stepmix.emission.nested.Nested).

  • structural_params: Default=NULL. Additional parameters passed to the structural model class. Particularly useful to specify optimization parameters for stepmix.emission.covariate.Covariate. Ignored if the structural descriptor is a nested object (see stepmix.emission.nested.Nested).
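The two assignment options can be illustrated with a small base R sketch. The responsibility matrix below is made up for illustration and the code does not call StepMix; it only shows what "soft" and "modal" mean for a given set of posterior probabilities.

```r
# Hypothetical posterior class probabilities (3 observations, 2 classes).
resp <- matrix(c(0.9, 0.1,
                 0.4, 0.6,
                 0.2, 0.8),
               ncol = 2, byrow = TRUE)

# "soft": keep the responsibilities as they are.
soft <- resp

# "modal": 1 for the most probable class, 0 otherwise (one-hot encoding).
modal <- diag(ncol(resp))[max.col(resp, ties.method = "first"), , drop = FALSE]

print(soft)
print(modal)
```

Each row of modal contains a single 1 in the column of the most probable class, while soft keeps the full probabilities.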

Details

The options for both the measurement and structural parts are described here:

bernoulli: The observed data consists of n_features bernoulli (binary) random variables.

bernoulli_nan: the observed data consists of n_features bernoulli (binary) random variables. Supports missing values.

binary: alias for bernoulli.

binary_nan: alias for bernoulli_nan.

categorical: alias for multinoulli.

categorical_nan: alias for multinoulli_nan.

continuous: alias for gaussian_diag.

continuous_nan: alias for gaussian_diag_nan. Supports missing values.

covariate: covariate model where class probabilities are a multinomial logistic model of the features.

gaussian: alias for gaussian_unit.

gaussian_nan: alias for gaussian_unit_nan. Supports missing values.

gaussian_unit: each gaussian component has unit variance. Only fit the mean.

gaussian_unit_nan: each gaussian component has unit variance. Only fit the mean. Supports missing values.

gaussian_spherical: each gaussian component has its own single variance.

gaussian_spherical_nan: each gaussian component has its own single variance. Supports missing values.

gaussian_tied: all gaussian components share the same general covariance matrix.

gaussian_diag: each gaussian component has its own diagonal covariance matrix.

gaussian_diag_nan: each gaussian component has its own diagonal covariance matrix. Supports missing values.

gaussian_full: each gaussian component has its own general covariance matrix.

multinoulli: the observed data consists of n_features multinoulli (categorical) random variables.

multinoulli_nan: the observed data consists of n_features multinoulli (categorical) random variables. Supports missing values.
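To make the differences between the Gaussian options concrete, the sketch below counts the free variance and covariance parameters implied by each description, for d features and k latent classes. The helper cov_params is hypothetical and is plain arithmetic derived from the descriptions above; it is not the parameter count reported by StepMix itself.

```r
# Free variance/covariance parameters implied by each Gaussian option
# (means are estimated in every case and are not counted here).
cov_params <- function(d, k) {
  c(gaussian_unit      = 0,                    # variance fixed to 1
    gaussian_spherical = k,                    # one variance per class
    gaussian_diag      = k * d,                # one variance per feature and class
    gaussian_tied      = d * (d + 1) / 2,      # one shared symmetric matrix
    gaussian_full      = k * d * (d + 1) / 2)  # one symmetric matrix per class
}

counts <- cov_params(d = 4, k = 2)
print(counts)
```

With 4 features and 2 classes, the counts grow from 0 (gaussian_unit) to 20 (gaussian_full), which shows why the more restricted options can be preferable in small samples.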

Returns

It returns a list of type stepmixr that contains the arguments of the object.

References

Bolck, A., Croon, M., and Hagenaars, J. Estimating latent structure models with categorical variables: One-step versus three-step estimators. Political Analysis, 12(1):3-27, 2004.

Vermunt, J. K. Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4):450-469, 2010.

Bakk, Z., Tekle, F. B., and Vermunt, J. K. Estimating the association between latent class membership and external variables using bias-adjusted three-step approaches. Sociological Methodology, 43(1):272-311, 2013.

Bakk, Z. and Kuha, J. Two-step estimation of models between latent classes and external variables. Psychometrika, 83(4):871-892, 2018.

Author(s)

Éric Lacourse, Roxane de la Sablonnière, Charles-Édouard Giguère, Sacha Morin, Robin Legault, Félix Laliberté, Zsuzsa Bakk

See Also

fit

Examples

library(stepmixr)
# Two latent classes, estimated with the 3-step approach.
model1 <- stepmix(n_components = 2, n_steps = 3)
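A slightly fuller sketch, using only argument names from the usage line above, is shown below. The argument values and the name model2 are illustrative assumptions, and the call is guarded so the snippet still runs when stepmixr is not installed. Fitting the model additionally requires the fit function listed under See Also and a working Python installation of StepMix, which is not shown here.

```r
# Illustrative argument values; the names come from the usage line above.
args <- list(n_components = 3, n_steps = 3,
             measurement = "categorical", structural = "gaussian_unit",
             correction = "BCH", assignment = "modal",
             n_init = 5, random_state = 123)

# Build the descriptor only if the stepmixr package is available.
if (requireNamespace("stepmixr", quietly = TRUE)) {
  model2 <- do.call(stepmixr::stepmix, args)
  str(model2)  # inspect the stored arguments
}
```

Keeping the arguments in a named list makes it easy to reuse the same settings across several candidate values of n_components.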