slca function

Structured Latent Class Analysis (SLCA)

Structured Latent Class Analysis (SLCA)

This function implements a structured latent class model for polytomous item responses (Formann, 1985, 1992). Lasso estimation for the item parameters is included (Chen, Liu, Xu & Ying, 2015; Chen, Li, Liu & Ying, 2017; Sun, Chen, Liu, Ying & Xin, 2016).

slca(data, group=NULL, weights=rep(1, nrow(data)), Xdes, Xlambda.init=NULL, Xlambda.fixed=NULL, Xlambda.constr.V=NULL, Xlambda.constr.c=NULL, delta.designmatrix=NULL, delta.init=NULL, delta.fixed=NULL, delta.linkfct="log", Xlambda_positive=NULL, regular_type="lasso", regular_lam=0, regular_w=NULL, regular_n=nrow(data), maxiter=1000, conv=1e-5, globconv=1e-5, msteps=10, convM=5e-04, decrease.increments=FALSE, oldfac=0, dampening_factor=1.01, seed=NULL, progress=TRUE, PEM=TRUE, PEM_itermax=maxiter, ...) ## S3 method for class 'slca' summary(object, file=NULL, ...) ## S3 method for class 'slca' print(x, ...) ## S3 method for class 'slca' plot(x, group=1, ... )

Arguments

  • data: Matrix of polytomous item responses

  • group: Optional vector of group identifiers. For plot.slca it is a single integer group identified.

  • weights: Optional vector of sample weights

  • Xdes: Design matrix for xijhx_{ijh} with qihjv q_{ihjv} entries. Therefore, it must be an array with four dimensions referring to items (ii), categories (hh), latent classes (jj) and λ\lambda parameters (vv).

  • Xlambda.init: Initial λx\lambda_x parameters

  • Xlambda.fixed: Fixed λx\lambda_x parameters. These must be provided by a matrix with two columns: 1st column -- Parameter index, 2nd column: Fixed value.

  • Xlambda.constr.V: A design matrix for linear restrictions of the form Vxλx=cxV_x \lambda_x=c_x for the λx\lambda_x parameter.

  • Xlambda.constr.c: A vector for the linear restriction Vxλx=cxV_x \lambda_x=c_x of the λx\lambda_x parameter.

  • delta.designmatrix: Design matrix for delta parameters δ\delta

    parameterizing the latent class distribution by log-linear smoothing (Xu & von Davier, 2008)

  • delta.init: Initial δ\delta parameters

  • delta.fixed: Fixed δ\delta parameters. This must be a matrix with three columns: 1st column: Parameter index, 2nd column: Group index, 3rd column: Fixed value

  • delta.linkfct: Link function for skill space reduction. This can be the log-linear link (log) or the logistic link function (logit).

  • Xlambda_positive: Optional vector of logical indicating which elements of λx\bold{\lambda}_x should be constrained to be positive.

  • regular_type: Regularization method which can be lasso, scad or mcp. See gdina for more information and references.

  • regular_lam: Numeric. Regularization parameter

  • regular_w: Vector for weighting the regularization penalty

  • regular_n: Vector of regularization factor. This will be typically the sample size.

  • maxiter: Maximum number of iterations

  • conv: Convergence criterion for item parameters and distribution parameters

  • globconv: Global deviance convergence criterion

  • msteps: Maximum number of M steps in estimating bb and aa item parameters. The default is to use 4 M steps.

  • convM: Convergence criterion in M step

  • decrease.increments: Should in the M step the increments of aa and bb parameters decrease during iterations? The default is FALSE. If there is an increase in deviance during estimation, setting decrease.increments to TRUE

    is recommended.

  • oldfac: Factor ff between 0 and 1 to control convergence behavior. If xtx_t denotes the estimated parameter in iteration tt, then the regularized estimate xtx_t^{\ast} is obtained by xt=fxt1+(1f)xtx_t^{\ast}=f x_{t-1} + (1-f) x_t. Therefore, values of oldfac near to one only allow for small changes in estimated parameters from in succeeding iterations.

  • dampening_factor: Factor larger than one defining the specified decrease in decrements in iterations.

  • seed: Simulation seed for initial parameters. The default of NULL corresponds to a random seed.

  • progress: An optional logical indicating whether the function should print the progress of iteration in the estimation process.

  • PEM: Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012).

  • PEM_itermax: Number of iterations in which the P-EM method should be applied.

  • object: A required object of class slca

  • file: Optional file name for a file in which summary

    should be sinked.

  • x: A required object of class slca

  • ...: Optional parameters to be passed to or from other methods will be ignored.

Details

The structured latent class model allows for general constraints of items ii in categories hh and classes jj. The item response model is

P(Xi=hj)=exp(xihj)lexp(xilj) P( X_{i}=h | j )=\frac{ \exp( x_{ihj} ) }{ \sum_l \exp( x_{ilj} ) }

with linear constraints on the class specific probabilities

xihj=vqihjvλxv x_{ihj}=\sum_v q_{ihjv} \lambda_{xv}

Linear restrictions on the λx\lambda_x parameter can be specified by a matrix equation Vxλx=cxV_x \lambda_x=c_x (see Xlambda.constr.V and Xlambda.constr.c; Neuhaus, 1996).

The latent class distribution can be smoothed by a log-linear link function (Xu & von Davier, 2008) or a logistic link function (Formann, 1992). For class jj

in group gg employing a link function hh, it holds that

h[P(jg)]wrjwδgw h [ P( j| g) ] \propto \sum_w r_{jw} \delta_{gw}

where group-specific distributions are allowed. The values rjwr_{jw} are specified in the design matrix delta.designmatrix.

This model contains classical uni- and multidimensional latent trait models, latent class analysis, located latent class analysis, cognitive diagnostic models, the general diagnostic model and mixture item response models as special cases (see Formann & Kohlmann, 1998; Formann, 2007).

The function also allows for regularization of λxv\lambda_{xv} parameters using the lasso approach (Sun et al., 2016). More formally, the penalty function can be written as

pen(λx)=pλvnvwvλxv pen( \bold{\lambda}_x )=p_\lambda \sum_v n_v w_v | \lambda_{xv} |

where pλp_\lambda can be specified with regular_lam, wvw_v can be specified with regular_w, and nvn_v can be specified with regular_n.

Returns

An object of class slca. The list contains the following entries: - item: Data frame with conditional item probabilities

  • deviance: Deviance

  • ic: Information criteria, number of estimated parameters

  • Xlambda: Estimated λx\lambda_x parameters

  • se.Xlambda: Standard error of λx\lambda_x parameters

  • pi.k: Trait distribution

  • pjk: Item response probabilities evaluated for all classes

  • n.ik: An array of expected counts ncikgn_{cikg} of ability class cc

    at item ii at category kk in group gg

  • G: Number of groups

  • I: Number of items

  • N: Number of persons

  • delta: Parameter estimates for skillspace representation

  • covdelta: Covariance matrix of parameter estimates for skillspace representation

  • MLE.class: Classified skills for each student (MLE)

  • MAP.class: Classified skills for each student (MAP)

  • data: Original data frame

  • group.stat: Group statistics (sample sizes, group labels)

  • p.xi.aj: Individual likelihood

  • posterior: Individual posterior distribution

  • K.item: Maximal category per item

  • time: Info about computation time

  • skillspace: Used skillspace parametrization

  • iter: Number of iterations

  • seed.used: Used simulation seed

  • Xlambda.init: Used initial lambda parameters

  • delta.init: Used initial delta parameters

  • converged: Logical indicating whether convergence was achieved.

References

Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56(12), 4122-4137.

Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110, 850-866.

Chen, Y., Li, X., Liu, J., & Ying, Z. (2017). Regularized latent class analysis with application in cognitive diagnosis. Psychometrika, 82, 660-692.

Formann, A. K. (1985). Constrained latent class models: Theory and applications. British Journal of Mathematical and Statistical Psychology, 38, 87-111.

Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 87, 476-486.

Formann, A. K. (2007). (Almost) Equivalence between conditional and mixture maximum likelihood estimates for some models of the Rasch type. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models (pp. 177-189). New York: Springer.

Formann, A. K., & Kohlmann, T. (1998). Structural latent class models. Sociological Methods & Research, 26, 530-565.

Neuhaus, W. (1996). Optimal estimation under linear constraints. Astin Bulletin, 26, 233-245.

Sun, J., Chen, Y., Liu, J., Ying, Z., & Xin, T. (2016). Latent variable selection for multidimensional item response theory models via L1L_1 regularization. Psychometrika, 81(4), 921-939.

Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.

Note

If some items have differing number of categories, appropriate class probabilities in non-existing categories per items can be practically set to zero by loading an item for all skill classes on a fixed λx\lambda_x parameter of a small number, e.g. -999.

The implementation of the model builds on pieces work of Anton Formann. See http://www.antonformann.at/ for more information.

See Also

For latent trait models with continuous latent variables see the mirt or TAM packages. For a discrete trait distribution see the MultiLCIRT package.

For latent class models see the poLCA, covLCA or randomLCA

package.

For mixture Rasch or mixture IRT models see the psychomix or mRm package.

Examples

############################################################################# # EXAMPLE 1: data.Students | (Generalized) Partial Credit Model ############################################################################# data(data.Students, package="CDM") dat <- data.Students[, c("mj1","mj2","mj3","mj4","sc1", "sc2") ] # define discretized ability theta.k <- seq( -6, 6, len=21 ) #*** Model 1: Partial credit model # define design matrix for lambda I <- ncol(dat) maxK <- 4 TP <- length(theta.k) NXlam <- I*(maxK-1) + 1 # number of estimated parameters # last parameter is joint slope parameter Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) ) # Item1Cat1, ..., Item1Cat3, Item2Cat1, ..., dimnames(Xdes)[[1]] <- colnames(dat) dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) ) dimnames(Xdes)[[3]] <- paste0("Class", 1:TP ) v2 <- unlist( sapply( 1:I, FUN=function(ii){ # ii paste0( paste0( colnames(dat)[ii], "_b" ), "Cat", 1:(maxK-1) ) }, simplify=FALSE) ) dimnames(Xdes)[[4]] <- c( v2, "a" ) # define theta design and item discriminations for (ii in 1:I){ for (hh in 1:(maxK-1) ){ Xdes[ii, hh + 1,, NXlam ] <- hh * theta.k } } # item intercepts for (ii in 1:I){ for (hh in 1:(maxK-1) ){ # ii <- 1 # Item # hh <- 1 # category Xdes[ii,hh+1,, ( ii - 1)*(maxK-1) + hh] <- 1 } } #**** # skill space designmatrix TP <- length(theta.k) w1 <- stats::dnorm(theta.k) w1 <- w1 / sum(w1) delta.designmatrix <- matrix( 1, nrow=TP, ncol=1 ) delta.designmatrix[,1] <- log(w1) # initial lambda parameters Xlambda.init <- c( stats::rnorm( dim(Xdes)[[4]] - 1 ), 1 ) # fixed delta parameter delta.fixed <- cbind( 1, 1,1 ) # estimate model mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.init=Xlambda.init, delta.fixed=delta.fixed ) summary(mod1) plot(mod1, cex.names=.7 ) ## Not run: #*** Model 2: Partial credit model with some parameter constraints # fixed lambda parameters Xlambda.fixed <- cbind( c(1,19), c(3.2,1.52 ) ) # 1st parameter=3.2 # 19th parameter=1.52 (joint item slope) mod2 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, delta.init=delta.init, Xlambda.init=Xlambda.init, delta.fixed=delta.fixed, Xlambda.fixed=Xlambda.fixed, maxiter=70 ) #*** Model 3: Partial credit model with non-normal distribution Xlambda.fixed <- cbind( c(1,19), c(3.2,1) ) # fix item slope to one delta.designmatrix <- cbind( 1, theta.k, theta.k^2, theta.k^3 ) mod3 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.fixed=Xlambda.fixed, maxiter=200 ) summary(mod3) # non-normal distribution with convergence regularizing factor oldfac mod3a <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.fixed=Xlambda.fixed, maxiter=500, oldfac=.95 ) summary(mod3a) #*** Model 4: Generalized Partial Credit Model # estimate generalized partial credit model without restrictions on trait # distribution and item parameters to ensure better convergence behavior # Note that two parameters are not identifiable and information criteria # have to be adapted. #--- # define design matrix for lambda I <- ncol(dat) maxK <- 4 TP <- length(theta.k) NXlam <- I*(maxK-1) + I # number of estimated parameters Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) ) # Item1Cat1, ..., Item1Cat3, Item2Cat1, ..., dimnames(Xdes)[[1]] <- colnames(dat) dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) ) dimnames(Xdes)[[3]] <- paste0("Class", 1:TP ) v2 <- unlist( sapply( 1:I, FUN=function(ii){ # ii paste0( paste0( colnames(dat)[ii], "_b" ), "Cat", 1:(maxK-1) ) }, simplify=FALSE) ) dimnames(Xdes)[[4]] <- c( v2, paste0( colnames(dat),"_a") ) dimnames(Xdes) # define theta design and item discriminations for (ii in 1:I){ for (hh in 1:(maxK-1) ){ Xdes[ii, hh + 1,, I*(maxK-1) + ii ] <- hh * theta.k } } # item intercepts for (ii in 1:I){ for (hh in 1:(maxK-1) ){ Xdes[ii,hh+1,, ( ii - 1)*(maxK-1) + hh] <- 1 } } #**** # skill space designmatrix delta.designmatrix <- cbind( 1, theta.k,theta.k^2 ) # initial lambda parameters from partial credit model Xlambda.init <- mod1$Xlambda Xlambda.init <- c( mod1$Xlambda[ - length(Xlambda.init) ], rep( Xlambda.init[ length(Xlambda.init) ],I) ) # estimate model mod4 <- CDM::slca( dat, Xdes=Xdes, Xlambda.init=Xlambda.init, delta.designmatrix=delta.designmatrix, decrease.increments=TRUE, maxiter=300 ) ############################################################################# # EXAMPLE 2: Latent class model with two classes ############################################################################# set.seed(9876) I <- 7 # number of items # simulate response probabilities a1 <- stats::runif(I, 0, .4 ) a2 <- stats::runif(I, .6, 1 ) N <- 1000 # sample size # simulate data in two classes of proportions .3 and .7 N1 <- round(.3*N) dat1 <- 1 * ( matrix(a1,N1,I,byrow=TRUE) > matrix( stats::runif( N1 * I), N1, I ) ) N2 <- round(.7*N) dat2 <- 1 * ( matrix(a2,N2,I,byrow=TRUE) > matrix( stats::runif( N2 * I), N2, I ) ) dat <- rbind( dat1, dat2 ) colnames(dat) <- paste0("I", 1:I) # define design matrices TP <- 2 # two classes # The idea is that latent classes refer to two different "dimensions". # Items load on latent class indicators 1 and 2, see below. Xdes <- array(0, dim=c(I,2,2,2*I) ) items <- colnames(dat) dimnames(Xdes)[[4]] <- c(paste0( colnames(dat), "Class", 1), paste0( colnames(dat), "Class", 2) ) # items, categories, classes, parameters # probabilities for correct solution for (ii in 1:I){ Xdes[ ii, 2, 1, ii ] <- 1 # probabilities class 1 Xdes[ ii, 2, 2, ii+I ] <- 1 # probabilities class 2 } # estimate model mod1 <- CDM::slca( dat, Xdes=Xdes ) summary(mod1) ############################################################################# # EXAMPLE 3: Mixed Rasch model with two classes ############################################################################# set.seed(987) library(sirt) # simulate two latent classes of Rasch populations I <- 15 # 6 items b1 <- seq( -1.5, 1.5, len=I) # difficulties latent class 1 b2 <- b1 # difficulties latent class 2 b2[ c(4,7, 9, 11, 12, 13) ] <- c(1, -.5, -.5, .33, .33, -.66 ) N <- 3000 # number of persons wgt <- .25 # class probability for class 1 # class 1 dat1 <- sirt::sim.raschtype( stats::rnorm( wgt*N ), b1 ) # class 2 dat2 <- sirt::sim.raschtype( stats::rnorm( (1-wgt)*N, mean=1, sd=1.7), b2 ) dat <- rbind( dat1, dat2 ) # theta grid theta.k <- seq( -5, 5, len=9 ) TP <- length(theta.k) #*** Model 1: Rasch model with normal distribution maxK <- 2 NXlam <- I +1 Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) ) dimnames(Xdes)[[1]] <- colnames(dat) dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) ) dimnames(Xdes)[[4]] <- c( paste0( "b_", colnames(dat)[1:I] ), "a" ) # define item difficulties for (ii in 1:I){ Xdes[ii, 2,, ii ] <- -1 } # theta design for (tt in 1:TP){ Xdes[1:I, 2, tt, I + 1] <- theta.k[tt] } # skill space definition delta.designmatrix <- cbind( 1, theta.k^2 ) delta.fixed <- NULL Xlambda.init <- c( stats::runif( I, -.8, .8 ), 1 ) Xlambda.fixed <- cbind( I+1, 1 ) # estimate model mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, delta.fixed=delta.fixed, Xlambda.fixed=Xlambda.fixed, Xlambda.init=Xlambda.init, decrease.increments=TRUE, maxiter=200 ) summary(mod1) #*** Model 1b: Constraint the sum of item difficulties to zero # change skill space definition delta.designmatrix <- cbind( 1, theta.k, theta.k^2 ) delta.fixed <- NULL # constrain sum of difficulties Xlambda parameters to zero Xlambda.constr.V <- matrix( 1, nrow=I+1, ncol=1 ) Xlambda.constr.V[I+1,1] <- 0 Xlambda.constr.c <- c(0) # estimate model mod1b <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.fixed=Xlambda.fixed, Xlambda.constr.V=Xlambda.constr.V, Xlambda.constr.c=Xlambda.constr.c ) summary(mod1b) #*** Model 2: Mixed Rasch model with two latent classes NXlam <- 2*I +2 Xdes <- array( 0, dim=c(I, maxK, 2*TP, NXlam ) ) dimnames(Xdes)[[1]] <- colnames(dat) dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) ) dimnames(Xdes)[[4]] <- c( paste0( "bClass1_", colnames(dat)[1:I] ), paste0( "bClass2_", colnames(dat)[1:I] ), "aClass1", "aClass2" ) # define item difficulties for (ii in 1:I){ Xdes[ii, 2, 1:TP, ii ] <- -1 # first class Xdes[ii, 2, TP + 1:TP, I+ii ] <- -1 # second class } # theta design for (tt in 1:TP){ Xdes[1:I, 2, tt, 2*I+1 ] <- theta.k[tt] Xdes[1:I, 2, TP+tt, 2*I+2 ] <- theta.k[tt] } # skill space definition delta.designmatrix <- matrix( 0, nrow=2*TP, ncol=4 ) delta.designmatrix[1:TP,1] <- 1 delta.designmatrix[1:TP,2] <- theta.k^2 delta.designmatrix[TP + 1:TP,3] <- 1 delta.designmatrix[TP+ 1:TP,4] <- theta.k^2 b1 <- stats::qnorm( colMeans(dat) ) Xlambda.init <- c( stats::runif( 2*I, -1.8, 1.8 ), 1,1 ) Xlambda.fixed <- cbind( c(2*I+1, 2*I+2), 1 ) # estimate model mod2 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.fixed=Xlambda.fixed, decrease.increments=TRUE, Xlambda.init=Xlambda.init, maxiter=1000 ) summary(mod2) summary(mod1) # latent class proportions stats::aggregate( mod2$pi.k, list( rep(1:2, each=TP)), sum ) #*** Model 2b: Different parametrization with sum constraint on item difficulties # skill space definition delta.designmatrix <- matrix( 0, nrow=2*TP, ncol=6 ) delta.designmatrix[1:TP,1] <- 1 delta.designmatrix[1:TP,2] <- theta.k delta.designmatrix[1:TP,3] <- theta.k^2 delta.designmatrix[TP+ 1:TP,4] <- 1 delta.designmatrix[TP+ 1:TP,5] <- theta.k delta.designmatrix[TP+ 1:TP,6] <- theta.k^2 Xlambda.fixed <- cbind( c(2*I+1,2*I+2), c(1,1) ) b1 <- stats::qnorm( colMeans( dat ) ) Xlambda.init <- c( b1, b1 + stats::runif(I, -1, 1 ), 1, 1 ) # constraints on item difficulties Xlambda.constr.V <- matrix( 0, nrow=NXlam, ncol=2) Xlambda.constr.V[1:I, 1 ] <- 1 Xlambda.constr.V[I + 1:I, 2 ] <- 1 Xlambda.constr.c <- c(0,0) # estimate model mod2b <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.fixed=Xlambda.fixed, Xlambda.init=Xlambda.init, Xlambda.constr.V=Xlambda.constr.V, Xlambda.constr.c=Xlambda.constr.c, decrease.increments=TRUE, maxiter=1000 ) summary(mod2b) stats::aggregate( mod2b$pi.k, list( rep(1:2, each=TP)), sum ) #*** Model 2c: Estimation with mRm package library(mRm) mod2c <- mRm::mrm(data.matrix=dat, cl=2) plot(mod2c) print(mod2c) #*** Model 2d: Estimation with psychomix package library(psychomix) mod2d <- psychomix::raschmix(data=dat, k=2, verbose=TRUE ) summary(mod2d) plot(mod2d) ############################################################################# # EXAMPLE 4: Located latent class model, Rasch model ############################################################################# set.seed(487) library(sirt) I <- 15 # I items b1 <- seq( -2, 2, len=I) # item difficulties N <- 4000 # number of persons # simulate 4 theta classes theta0 <- c( -2.5, -1, 0.3, 1.3 ) # skill classes probs0 <- c( .1, .4, .2, .3 ) TP <- length(theta0) theta <- theta0[ rep(1:TP, round(probs0*N) ) ] dat <- sirt::sim.raschtype( theta, b1 ) #*** Model 1: Located latent class model with 4 classes maxK <- 2 NXlam <- I + TP Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) ) dimnames(Xdes)[[1]] <- colnames(dat) dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) ) dimnames(Xdes)[[3]] <- paste0("Class", 1:TP ) dimnames(Xdes)[[4]] <- c( paste0( "b_", colnames(dat)[1:I] ), paste0("theta", 1:TP) ) # define item difficulties for (ii in 1:I){ Xdes[ii, 2,, ii ] <- -1 } # theta design for (tt in 1:TP){ Xdes[1:I, 2, tt, I + tt] <- 1 } # skill space definition delta.designmatrix <- diag(TP) Xlambda.init <- c( - stats::qnorm( colMeans(dat) ), seq(-2,1,len=TP) ) # constraint on item difficulties Xlambda.constr.V <- matrix( 0, nrow=NXlam, ncol=1) Xlambda.constr.V[1:I,1] <- 1 Xlambda.constr.c <- c(0) delta.init <- matrix( c(1,1,1,1), TP, 1 ) # estimate model mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, delta.init=delta.init, Xlambda.init=Xlambda.init, Xlambda.constr.V=Xlambda.constr.V, Xlambda.constr.c=Xlambda.constr.c, decrease.increments=TRUE, maxiter=400 ) summary(mod1) # compare estimated and simulated theta class locations cbind( mod1$Xlambda[ - c(1:I) ], theta0 ) # compare estimated and simulated latent class proportions cbind( mod1$pi.k, probs0 ) ############################################################################# # EXAMPLE 5: DINA model with two skills ############################################################################# set.seed(487) N <- 3000 # number of persons # define Q-matrix I <- 9 # 9 items NS <- 2 # 2 skills TP <- 4 # number of skill classes Q <- scan( nlines=3) 1 0 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1 1 Q <- matrix(Q, I, ncol=NS,byrow=TRUE) # define skill distribution alpha0 <- matrix( c(0,0,1,0,0,1,1,1), nrow=4,ncol=2,byrow=TRUE) prob0 <- c( .2, .4, .1, .3 ) alpha <- alpha0[ rep( 1:TP, prob0*N),] # define guessing and slipping parameters guess <- round( stats::runif(I, 0, .4 ), 2 ) slip <- round( stats::runif(I, 0, .3 ), 2 ) # simulate data according to the DINA model dat <- CDM::sim.din( q.matrix=Q, alpha=alpha, slip=slip, guess=guess )$dat # define Xlambda design matrix maxK <- 2 NXlam <- 2*I Xdes <- array( 0, dim=c(I, maxK, TP, NXlam ) ) dimnames(Xdes)[[1]] <- colnames(dat) dimnames(Xdes)[[2]] <- paste0("Cat", 1:(maxK) ) dimnames(Xdes)[[3]] <- c("S00","S10","S01","S11") dimnames(Xdes)[[4]] <- c( paste0("guess",1:I ), paste0( "antislip", 1:I ) ) dimnames(Xdes) # define item difficulties for (ii in 1:I){ # define latent responses latresp <- 1*( alpha0 %*% Q[ii,]==sum(Q[ii,]) )[,1] # model slipping parameters Xdes[ii, 2, latresp==1, I+ii ] <- 1 # guessing parameters Xdes[ii, 2, latresp==0, ii ] <- 1 } Xdes[1,2,,] Xdes[7,2,,] # skill space definition delta.designmatrix <- diag(TP) Xlambda.init <- c( rep( stats::qlogis( .2 ), I ), rep( stats::qlogis( .8 ), I ) ) # estimate DINA model with slca function mod1 <- CDM::slca( dat, Xdes=Xdes, delta.designmatrix=delta.designmatrix, Xlambda.init=Xlambda.init, decrease.increments=TRUE, maxiter=400 ) summary(mod1) # compare estimated and simulated latent class proportions cbind( mod1$pi.k, probs0 ) # compare estimated and simulated guessing parameters cbind( mod1$pjk[1,,2], guess ) # compare estimated and simulated slipping parameters cbind( 1 - mod1$pjk[4,,2], slip ) ############################################################################# # EXAMPLE 6: Investigating differential item functioning in Rasch models # with regularization ############################################################################# #---- simulate data set.seed(987) N <- 1000 # number of persons in a group I <- 20 # number of items #* population parameters of two groups mu1 <- 0 mu2 <- .6 sd1 <- 1.4 sd2 <- 1 # item difficulties b <- seq( -1.1, 1.1, len=I ) # define some DIF effects dif <- rep(0,I) dif[ c(3,6,9,12)] <- c( .6, -1, .75, -.35 ) print(dif) #* simulate datasets dat1 <- sirt::sim.raschtype( rnorm(N, mean=mu1, sd=sd1), b=b - dif /2 ) colnames(dat1) <- paste0("I", 1:I, "_G1") dat2 <- sirt::sim.raschtype( rnorm(N, mean=mu2, sd=sd2), b=b + dif /2 ) colnames(dat2) <- paste0("I", 1:I, "_G2") dat <- CDM::CDM_rbind_fill( dat1, dat2 ) dat <- data.frame( "group"=rep(1:2, each=N), dat ) #-- nodes for distribution theta.k <- seq(-4, 4, len=11) # define design matrix for lambda nitems <- ncol(dat) - 1 maxK <- 2 TP <- length(theta.k) NXlam <- 2*I + 1 Xdes <- array( 0, dim=c( nitems, maxK, TP, NXlam ) ) dimnames(Xdes)[[1]] <- colnames(dat)[-1] dimnames(Xdes)[[2]] <- paste0("Cat", 0:(maxK-1) ) dimnames(Xdes)[[3]] <- paste0("Theta", 1:TP ) dimnames(Xdes)[[4]] <- c( paste0("b", 1:I ), paste0("dif", 1:I ), "const" ) # define theta design for (ii in 1:nitems){ Xdes[ii,2,,NXlam ] <- theta.k } # item intercepts and DIF effects for (ii in 1:I){ Xdes[c(ii,ii+I),2,, ii ] <- -1 Xdes[ii,2,,ii+I] <- - 1/2 Xdes[ii+I,2,,ii+I] <- 1/2 } #--- skill space designmatrix TP <- length(theta.k) w1 <- stats::dnorm(theta.k) w1 <- w1 / sum(w1) delta.designmatrix <- matrix( 1, nrow=TP, ncol=2 ) delta.designmatrix[,2] <- log(w1) # fixed lambda parameters Xlambda.fixed <- cbind(NXlam, 1 ) # initial Xlambda parameters dif_sim <- 0*stats::rnorm(I, sd=.2) Xlambda.init <- c( - stats::qnorm( colMeans(dat1) ), dif_sim, 1 ) # delta.fixed delta.fixed <- cbind( 1, 1, 0 ) # regularization parameter regular_lam <- .2 # weighting vector: regularize only DIF effects regular_w <- c( rep(0,I), rep(1,I), 0 ) #--- estimation model with scad penalty mod1 <- CDM::slca( dat[,-1], group=dat$group, Xdes=Xdes, delta.designmatrix=delta.designmatrix, regular_type="scad", Xlambda.init=Xlambda.init, delta.fixed=delta.fixed, Xlambda.fixed=Xlambda.fixed, regular_lam=regular_lam, regular_w=regular_w ) # compare true and estimated DIF effects cbind( "true"=dif, "estimated"=round(coef(mod1)[seq(I+1,2*I)],2) ) summary(mod1) ## End(Not run)