This function implements the generalized DINA model for dichotomous attributes (GDINA; de la Torre, 2011) and polytomous attributes (pGDINA; Chen & de la Torre, 2013, 2018). Multiple group estimation is also possible with the gdina function, as is the estimation of a higher order GDINA model (de la Torre & Douglas, 2004). Polytomous item responses are treated by specifying a sequential GDINA model (Ma & de la Torre, 2016; Tutz, 1997). The simultaneous modeling of skills and misconceptions (bugs) can also be estimated within the GDINA framework (see Kuo, Chen & de la Torre, 2018; see argument rule).
The estimation can also be conducted under monotonicity constraints (Hong, Chang, & Tsai, 2016) using the argument mono.constr. Moreover, the regularization methods SCAD, lasso, ridge, SCAD-L2 and truncated L1 penalty (TLP) for item parameters can be employed (Xu & Shang, 2018).
Normally distributed priors can be specified for item parameters (item intercepts and item slopes). Note that (for convenience) the prior specification holds simultaneously for all items.
data: A required N×J data matrix containing integer responses, 0, 1, ..., K. Polytomous item responses are treated by the sequential GDINA model. NA values are allowed.
q.matrix: A required integer J×K matrix containing attributes not required or required, 0 or 1, to master the items in case of dichotomous attributes or integers in case of polytomous attributes. For polytomous item responses the Q-matrix must also include the item name and item category, see Example 11.
skillclasses: An optional matrix for determining the skill space. The argument can be used if a user wants fewer than 2^K skill classes.
conv.crit: Convergence criterion for maximum absolute change in item parameters
dev.crit: Convergence criterion for maximum absolute change in deviance
maxit: Maximum number of iterations
linkfct: A string which indicates the link function for the GDINA model. Options are "identity" (identity link), "logit" (logit link) and "log" (log link). The default is the "identity" link. Note that the link function is chosen for the whole model (i.e. for all items).
Mj: A list of design matrices and labels for each item. The definition of Mj follows the definition of Mj in de la Torre (2011). Please study the value Mj of the function in default analysis. See Example 3.
group: A vector of group identifiers for multiple group estimation. Default is NULL (no multiple group estimation).
invariance: Logical indicating whether invariance of item parameters is assumed for multiple group models. If a subset of items should be treated as noninvariant, then invariance can be a vector of item names.
method: Estimation method for item parameters (see de la Torre, 2011). The default "WLS" weights the probabilities of attribute classes by a weighting matrix Wj of expected frequencies, whereas the method "ULS" performs unweighted least squares estimation on expected frequencies. The method "ML" directly maximizes the log-likelihood function. The "ML" method is a bit slower but can be much more stable, especially in the case of the RRUM model. Only for the RRUM model, the default is changed to method="ML" if not specified otherwise.
delta.init: List with initial δ parameters
delta.fixed: List with fixed δ parameters. For free estimated parameters NA must be declared.
delta.designmatrix: A design matrix for restrictions on delta. See Example 4.
delta.basispar.lower: Lower bounds for delta basis parameters.
delta.basispar.upper: Upper bounds for delta basis parameters.
delta.basispar.init: An optional vector of starting values for the basis parameters of delta. This argument only applies when using a designmatrix for delta, i.e. delta.designmatrix is not NULL.
zeroprob.skillclasses: An optional vector of integers which indicates which skill classes should have zero probability. Default is NULL (no skill classes with zero probability).
attr.prob.init: Initial probabilities of skill distribution.
attr.prob.fixed: Vector or matrix with fixed probabilities of skill distribution.
reduced.skillspace: A logical which indicates if the latent class skill space dimension should be reduced (see Xu & von Davier, 2008). The default is NULL, which applies skill space reduction for more than four skills. The dimensional reduction is only well defined for more than three skills. If the argument zeroprob.skillclasses is not NULL, then reduced.skillspace is set to FALSE.
reduced.skillspace.method: Computation method for skill space reduction in case of reduced.skillspace=TRUE. The default is 2, which is computationally more efficient and was introduced in CDM 2.6. For compatibility with former CDM versions (≤ 2.5), reduced.skillspace.method=1 uses the older implemented method. In case of non-convergence with the new method, please try the older method.
HOGDINA: Values of -1, 0 or 1 indicating if a higher order GDINA model (see Details) should be estimated. The default value of -1 corresponds to the case that no higher order factor is assumed to exist. A value of 0 corresponds to independent attributes. A value of 1 assumes the existence of a higher order factor.
Z.skillspace: A user specified design matrix for the skill space reduction as described in Xu and von Davier (2008). See Example 6 for an application.
weights: An optional vector of sample weights.
rule: A string or a vector of itemwise condensation rules. Allowed entries are GDINA, DINA, DINO, ACDM (additive cognitive diagnostic model) and RRUM (reduced reparametrized unified model, RRUM, see Details). The rule GDINA1 applies only main effects in the GDINA model which is equivalent to ACDM. The rule GDINA2 applies to all main effects and second-order interactions of the attributes. If some item is specified as RRUM, then for all the items the reduced RUM will be estimated which means that the log link function and the ACDM condensation rule is used. In the output, the entry rrum.params contains the parameters transformed in the RUM parametrization. If rule is a string, the condensation rule applies to all items. If rule is a vector, condensation rules can be specified itemwise. The default is GDINA for all items.
bugs: Character vector indicating which columns in the Q-matrix refer to bugs (misconceptions). This is only available if some rule is set to "SISM". Note that bugs must be included as last columns in the Q-matrix.
regular_lam: Regularization parameter λ
regular_type: Type of regularization. Can be scad (SCAD penalty), lasso (lasso penalty), ridge (ridge penalty), elnet (elastic net), scadL2 (SCAD-L2; Zeng & Xie, 2014), tlp (truncated L1 penalty; Xu & Shang, 2018; Shen, Pan, & Zhu, 2012), mcp (MCP penalty; Zhang, 2010) or none (no regularization).
regular_alpha: Regularization parameter α (applicable for elastic net or SCAD-L2).
regular_tau: Regularization parameter τ for truncated L1 penalty.
regular_weights: Optional list of item parameter weights used for penalties in regularized estimation (see Example 13)
mono.constr: Logical indicating whether monotonicity constraints should be fulfilled in estimation (implemented by the increasing penalty method; see Nash, 2014, p. 156).
prior_intercepts: Vector with mean and standard deviation for prior of random intercepts (applies to all items)
prior_slopes: Vector with mean and standard deviation for prior of random slopes (applies to all items and all parameters)
progress: An optional logical indicating whether the function should print the progress of iteration in the estimation process.
progress.item: An optional logical indicating whether item wise progress should be displayed
mstep_iter: Number of iterations in M-step if method="ML".
mstep_conv: Convergence criterion in M-step if method="ML".
increment.factor: A factor larger than 1 (say 1.1) to control maximum increments in item parameters. This parameter can be used in case of nonconvergence.
fac.oldxsi: A convergence acceleration factor between 0 and 1 which defines the weight of previously estimated values in current parameter updates.
max.increment: Maximum size of change in increments in M steps of EM algorithm when method="ML" is used.
avoid.zeroprobs: An optional logical indicating whether zero probabilities should be avoided when estimating item parameters. Especially if not all skill classes are used, it is recommended to set the argument to TRUE.
seed: Simulation seed for initial parameters. A value of zero corresponds to deterministic starting values, an integer value different from zero to random initial values with set.seed(seed).
save.devmin: An optional logical indicating whether intermediate estimates should be saved corresponding to minimal deviance. Setting the argument to FALSE could help for preventing working memory overflow.
calc.se: Optional logical indicating whether standard errors should be calculated.
se_version: Integer for calculation method of standard errors. se_version=1 is based on the observed log likelihood and included since CDM 5.1 and is the default. Comparability with previous CDM versions can be obtained with se_version=0.
PEM: Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012).
PEM_itermax: Number of iterations in which the P-EM method should be applied.
cd: Logical indicating whether coordinate descent algorithm should be used.
cd_steps: Number of steps for each parameter in coordinate descent algorithm
mono_maxiter: Maximum number of iterations for fulfilling the monotonicity constraint
freq_weights: Logical indicating whether frequency weights should be used. Default is FALSE.
optimizer: String indicating which optimizer should be used in M-step estimation in case of method="ML". The internal optimizer of CDM can be requested by optimizer="CDM". The optimization with stats::optim can be requested by optimizer="optim". For the RRUM model, optimizer="optim" is always chosen.
object: A required object of class gdina, obtained from a call to the function gdina.
digits: Number of digits after decimal separator to display.
file: Optional name of a file to which the summary output should be written.
x: A required object of class gdina
ask: A logical indicating whether every separate item should be displayed in plot.gdina
...: Optional parameters to be passed to or from other methods will be ignored.
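The skillclasses argument restricts the skill space to fewer than 2^K classes. The following base R sketch shows how such a matrix could be constructed; the linear hierarchy A1 -> A2 -> A3 and all object names are illustrative assumptions, not part of the package.

```r
# Full skill space for K = 3 dichotomous attributes: 2^K = 8 classes
K <- 3
full_space <- as.matrix(expand.grid(rep(list(0:1), K)))
colnames(full_space) <- paste0("A", 1:K)

# Hypothetical restriction: a linear hierarchy in which an attribute
# can only be mastered if all preceding attributes are mastered,
# i.e. each row must be non-increasing from left to right
keep <- apply(full_space, 1, function(a) all(diff(a) <= 0))
reduced_space <- full_space[keep, , drop = FALSE]

nrow(full_space)     # number of classes in the full space
nrow(reduced_space)  # number of classes under the hierarchy
reduced_space
```

A matrix of this form could then be supplied as skillclasses=reduced_space, in the same way Example 10 passes a user-defined skill space.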
Details
The estimation is based on an EM algorithm as described in de la Torre (2011). Item parameters are contained in the delta vector which is a list where the jth entry corresponds to item parameters of the jth item.
The following description refers to the case of dichotomous attributes. For using polytomous attributes see Chen and de la Torre (2013) and Example 7 for a definition of the Q-matrix. In this case, $Q_{ik}=l$ means that the ith item requires the mastery (at least) of level l of attribute k.
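Assume that two skills $\alpha_1$ and $\alpha_2$ are required for mastering item j. Then the GDINA model can be written as
$$ g[P(X_{nj}=1 \mid \alpha_n)] = \delta_{j0} + \delta_{j1}\alpha_{n1} + \delta_{j2}\alpha_{n2} + \delta_{j12}\alpha_{n1}\alpha_{n2} $$
which is a two-way GDINA model (the rule="GDINA2" specification) with a link function g (which can be the identity, logit or logarithmic link). If the specification ACDM is chosen, then $\delta_{j12}=0$. The DINA model (rule="DINA") assumes $\delta_{j1}=\delta_{j2}=0$.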
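As a numerical illustration of the two-attribute case, the following base R sketch evaluates an intercept, two main effects and one interaction on the identity link, and then applies the ACDM and DINA restrictions described above. The parameter values and the helper gdina_prob are hypothetical.

```r
# All four mastery patterns for two attributes
alpha <- expand.grid(alpha1 = 0:1, alpha2 = 0:1)

# Hypothetical item parameters (identity link): intercept, two main
# effects and the two-way interaction
delta <- c(d0 = 0.10, d1 = 0.15, d2 = 0.20, d12 = 0.40)

# Response probability; inv_link is the inverse of the link function g
gdina_prob <- function(alpha, delta, inv_link = identity) {
  eta <- delta["d0"] + delta["d1"] * alpha$alpha1 +
    delta["d2"] * alpha$alpha2 +
    delta["d12"] * alpha$alpha1 * alpha$alpha2
  unname(inv_link(eta))
}

p_gdina <- gdina_prob(alpha, delta)
# ACDM: interaction fixed to zero
p_acdm <- gdina_prob(alpha, replace(delta, "d12", 0))
# DINA: both main effects fixed to zero
p_dina <- gdina_prob(alpha, replace(delta, c("d1", "d2"), 0))

cbind(alpha, GDINA = p_gdina, ACDM = p_acdm, DINA = p_dina)
```

For a logit or log link, the same helper can be called with inv_link=plogis or inv_link=exp, provided the delta values are chosen on the corresponding scale.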
For the reduced RUM model (rule="RRUM"), the item response model is
$$ P(X_{nj}=1 \mid \alpha_n) = \pi_j^{*} \cdot r_{j1}^{1-\alpha_{n1}} \cdot r_{j2}^{1-\alpha_{n2}} $$
Taking logarithms of both sides shows that this model is equivalent to an additive model (rule="ACDM") with a logarithmic link function (linkfct="log").
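The equivalence can be checked numerically. The sketch below uses hypothetical values for the RRUM parameters, converts them to an intercept and two main effects on the log scale, and compares the resulting probabilities; all object names are illustrative.

```r
# Hypothetical reduced RUM parameters for one item with two attributes
pi_star <- 0.9
r <- c(0.4, 0.6)

alpha <- as.matrix(expand.grid(alpha1 = 0:1, alpha2 = 0:1))

# Reduced RUM response probabilities
p_rrum <- pi_star * r[1]^(1 - alpha[, 1]) * r[2]^(1 - alpha[, 2])

# Same model as an additive model with log link:
# log P = delta0 + delta1 * alpha1 + delta2 * alpha2
delta0 <- log(pi_star) + sum(log(r))   # log-probability without any skill
delta_main <- -log(r)                  # gain on the log scale per mastered skill
p_acdm_log <- exp(delta0 + alpha %*% delta_main)

all.equal(as.numeric(p_rrum), as.numeric(p_acdm_log))
```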
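If a reduced skillspace (reduced.skillspace=TRUE) is employed, then the logarithm of the probability distribution of the attributes is modeled as a log-linear model:
$$ \log P[(\alpha_{n1},\alpha_{n2},\ldots,\alpha_{nK})] = \gamma_0 + \sum_k \gamma_k \alpha_{nk} + \sum_{k<l} \gamma_{kl}\,\alpha_{nk}\alpha_{nl} $$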
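The log-linear representation can be illustrated with a small base R sketch. It assumes that the model contains main effects and pairwise interactions of the attributes, which is consistent with the Z.skillspace column names used in Example 6 (such as A2_A3). The coefficient values and object names are hypothetical, and the intercept is absorbed by normalizing the probabilities.

```r
# Skill space for K = 4 dichotomous attributes (16 classes)
K <- 4
alpha <- as.matrix(expand.grid(rep(list(0:1), K)))
colnames(alpha) <- paste0("A", 1:K)

# Pairwise interaction columns, named like "A1_A2"
attr_pairs <- combn(K, 2)
inter <- apply(attr_pairs, 2, function(p) alpha[, p[1]] * alpha[, p[2]])
colnames(inter) <- apply(attr_pairs, 2,
                         function(p) paste0("A", p[1], "_A", p[2]))

# Assumed design matrix: main effects plus two-way interactions
Z <- cbind(alpha, inter)

# Hypothetical log-linear coefficients (4 main effects, 6 interactions)
gamma_par <- c(rep(0.1, K), rep(0.3, ncol(inter)))

# Class probabilities: exponentiate the linear predictor and normalize
eta <- as.numeric(Z %*% gamma_par)
prob <- exp(eta) / sum(exp(eta))

ncol(Z)    # 10 parameters instead of 2^4 - 1 = 15 free probabilities
sum(prob)
```

Dropping a column such as A2_A3 from a design matrix of this kind corresponds to removing the associated interaction term, which is the type of modification shown in Example 6.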
If a higher order GDINA model is assumed (HOGDINA=1), then a higher order factor $\theta_n$ for the attributes is assumed:
$$ P(\alpha_{nk}=1 \mid \theta_n) = \Phi(a_k \theta_n + b_k) $$
For HOGDINA=0, all attributes $\alpha_{nk}$ are assumed to be independent of each other:
$$ P[(\alpha_{n1},\alpha_{n2},\ldots,\alpha_{nK})] = \prod_k P(\alpha_{nk}) $$
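Both skill distribution models can be sketched in a few lines of base R. The slopes, intercepts and marginal mastery probabilities below are hypothetical values chosen for illustration only.

```r
# Higher order model (HOGDINA=1): probit link between the higher order
# factor theta and each attribute, with hypothetical a_k and b_k
a <- c(1.0, 0.8, 1.2)
b <- c(0.0, -0.5, 0.3)
attr_prob <- function(theta) stats::pnorm(a * theta + b)

attr_prob(0)    # mastery probabilities at theta = 0, equal to pnorm(b)
attr_prob(1.5)  # a higher theta gives higher mastery probabilities

# Independent attributes (HOGDINA=0): the probability of a skill class
# is the product of the marginal attribute probabilities
p_k <- c(0.6, 0.5, 0.3)   # hypothetical P(alpha_k = 1)
patterns <- as.matrix(expand.grid(rep(list(0:1), 3)))
joint <- apply(patterns, 1,
               function(al) prod(p_k^al * (1 - p_k)^(1 - al)))

cbind(patterns, prob = joint)
sum(joint)
```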
Note that the noncompensatory reduced RUM (NC-RRUM) according to Rupp and Templin (2008) is the GDINA model with the arguments rule="ACDM" and linkfct="log". NC-RRUM can also be obtained by choosing rule="RRUM".
The compensatory RUM (C-RRUM) can be obtained by using the arguments rule="ACDM" and linkfct="logit".
The cognitive diagnosis model for identifying skills and misconceptions (SISM; Kuo, Chen & de la Torre, 2018) can be estimated with rule="SISM" (see Example 12).
The gdina function internally parameterizes the GDINA model as
$$ g[P(X_{nj}=1 \mid \alpha_n)] = M_j(\alpha_n)\,\delta_j $$
with item-specific design matrices $M_j(\alpha_n)$ and item parameters $\delta_j$. Only those attributes are modelled which correspond to non-zero entries in the Q-matrix. Because the Q-matrix (in q.matrix) and the design matrices (in Mj; see Example 3) can be specified by the user, several cognitive diagnosis models can be estimated. Therefore, some additional extensions of the DINA model can also be estimated using the gdina function. These models include the DINA model with multiple strategies (Huo & de la Torre, 2014).
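The design matrix formulation can be made concrete with a short base R sketch for an item requiring two attributes. The column labels mirror those printed in Example 3; the delta values are hypothetical, and the matrices are built by hand rather than taken from a fitted model.

```r
# Mastery patterns for the two required attributes
alpha <- as.matrix(expand.grid(A1 = 0:1, A2 = 0:1))

# Hand-built design matrices (rows = attribute patterns)
Mj_gdina <- cbind("0" = 1, "1" = alpha[, 1], "2" = alpha[, 2],
                  "1-2" = alpha[, 1] * alpha[, 2])
Mj_dina <- cbind("0" = 1, "1-2" = alpha[, 1] * alpha[, 2])

# Hypothetical item parameters on the identity-link scale
delta_gdina <- c(0.10, 0.15, 0.20, 0.40)
delta_dina <- c(0.10, 0.70)   # guessing .10, success when mastering both .80

# Inverse link functions for the three supported links
inv_link <- list(identity = identity, logit = stats::plogis, log = exp)

p_gdina <- as.numeric(inv_link$identity(Mj_gdina %*% delta_gdina))
p_dina <- as.numeric(inv_link$identity(Mj_dina %*% delta_dina))

cbind(alpha, GDINA = p_gdina, DINA = p_dina)
```

Removing or merging columns of such a matrix, as done in Example 3, changes which effects are estimated or constrained to be equal.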
Returns
An object of class gdina with the following entries:
coef: Data frame of item parameters
delta: List with basis item parameters
se.delta: Standard errors of basis item parameters
probitem: Data frame with model implied conditional item probabilities P(Xi=1∣α). These probabilities are displayed in plot.gdina.
itemfit.rmsea: The RMSEA item fit index (see itemfit.rmsea).
mean.rmsea: Mean of RMSEA item fit indexes.
loglike: Log-likelihood
deviance: Deviance
G: Number of groups
N: Sample size
AIC: AIC
BIC: BIC
CAIC: CAIC
Npars: Total number of parameters
Nipar: Number of item parameters
Nskillpar: Number of parameters for skill class distribution
Nskillclasses: Number of skill classes
varmat.delta: Covariance matrix of δ item parameters
posterior: Individual posterior distribution
like: Individual likelihood
data: Original data
q.matrix: Used Q-matrix
pattern: Individual patterns, individual MLE and MAP classifications and their corresponding probabilities
Mj: Design matrix Mj in GDINA algorithm (see de la Torre, 2011)
Aj: Design matrix Aj in GDINA algorithm (see de la Torre, 2011)
rule: Used condensation rules
linkfct: Used link function
delta.designmatrix: Designmatrix for item parameters
reduced.skillspace: A logical if skillspace reduction was performed
Z.skillspace: Design matrix for skillspace reduction
beta: Parameters δ for skill class representation
covbeta: Standard errors of δ parameters
iter: Number of iterations
rrum.params: Parameters in the parametrization of the reduced RUM model if rule="RRUM".
group.stat: Group statistics (sample sizes, group labels)
HOGDINA: The used value of HOGDINA
mono.constr: Monotonicity constraint
regularization: Logical indicating whether regularization is used
regular_lam: Regularization parameter
numb_bound_mono: Number of items with parameters at boundary of monotonicity constraints
numb_regular_pars: Number of regularized item parameters
delta_regularized: List indicating which item parameters are regularized
cd_algorithm: Logical indicating whether coordinate descent algorithm is used
cd_steps: Number of steps for each parameter in coordinate descent algorithm
seed: Used simulation seed
a.attr: Attribute parameters ak in case of HOGDINA>=0
b.attr: Attribute parameters bk in case of HOGDINA>=0
attr.rf: Attribute response functions. This matrix contains all ak and bk parameters
converged: Logical indicating whether convergence was achieved.
control: Optimization parameters used in estimation
partable: Parameter table for gdina function
polychor: Group-wise matrices with polychoric correlations
sequential: Logical indicating whether a sequential GDINA model is applied for polytomous item responses
...: Further values
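The entries deviance, AIC, BIC and CAIC are related through the log-likelihood, the number of parameters and the sample size. The sketch below uses the textbook definitions of these criteria; that gdina computes them in exactly this way is an assumption. The log-likelihood and parameter count are the values printed for mod1 in Example 9, while the sample size is hypothetical.

```r
loglike <- -30318.08   # log-likelihood reported for mod1 in Example 9
Npars <- 153           # number of parameters reported in Example 9
N <- 1000              # hypothetical sample size

dev <- -2 * loglike                      # deviance
aic <- dev + 2 * Npars                   # AIC
bic <- dev + log(N) * Npars              # BIC
caic <- dev + (log(N) + 1) * Npars       # consistent AIC

c(deviance = dev, AIC = aic, BIC = bic, CAIC = caic)
```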
References
Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56(12), 4122-4137.
Chen, J., & de la Torre, J. (2013). A general cognitive diagnosis model for expert-defined polytomous attributes. Applied Psychological Measurement, 37, 419-437.
Chen, J., & de la Torre, J. (2018). Introducing the general polytomous diagnosis modeling framework. Frontiers in Psychology | Quantitative Psychology and Measurement, 9(1474).
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179-199.
Hong, C. Y., Chang, Y. W., & Tsai, R. C. (2016). Estimation of generalized DINA model with order restrictions. Journal of Classification, 33(3), 460-484.
Huo, Y., & de la Torre, J. (2014). Estimating a cognitive diagnostic model for multiple strategies via the EM algorithm. Applied Psychological Measurement, 38, 464-485.
Kuo, B.-C., Chen, C.-H., & de la Torre, J. (2018). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement, 42(3), 179-191.
Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253-275.
Nash, J. C. (2014). Nonlinear parameter optimization using R tools. West Sussex: Wiley.
Rupp, A. A., & Templin, J. (2008). Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Measurement: Interdisciplinary Research and Perspectives, 6, 219-262.
Shen, X., Pan, W., & Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. Journal of the American Statistical Association, 107, 223-232.
Tutz, G. (1997). Sequential models for ordered responses. In W. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 139-152). New York: Springer.
Xu, G., & Shang, Z. (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113(523), 1284-1295.
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.
Zeng, L., & Xie, J. (2014). Group variable selection via SCAD-L2. Statistics, 48, 49-66.
Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics, 38, 894-942.
Note
The function din does not allow for multiple group estimation. Use this gdina function instead and choose the appropriate rule="DINA" as an argument.
Standard error calculation in analyses which use sample weights or designmatrix for delta parameters (delta.designmatrix!=NULL) is not yet correctly implemented. Please use replication methods instead.
See Also
See also the din function (for DINA and DINO estimation).
For assessment of model fit see modelfit.cor.din and anova.gdina.
See itemfit.sx2 for item fit statistics.
See sim.gdina for simulating the GDINA model.
See gdina.wald for a Wald test for testing the DINA and ACDM rules at the item-level.
See gdina.dif for assessing differential item functioning.
See discrim.index for computing discrimination indices.
See the GDINA::GDINA function in the GDINA package for similar functionality.
Examples
#############################################################################
# EXAMPLE 1: Simulated DINA data | different condensation rules
#############################################################################

data(sim.dina, package="CDM")
data(sim.qmatrix, package="CDM")

dat <- sim.dina
Q <- sim.qmatrix

#***
# Model 1: estimation of the GDINA model (identity link)
mod1 <- CDM::gdina( data=dat, q.matrix=Q)
summary(mod1)
plot(mod1)   # apply plot function

## Not run:
# Model 1a: estimate model with different simulation seed
mod1a <- CDM::gdina( data=dat, q.matrix=Q, seed=9089)
summary(mod1a)

# Model 1b: estimate model with some fixed delta parameters
delta.fixed <- as.list( rep(NA,9))   # List for parameters of 9 items
delta.fixed[[2]] <- c(0,.15,.15,.45)
delta.fixed[[6]] <- c(.25,.25)
mod1b <- CDM::gdina( data=dat, q.matrix=Q, delta.fixed=delta.fixed)
summary(mod1b)

# Model 1c: fix all delta parameters to previously fitted model
mod1c <- CDM::gdina( data=dat, q.matrix=Q, delta.fixed=mod1$delta)
summary(mod1c)

# Model 1d: estimate GDINA model with GDINA package
mod1d <- GDINA::GDINA( dat=dat, Q=Q, model="GDINA")
summary(mod1d)
# extract item parameters
GDINA::itemparm(mod1d)
GDINA::itemparm(mod1d, what="delta")
# compare likelihood
logLik(mod1)
logLik(mod1d)

#***
# Model 2: estimation of the DINA model with gdina function
mod2 <- CDM::gdina( data=dat, q.matrix=Q, rule="DINA")
summary(mod2)
plot(mod2)

#***
# Model 2b: compare results with din function
mod2b <- CDM::din( data=dat, q.matrix=Q, rule="DINA")
summary(mod2b)

#***
# Model 3: estimation of the DINO model with gdina function
mod3 <- CDM::gdina( data=dat, q.matrix=Q, rule="DINO")
summary(mod3)

#***
# Model 4: DINA model with logit link
mod4 <- CDM::gdina( data=dat, q.matrix=Q, rule="DINA", linkfct="logit")
summary(mod4)

#***
# Model 5: DINA model log link
mod5 <- CDM::gdina( data=dat, q.matrix=Q, rule="DINA", linkfct="log")
summary(mod5)

#***
# Model 6: RRUM model
mod6 <- CDM::gdina( data=dat, q.matrix=Q, rule="RRUM")
summary(mod6)

#***
# Model 7: Higher order GDINA model
mod7 <- CDM::gdina( data=dat, q.matrix=Q, HOGDINA=1)
summary(mod7)

#***
# Model 8: GDINA model with independent attributes
mod8 <- CDM::gdina( data=dat, q.matrix=Q, HOGDINA=0)
summary(mod8)

#***
# Model 9: Estimating the GDINA model with monotonicity constraints
mod9 <- CDM::gdina( data=dat, q.matrix=Q, rule="GDINA",
            mono.constr=TRUE, linkfct="logit")
summary(mod9)

#***
# Model 10: Estimating the ACDM model with SCAD penalty and regularization
#           parameter of .05
mod10 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM", linkfct="logit",
            regular_type="scad", regular_lam=.05)
summary(mod10)

#***
# Model 11: Estimation of GDINA model with prior distributions

# N(0,10^2) prior for item intercepts
prior_intercepts <- c(0,10)
# N(0,1^2) prior for item slopes
prior_slopes <- c(0,1)
# estimate model
mod11 <- CDM::gdina( data=dat, q.matrix=Q, rule="GDINA",
            prior_intercepts=prior_intercepts, prior_slopes=prior_slopes)
summary(mod11)

#############################################################################
# EXAMPLE 2: Simulated DINO data
#    additive cognitive diagnosis model with different link functions
#############################################################################

data(sim.dino, package="CDM")
data(sim.qmatrix, package="CDM")

dat <- sim.dino
Q <- sim.qmatrix

#***
# Model 1: additive cognitive diagnosis model (ACDM; identity link)
mod1 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM")
summary(mod1)

#***
# Model 2: ACDM logit link
mod2 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM", linkfct="logit")
summary(mod2)

#***
# Model 3: ACDM log link
mod3 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM", linkfct="log")
summary(mod3)

#***
# Model 4: Different condensation rules per item
I <- 9                      # number of items
rule <- rep( "GDINA", I )
rule[1] <- "DINO"           # 1st item: DINO model
rule[7] <- "GDINA2"         # 7th item: GDINA model with first- and second-order interactions
rule[8] <- "ACDM"           # 8th item: additive CDM
rule[9] <- "DINA"           # 9th item: DINA model
mod4 <- CDM::gdina( data=dat, q.matrix=Q, rule=rule )
summary(mod4)

#############################################################################
# EXAMPLE 3: Model with user-specified design matrices
#############################################################################

data(sim.dino, package="CDM")
data(sim.qmatrix, package="CDM")

dat <- sim.dino
Q <- sim.qmatrix

# do a preliminary analysis and modify obtained design matrices
mod0 <- CDM::gdina( data=dat, q.matrix=Q, maxit=1)

# extract default design matrices
Mj <- mod0$Mj
Mj.user <- Mj   # these user defined design matrices are modified.

#~~~ For the second item, the following model should hold
#     X1 ~ V2 + V2*V3
mj <- Mj[[2]][[1]]
mj.lab <- Mj[[2]][[2]]
mj <- mj[,-3]
mj.lab <- mj.lab[-3]
Mj.user[[2]] <- list( mj, mj.lab )
#    [[1]]
#        [,1] [,2] [,3]
#    [1,]    1    0    0
#    [2,]    1    1    0
#    [3,]    1    0    0
#    [4,]    1    1    1
#    [[2]]
#    [1] "0"   "1"   "1-2"

#~~~ For the eighth item an equality constraint should hold
#     X8 ~ a*V2 + a*V3 + V2*V3
mj <- Mj[[8]][[1]]
mj.lab <- Mj[[8]][[2]]
mj[,2] <- mj[,2] + mj[,3]
mj <- mj[,-3]
mj.lab <- c("0","1=2","1-2")
Mj.user[[8]] <- list( mj, mj.lab )
Mj.user[[8]]
##   [[1]]
##       [,1] [,2] [,3]
##   [1,]    1    0    0
##   [2,]    1    1    0
##   [3,]    1    1    0
##   [4,]    1    2    1
##
##   [[2]]
##   [1] "0"   "1=2" "1-2"

mod <- CDM::gdina( data=dat, q.matrix=Q, Mj=Mj.user, maxit=200)
summary(mod)

#############################################################################
# EXAMPLE 4: Design matrix for delta parameters
#############################################################################

data(sim.dino, package="CDM")
data(sim.qmatrix, package="CDM")

#~~~ estimate an initial model
mod0 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM", maxit=1)
# extract coefficients
c0 <- mod0$coef
I <- 9   # number of items
delta.designmatrix <- matrix(0, nrow=nrow(c0), ncol=nrow(c0))
diag( delta.designmatrix) <- 1
# set intercept of item 1 and item 3 equal to each other
delta.designmatrix[7,1] <- 1; delta.designmatrix[,7] <- 0
# set loading of V1 of item1 and item 3 equal
delta.designmatrix[8,2] <- 1; delta.designmatrix[,8] <- 0
# exclude original parameters with indices 7 and 8
delta.designmatrix <- delta.designmatrix[,-c(7:8)]

#***
# Model 1: ACDM with designmatrix
mod1 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM",
            delta.designmatrix=delta.designmatrix )
summary(mod1)

#***
# Model 2: Same model, but with logit link instead of identity link function
mod2 <- CDM::gdina( data=dat, q.matrix=Q, rule="ACDM",
            delta.designmatrix=delta.designmatrix, linkfct="logit")
summary(mod2)

#############################################################################
# EXAMPLE 5: Multiple group estimation
#############################################################################

# simulate data
set.seed(9279)
N1 <- 200; N2 <- 100          # group sizes
I <- 10                       # number of items
q.matrix <- matrix(0,I,2)     # create Q-matrix
q.matrix[1:7,1] <- 1; q.matrix[5:10,2] <- 1
# simulate first group
dat1 <- CDM::sim.din(N1, q.matrix=q.matrix, mean=c(0,0))$dat
# simulate second group
dat2 <- CDM::sim.din(N2, q.matrix=q.matrix, mean=c(-.3,-.7))$dat
# merge data
dat <- rbind( dat1, dat2 )
# group indicator
group <- c( rep(1,N1), rep(2,N2))

# estimate GDINA model with multiple groups assuming invariant item parameters
mod1 <- CDM::gdina( data=dat, q.matrix=q.matrix, group=group)
summary(mod1)

# estimate DINA model with multiple groups assuming invariant item parameters
mod2 <- CDM::gdina( data=dat, q.matrix=q.matrix, group=group, rule="DINA")
summary(mod2)

# estimate GDINA model with noninvariant item parameters
mod3 <- CDM::gdina( data=dat, q.matrix=q.matrix, group=group, invariance=FALSE)
summary(mod3)

# estimate GDINA model with some invariant item parameters (I001, I006, I008)
mod4 <- CDM::gdina( data=dat, q.matrix=q.matrix, group=group,
            invariance=c("I001","I006","I008"))

#--- model comparison
IRT.compareModels(mod1,mod2,mod3,mod4)

# estimate GDINA model with non-invariant item parameters except for the
# items I001, I006, I008
mod5 <- CDM::gdina( data=dat, q.matrix=q.matrix, group=group,
            invariance=setdiff( colnames(dat), c("I001","I006","I008")))

#############################################################################
# EXAMPLE 6: User specified reduced skill space
#############################################################################

# Some correlations between attributes should be set to zero.
q.matrix <- expand.grid( c(0,1), c(0,1), c(0,1), c(0,1))
colnames(q.matrix) <- colnames( paste("Attr",1:4,sep=""))
q.matrix <- q.matrix[-1,]
Sigma <- matrix(.5, nrow=4, ncol=4)
diag(Sigma) <- 1
Sigma[3,2] <- Sigma[2,3] <- 0   # set correlation of attribute A2 and A3 to zero
dat <- CDM::sim.din( N=1000, q.matrix=q.matrix, Sigma=Sigma)$dat

#~~~ Step 1: initial estimation
mod1a <- CDM::gdina( data=dat, q.matrix=q.matrix, maxit=1, rule="DINA")
# estimate also "full" model
mod1 <- CDM::gdina( data=dat, q.matrix=q.matrix, rule="DINA")

#~~~ Step 2: modify designmatrix for reduced skillspace
Z.skillspace <- data.frame( mod1a$Z.skillspace )
# set correlations of A2/A3 and A2/A4 to zero
vars <- c("A2_A3","A2_A4")
for (vv in vars){ Z.skillspace[,vv] <- NULL }

#~~~ Step 3: estimate model with reduced skillspace
mod2 <- CDM::gdina( data=dat, q.matrix=q.matrix,
            Z.skillspace=Z.skillspace, rule="DINA")

#~~~ eliminate all covariances
Z.skillspace <- data.frame( mod1$Z.skillspace )
colnames(Z.skillspace)
Z.skillspace <- Z.skillspace[,-grep("_", colnames(Z.skillspace),fixed=TRUE)]
colnames(Z.skillspace)

mod3 <- CDM::gdina( data=dat, q.matrix=q.matrix,
            Z.skillspace=Z.skillspace, rule="DINA")
summary(mod1)
summary(mod2)
summary(mod3)

#############################################################################
# EXAMPLE 7: Polytomous GDINA model (Chen & de la Torre, 2013)
#############################################################################

data(data.pgdina, package="CDM")

dat <- data.pgdina$dat
q.matrix <- data.pgdina$q.matrix

# pGDINA model with "DINA rule"
mod1 <- CDM::gdina( dat, q.matrix=q.matrix, rule="DINA")
summary(mod1)
# no reduced skill space
mod1a <- CDM::gdina( dat, q.matrix=q.matrix, rule="DINA", reduced.skillspace=FALSE)
summary(mod1a)

# pGDINA model with "GDINA rule"
mod2 <- CDM::gdina( dat, q.matrix=q.matrix, rule="GDINA")
summary(mod2)

#############################################################################
# EXAMPLE 8: Fraction subtraction data: DINA and HO-DINA model
#############################################################################

data(fraction.subtraction.data, package="CDM")
data(fraction.subtraction.qmatrix, package="CDM")

dat <- fraction.subtraction.data
Q <- fraction.subtraction.qmatrix

# Model 1: DINA model
mod1 <- CDM::gdina( dat, q.matrix=Q, rule="DINA")
summary(mod1)

# Model 2: HO-DINA model
mod2 <- CDM::gdina( dat, q.matrix=Q, HOGDINA=1, rule="DINA")
summary(mod2)

#############################################################################
# EXAMPLE 9: Skill space approximation data.jang
#############################################################################

data(data.jang, package="CDM")

data <- data.jang$data
q.matrix <- data.jang$q.matrix

#*** Model 1: Reduced RUM model
mod1 <- CDM::gdina( data, q.matrix, rule="RRUM", conv.crit=.001, maxit=500)

#*** Model 2: Reduced RUM model with skill space approximation
# use 300 instead of 2^9=512 skill classes
skillspace <- CDM::skillspace.approximation( L=300, K=ncol(q.matrix))
mod2 <- CDM::gdina( data, q.matrix, rule="RRUM", conv.crit=.001,
            skillclasses=skillspace )
##   > logLik(mod1)
##   'log Lik.' -30318.08 (df=153)
##   > logLik(mod2)
##   'log Lik.' -30326.52 (df=153)

#############################################################################
# EXAMPLE 10: CDM with a linear hierarchy
#############################################################################
# This model is equivalent to a unidimensional IRT model with an ordered
# ordinal latent trait and is actually a probabilistic Guttman model.

set.seed(789)

# define 3 competency levels
alpha <- scan()
   0 0 0   1 0 0   1 1 0   1 1 1

# define skill class distribution
K <- 3
skillspace <- alpha <- matrix( alpha, K + 1, K, byrow=TRUE)
alpha <- alpha[ rep(1:4, c(300,300,200,200)),]
# P(000)=P(100)=.3, P(110)=P(111)=.2
# define Q-matrix
Q <- scan()
   1 0 0   1 1 0   1 1 1

Q <- matrix( Q, nrow=K, ncol=K, byrow=TRUE)
Q <- Q[ rep(1:K, each=4),]
colnames(skillspace) <- colnames(Q) <- paste0("A",1:K)
I <- nrow(Q)

# define guessing and slipping parameters
guess <- stats::runif( I, 0, .3)
slip <- stats::runif( I, 0, .2)
# simulate data
dat <- CDM::sim.din( q.matrix=Q, alpha=alpha, slip=slip, guess=guess )$dat

#*** Model 1: DINA model with linear hierarchy
mod1 <- CDM::din( dat, q.matrix=Q, rule="DINA", skillclasses=skillspace )
summary(mod1)

#*** Model 2: pGDINA model with 3 levels
# The multidimensional CDM with a linear hierarchy is a unidimensional
# polytomous GDINA model.
Q2 <- matrix( rowSums(Q), nrow=I, ncol=1)
mod2 <- CDM::gdina( dat, q.matrix=Q2, rule="DINA")
summary(mod2)

#*** Model 3: estimate probabilistic Guttman model in sirt
# Proctor, C. H. (1970). A probabilistic formulation and statistical
# analysis for Guttman scaling. Psychometrika, 35, 73-78.
library(sirt)
mod3 <- sirt::prob.guttman( dat, itemlevel=Q2[,1])
summary(mod3)
# -> The three models result in nearly equivalent fit.

#############################################################################
# EXAMPLE 11: Sequential GDINA model (Ma & de la Torre, 2016)
#############################################################################

data(data.cdm04, package="CDM")

#** attach dataset
dat <- data.cdm04$data    # polytomous item responses
q.matrix1 <- data.cdm04$q.matrix1
q.matrix2 <- data.cdm04$q.matrix2

#-- DINA model with first Q-matrix
mod1 <- CDM::gdina( dat, q.matrix=q.matrix1, rule="DINA")
summary(mod1)
#-- DINA model with second Q-matrix
mod2 <- CDM::gdina( dat, q.matrix=q.matrix2, rule="DINA")
#-- GDINA model
mod3 <- CDM::gdina( dat, q.matrix=q.matrix2, rule="GDINA")

#** model comparison
IRT.compareModels(mod1,mod2,mod3)

#############################################################################
# EXAMPLE 12: Simultaneous modeling of skills and misconceptions (Kuo et al., 2018)
#############################################################################

data(data.cdm08, package="CDM")

dat <- data.cdm08$data
q.matrix <- data.cdm08$q.matrix

#*** estimate model
mod <- CDM::gdina( dat, q.matrix, rule="SISM", bugs=colnames(q.matrix)[5:7])
summary(mod)

#############################################################################
# EXAMPLE 13: Regularized estimation in GDINA model data.dtmr
#############################################################################

data(data.dtmr, package="CDM")

dat <- data.dtmr$data
q.matrix <- data.dtmr$q.matrix

#***** LASSO regularization with lambda parameter of .02
mod1 <- CDM::gdina(dat, q.matrix=q.matrix, rule="GDINA", regular_lam=.02,
            regular_type="lasso")
summary(mod1)
mod1$delta_regularized

#***** using starting values from previous estimation
delta.init <- mod1$delta
attr.prob.init <- mod1$attr.prob
mod2 <- CDM::gdina(dat, q.matrix=q.matrix, rule="GDINA", regular_lam=.02,
            regular_type="lasso", delta.init=delta.init,
            attr.prob.init=attr.prob.init)
summary(mod2)

#***** final estimation fixing regularized estimates to zero and estimate all other
#***** item parameters unregularized
regular_weights <- mod2$delta_regularized
delta.init <- mod2$delta
attr.prob.init <- mod2$attr.prob
mod3 <- CDM::gdina(dat, q.matrix=q.matrix, rule="GDINA", regular_lam=1E5,
            regular_type="lasso", delta.init=delta.init,
            attr.prob.init=attr.prob.init, regular_weights=regular_weights)
summary(mod3)

## End(Not run)