This function estimates the general diagnostic model (von Davier, 2008; Xu & von Davier, 2008) which handles multidimensional item response models with ordered discrete or continuous latent variables for polytomous item responses.
gdm( data, theta.k, irtmodel="2PL", group=NULL, weights=rep(1, nrow(data)), Qmatrix=NULL, thetaDes=NULL, skillspace="loglinear", b.constraint=NULL, a.constraint=NULL, mean.constraint=NULL, Sigma.constraint=NULL, delta.designmatrix=NULL, standardized.latent=FALSE, centered.latent=FALSE, centerintercepts=FALSE, centerslopes=FALSE, maxiter=1000, conv=1e-5, globconv=1e-5, msteps=4, convM=.0005, decrease.increments=FALSE, use.freqpatt=FALSE, progress=TRUE, PEM=FALSE, PEM_itermax=maxiter,...)## S3 method for class 'gdm'summary(object, file=NULL,...)## S3 method for class 'gdm'print(x,...)## S3 method for class 'gdm'plot(x, perstype="EAP", group=1, barwidth=.1, histcol=1, cexcor=3, pchpers=16, cexpers=.7,...)
Arguments
data: An N×I matrix of polytomous item responses with categories k=0,1,...,K
theta.k: In the one-dimensional case it must be a vector. For multidimensional models it has to be a list of skill vectors if the theta grid differs between dimensions. If not, a vector input can be supplied. If an estimated skillspace (skillspace="est" should be estimated, a vector or a matrix theta.k will be used as initial values of the estimated θ grid.
irtmodel: The default 2PL corresponds to the model where item slopes on dimensions are equal for all item categories. If item-category slopes should be estimated, use 2PLcat. If no item slopes should be estimated then 1PL can be selected. Note that fixed item slopes can be specified in the Q-matrix (argument Qmatrix).
group: An optional vector of group identifiers for multiple group estimation. For plot.gdm it is an integer indicating which group should be used for plotting.
weights: An optional vector of sample weights
Qmatrix: An optional array of dimension I×D×K
which indicates pre-specified item loadings on dimensions. The default for category k is the score k, i.e. the scoring in the (generalized) partial credit model.
thetaDes: A design matrix for specifying nonlinear item response functions (see Example 1, Models 4 and 5)
skillspace: The parametric assumption of the skillspace. If skillspace="normal" then a univariate or multivariate normal distribution is assumed. The default "loglinear" corresponds to log-linear smoothing of the skillspace distribution (Xu & von Davier, 2008). If skillspace="full", then all probabilities of the skill space are nonparametrically estimated. If skillspace="est", then the θ distribution vectors will be estimated (see Details and Examples 4 and 5; Bartolucci, 2007).
b.constraint: In this optional matrix with Cb rows and three columns, Cb item intercepts bik can be fixed. 1st column: item index, 2nd column: category index, 3rd column: fixed item thresholds
a.constraint: In this optional matrix with Ca rows and four columns, Ca item intercepts aidk can be fixed. 1st column: item index, 2nd column: dimension index, 3rd column: category index, 4th column: fixed item slopes
mean.constraint: A C×3 matrix for constraining C means in the normal distribution assumption (skillspace="normal"). 1st column: Dimension, 2nd column: Group, 3rd column: Value
Sigma.constraint: A C×4 matrix for constraining C covariances in the normal distribution assumption (skillspace="normal"). 1st column: Dimension 1, 2nd column: Dimension 2, 3rd column: Group, 4th column: Value
delta.designmatrix: The design matrix of δ parameters for the reduced skillspace estimation (see Xu & von Davier, 2008)
standardized.latent: A logical indicating whether in a uni- or multidimensional model all latent variables of the first group should be normally distributed and standardized. The default is FALSE.
centered.latent: A logical indicating whether in a uni- or multidimensional model all latent variables of the first group should be normally distributed and do have zero means? The default is FALSE.
centerintercepts: A logical indicating whether intercepts should be centered to have a mean of 0 for all dimensions. This argument does not (yet) work properly for varying numbers of item categories.
centerslopes: A logical indicating whether item slopes should be centered to have a mean of 1 for all dimensions. This argument only works for irtmodel="2PL". The default is FALSE.
maxiter: Maximum number of iterations
conv: Convergence criterion for item parameters and distribution parameters
globconv: Global deviance convergence criterion
msteps: Maximum number of M steps in estimating b and a item parameters. The default is to use 4 M steps.
convM: Convergence criterion in M step
decrease.increments: Should in the M step the increments of a and b parameters decrease during iterations? The default is FALSE. If there is an increase in deviance during estimation, setting decrease.increments to TRUE
is recommended.
use.freqpatt: A logical indicating whether frequencies of unique item response patterns should be used. In case of large data set use.freqpatt=TRUE
can speed calculations (depending on the problem). Note that in this case, not all person parameters are calculated as usual in the output.
progress: An optional logical indicating whether the function should print the progress of iteration in the estimation process.
PEM: Logical indicating whether the P-EM acceleration should be applied (Berlinet & Roland, 2012).
PEM_itermax: Number of iterations in which the P-EM method should be applied.
object: A required object of class gdm
file: Optional file name for a file in which summary
should be sinked.
x: A required object of class gdm
perstype: Person parameter estimate type. Can be either "EAP", "MAP" or "MLE".
barwidth: Bar width in plot.gdm
histcol: Color of histogram bars in plot.gdm
cexcor: Font size for print of correlation in plot.gdm
pchpers: Point type for scatter plot of person parameters in plot.gdm
cexpers: Point size for scatter plot of person parameters in plot.gdm
...: Optional parameters to be passed to or from other methods will be ignored.
Details
Case irtmodel="1PL":
Equal item slopes of 1 are assumed in this model. Therefore, it corresponds to a generalized multidimensional Rasch model.
logitP(Xnj=k∣θn)=bj0+d∑qjdkθnd
The Q-matrix entries qjdk are pre-specified by the user.
Case irtmodel="2PL":
For each item and each dimension, different item slopes ajd
are estimated:
logitP(Xnj=k∣θn)=bj0+d∑ajdqjdkθnd
Case irtmodel="2PLcat":
For each item, each dimension and each category, different item slopes ajdk
are estimated:
logitP(Xnj=k∣θn)=bj0+d∑ajdkqjdkθnd
Note that this model can be generalized to include terms of any transformation th of the θn vector (e.g. quadratic terms, step functions or interaction) such that the model can be formulated as
logitP(Xnj=k∣θn)=bj0+h∑ajhkqjhkth(θn)
In general, the number of functions t1,...,tH will be larger than the θ dimension of D.
The estimation follows an EM algorithm as described in von Davier and Yamamoto (2004) and von Davier (2008).
In case of skillspace="est", the θ vectors (the grid of the theta distribution) are estimated (Bartolucci, 2007; Bacci, Bartolucci & Gnaldi, 2012). This model is called a multidimensional latent class item response model.
Returns
An object of class gdm. The list contains the following entries: - item: Data frame with item parameters
person: Data frame with person parameters: EAP denotes the mean of the individual posterior distribution, SE.EAP the corresponding standard error, MLE the maximum likelihood estimate at theta.k
and MAP the mode of the posterior distribution
EAP.rel: Reliability of the EAP
deviance: Deviance
ic: Information criteria, number of estimated parameters
b: Item intercepts bjk
se.b: Standard error of item intercepts bjk
a: Item slopes ajd resp. ajdk
se.a: Standard error of item slopes ajd resp. ajdk
itemfit.rmsea: The RMSEA item fit index (see itemfit.rmsea). This entry comes as a list with total and group-wise item fit statistics.
mean.rmsea: Mean of RMSEA item fit indexes.
Qmatrix: Used Q-matrix
pi.k: Trait distribution
mean.trait: Means of trait distribution
sd.trait: Standard deviations of trait distribution
skewness.trait: Skewnesses of trait distribution
correlation.trait: List of correlation matrices of trait distribution corresponding to each group
pjk: Item response probabilities evaluated at grid theta.k
n.ik: An array of expected counts ncikg of ability class c
at item i at category k in group g
G: Number of groups
D: Number of dimension of θ
I: Number of items
N: Number of persons
delta: Parameter estimates for skillspace representation
covdelta: Covariance matrix of parameter estimates for skillspace representation
data: Original data frame
group.stat: Group statistics (sample sizes, group labels)
p.xi.aj: Individual likelihood
posterior: Individual posterior distribution
skill.levels: Number of skill levels per dimension
K.item: Maximal category per item
theta.k: Used theta design or estimated theta trait distribution in case of skillspace="est"
thetaDes: Used theta design for item responses
se.theta.k: Estimated standard errors of theta.k if it is estimated
time: Info about computation time
skillspace: Used skillspace parametrization
iter: Number of iterations
converged: Logical indicating whether convergence was achieved.
object: Object of class gdm
x: Object of class gdm
perstype: Person paramter estimate type. Can be either "EAP", "MAP" or "MLE".
group: Group which should be used for plot.gdm
barwidth: Bar width in plot.gdm
histcol: Color of histogram bars in plot.gdm
cexcor: Font size for print of correlation in plot.gdm
pchpers: Point type for scatter plot of person parameters in plot.gdm
cexpers: Point size for scatter plot of person parameters in plot.gdm
...: Optional parameters to be passed to or from other methods will be ignored.
References
Bacci, S., Bartolucci, F., & Gnaldi, M. (2012). A class of multidimensional latent class IRT models for ordinal polytomous item responses. arXiv preprint, arXiv:1201.4667.
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141-157.
Berlinet, A. F., & Roland, C. (2012). Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Computational Statistics & Data Analysis, 56 (12), 4122-4137.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287-307.
von Davier, M., & Yamamoto, K. (2004). Partially observed mixtures of IRT models: An extension of the generalized partial-credit model. Applied Psychological Measurement, 28, 389-406.
Xu, X., & von Davier, M. (2008). Fitting the structured general diagnostic model to NAEP data. ETS Research Report ETS RR-08-27. Princeton, ETS.
See Also
Cognitive diagnostic models for dichotomous data can be estimated with din (DINA or DINO model) or gdina
(GDINA model, which contains many CDMs as special cases).
For assessment of model fit see modelfit.cor.din and anova.gdm.
See itemfit.sx2 for item fit statistics.
For the estimation of the multidimensional latent class item response model see the MultiLCIRT package and sirt package (function sirt::rasch.mirtlc).
Examples
############################################################################## EXAMPLE 1: Fraction Dataset 1# Unidimensional Models for dichotomous data#############################################################################data(data.fraction1, package="CDM")dat <- data.fraction1$data
theta.k <- seq(-6,6, len=15)# discretized ability#***# Model 1: Rasch model (normal distribution)mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal", centered.latent=TRUE)summary(mod1)plot(mod1)#***# Model 2: Rasch model (log-linear smoothing)# set the item difficulty of the 8th item to zerob.constraint <- matrix( c(8,1,0),1,3)mod2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="loglinear", b.constraint=b.constraint )summary(mod2)#***# Model 3: 2PL modelmod3 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="normal", standardized.latent=TRUE)summary(mod3)## Not run:#***# Model 4: include quadratic term in item response function# using the argument decrease.increments=TRUE leads to a more# stable estimatethetaDes <- cbind( theta.k, theta.k^2)colnames(thetaDes)<- c("F1","F1q")mod4 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, thetaDes=thetaDes, skillspace="normal", standardized.latent=TRUE, decrease.increments=TRUE)summary(mod4)#***# Model 5: step function for ICC# two different probabilities theta < 0 and theta > 0thetaDes <- matrix(1*(theta.k>0), ncol=1)colnames(thetaDes)<- c("Fgrm1")mod5 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, thetaDes=thetaDes, skillspace="normal")summary(mod5)#***# Model 6: DINA model with din functionmod6 <- CDM::din( dat, q.matrix=matrix(1, nrow=ncol(dat),ncol=1))summary(mod6)#***# Model 7: Estimating a version of the DINA model with gdmtheta.k <- c(-.5,.5)mod7 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="loglinear")summary(mod7)############################################################################## EXAMPLE 2: Cultural Activities - data.Students# Unidimensional Models for polytomous data#############################################################################data(data.Students, package="CDM")dat <- data.Students
dat <- dat[, grep("act", colnames(dat))]theta.k <- seq(-4,4, len=11)# discretized ability#***# Model 1: Partial Credit Model (PCM)mod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal", centered.latent=TRUE)summary(mod1)plot(mod1)#***# Model 1b: PCM using frequency patternsmod1b <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="normal", centered.latent=TRUE, use.freqpatt=TRUE)summary(mod1b)#***# Model 2: PCM with two groupsmod2 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, group=CDM::data.Students$urban +1, skillspace="normal", centered.latent=TRUE)summary(mod2)#***# Model 3: PCM with loglinear smoothingb.constraint <- matrix( c(1,2,0), ncol=3)mod3 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, skillspace="loglinear", b.constraint=b.constraint )summary(mod3)#***# Model 4: Model with pre-specified item weights in Q-matrixQmatrix <- array(1, dim=c(5,1,2))Qmatrix[,1,2]<-2# default is score 2 for category 2# now change the scoring of category 2:Qmatrix[c(2,4),1,1]<-.74Qmatrix[c(2,4),1,2]<-2.3# for items 2 and 4 the score for category 1 is .74 and for category 2 it is 2.3mod4 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix, skillspace="normal", centered.latent=TRUE)summary(mod4)#***# Model 5: Generalized partial credit modelmod5 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, skillspace="normal", standardized.latent=TRUE)summary(mod5)#***# Model 6: Item-category slope estimationmod6 <- CDM::gdm( dat, irtmodel="2PLcat", theta.k=theta.k, skillspace="normal", standardized.latent=TRUE, decrease.increments=TRUE)summary(mod6)#***# Models 7: items with different number of categoriesdat0 <- dat
dat0[ paste(dat0[,1])==2,1]<-1# 1st item has only two categoriesdat0[ paste(dat0[,3])==2,3]<-1# 3rd item has only two categories# Model 7a: PCMmod7a <- CDM::gdm( dat0, irtmodel="1PL", theta.k=theta.k, centered.latent=TRUE)summary(mod7a)# Model 7b: Item category slopesmod7b <- CDM::gdm( dat0, irtmodel="2PLcat", theta.k=theta.k, standardized.latent=TRUE, decrease.increments=TRUE)summary(mod7b)############################################################################## EXAMPLE 3: Fraction Dataset 2# Multidimensional Models for dichotomous data#############################################################################data(data.fraction2, package="CDM")dat <- data.fraction2$data
Qmatrix <- data.fraction2$q.matrix3
#***# Model 1: One-dimensional Rasch modeltheta.k <- seq(-4,4, len=11)# discretized abilitymod1 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, centered.latent=TRUE)summary(mod1)plot(mod1)#***# Model 2: One-dimensional 2PL modelmod2 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, standardized.latent=TRUE)summary(mod2)plot(mod2)#***# Model 3: 3-dimensional Rasch Model (normal distribution)mod3 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix, centered.latent=TRUE, globconv=5*1E-3, conv=1E-4)summary(mod3)#***# Model 4: 3-dimensional Rasch model (loglinear smoothing)# set some item parameters of items 4,1 and 2 to zerob.constraint <- cbind( c(4,1,2),1,0)mod4 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix, b.constraint=b.constraint, skillspace="loglinear")summary(mod4)#***# Model 5: define a different theta grid for each dimensiontheta.k <- list("Dim1"=seq(-5,5, len=11),"Dim2"=seq(-5,5,len=8),"Dim3"=seq(-3,3,len=6))mod5 <- CDM::gdm( dat, irtmodel="1PL", theta.k=theta.k, Qmatrix=Qmatrix, b.constraint=b.constraint, skillspace="loglinear")summary(mod5)#***# Model 6: multdimensional 2PL model (normal distribution)theta.k <- seq(-5,5, len=13)a.constraint <- cbind( c(8,1,3),1:3,1,1)# fix some slopes to 1mod6 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, Qmatrix=Qmatrix, centered.latent=TRUE, a.constraint=a.constraint, decrease.increments=TRUE, skillspace="normal")summary(mod6)#***# Model 7: multdimensional 2PL model (loglinear distribution)a.constraint <- cbind( c(8,1,3),1:3,1,1)b.constraint <- cbind( c(8,1,3),1,0)mod7 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, Qmatrix=Qmatrix, b.constraint=b.constraint, a.constraint=a.constraint, decrease.increments=FALSE, skillspace="loglinear")summary(mod7)############################################################################## EXAMPLE 4: Unidimensional latent class 1PL IRT model############################################################################## simulate dataset.seed(754)I <-20# number of itemsN <-2000# number of personstheta <- c(-2,0,1,2)theta <- rep( theta, c(N/4,N/4,3*N/8, N/8))b <- seq(-2,2,len=I)library(sirt)# use function sim.raschtype from sirt packagedat <- sirt::sim.raschtype( theta=theta, b=b )theta.k <- seq(-1,1, len=4)# initial vector of theta# estimate modelmod1 <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL", centerintercepts=TRUE, maxiter=200)summary(mod1)## Estimated Skill Distribution## F1 pi.k## 1 -1.988 0.24813## 2 -0.055 0.23313## 3 0.940 0.40059## 4 2.000 0.11816############################################################################## EXAMPLE 5: Multidimensional latent class IRT model############################################################################## We simulate a two-dimensional IRT model in which theta vectors# are observed at a fixed discrete grid (see below).# simulate dataset.seed(754)I <-13# number of itemsN <-2400# number of persons# simulate Dimension 1 at 4 discrete theta pointstheta <- c(-2,0,1,2)theta <- rep( theta, c(N/4,N/4,3*N/8, N/8))b <- seq(-2,2,len=I)library(sirt)# use simulation function from sirt packagedat1 <- sirt::sim.raschtype( theta=theta, b=b )# simulate Dimension 2 at 4 discrete theta pointstheta <- c(-3,0,1.5,2)theta <- rep( theta, c(N/4,N/4,3*N/8, N/8))dat2 <- sirt::sim.raschtype( theta=theta, b=b )colnames(dat2)<- gsub("I","U", colnames(dat2))dat <- cbind( dat1, dat2 )# define Q-matrixQmatrix <- matrix(0,2*I,2)Qmatrix[ cbind(1:(2*I), rep(1:2, each=I))]<-1theta.k <- seq(-1,1, len=4)# initial matrixtheta.k <- cbind( theta.k, theta.k )colnames(theta.k)<- c("Dim1","Dim2")# estimate modelmod2 <- CDM::gdm( dat, theta.k=theta.k, skillspace="est", irtmodel="1PL", Qmatrix=Qmatrix, centerintercepts=TRUE)summary(mod2)## Estimated Skill Distribution## theta.k.Dim1 theta.k.Dim2 pi.k## 1 -2.022 -3.035 0.25010## 2 0.016 0.053 0.24794## 3 0.956 1.525 0.36401## 4 1.958 1.919 0.13795############################################################################## EXAMPLE 6: Large-scale dataset data.mg#############################################################################data(data.mg, package="CDM")dat <- data.mg[, paste0("I",1:11)]theta.k <- seq(-6,6,len=21)#***# Model 1: Generalized partial credit model with multiple groupsmod1 <- CDM::gdm( dat, irtmodel="2PL", theta.k=theta.k, group=CDM::data.mg$group, skillspace="normal", standardized.latent=TRUE)summary(mod1)## End(Not run)