convert function

Converting a data matrix from one format into another

convert recodes a data matrix from one of the formats used by versions of correspondence analysis into another. The supported formats are: n objects by p variables, counts for distinct combinations of the p variables, indicator matrix, contingency table (input only) and doubled matrix (output only).

Usage

convert(Xinput, input = "nbyp", output = "indicator", Jk = NULL, maxcat = NULL, varandcat = TRUE)

Arguments

  • Xinput: A data matrix, in the form of a data frame or similar

  • input: The format of the input matrix:

    • "nbyp": An n by p matrix of n individuals/objects/data points and p categorical variables. Each row is a different data point and each column contains the category that data point takes on that variable. Categories can be numbers, strings or factors
    • "nbypcounts": Similar to the above, but each row represents all of the data points taking the same combination of categories, and the first column contains the count for that combination (so the name is a bit of a misnomer, but it emphasises the similarity to an n by p matrix)
    • "indicator": An indicator matrix, similar to the n by p matrix except that a variable with J_k categories is represented by J_k columns, and a data point taking the i-th category has a 1 in the i-th of these columns and a 0 in the others
    • "CT": A contingency table of counts
  • output: The format of the output matrix:

    • "nbyp": As above
    • "nbypcounts": As above
    • "indicator": As above
    • "doubled": Similar to indicator but each variable is now represented by 2 columns, and a data point taking the i-th category for a variable with J_k categories is given the values J_k-i in the first (low) column and i-1 in the second (high) column
  • Jk: A list containing the number of distinct categories for each variable.

    Either Jk or maxcat must be specified if input is "indicator"

  • maxcat: The maximum category value, for use when all variables are Likert on a scale of 1 to maxcat.

    Either Jk or maxcat must be specified if input is "indicator"

  • varandcat: Flag for how to construct column names in an indicator matrix:

    • TRUE: if many variables have the same categories, e.g. Likert, column names will be varname:catname
    • FALSE: when variables have distinct categories, column names will just be category names
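
As a rough illustration of the formats described above, the following base R sketch builds each one by hand for a small toy data set. It does not call convert: the data values and the helper indicator_block are made up for illustration, and the column naming will not necessarily match what convert produces.

```r
# Toy "nbyp" data: 5 respondents, 2 Likert items scored 1..3 (made-up values)
X <- data.frame(q1 = c(1, 3, 2, 3, 1),
                q2 = c(2, 2, 1, 2, 2))
maxcat <- 3

# "indicator": a variable with J categories becomes J columns,
# with a 1 in the column of the category taken and 0 elsewhere
indicator_block <- function(x, J) outer(x, seq_len(J), FUN = "==") * 1
Z <- do.call(cbind, lapply(X, indicator_block, J = maxcat))

# "doubled": each variable becomes a (low, high) pair,
# J - i in the low column and i - 1 in the high column
doubled <- do.call(cbind, lapply(X, function(x) {
  cbind(low = maxcat - x, high = x - 1)
}))

# "nbypcounts": one row per distinct combination, count in the first column
counts <- aggregate(count ~ q1 + q2, data = cbind(X, count = 1), FUN = sum)
counts <- counts[, c("count", "q1", "q2")]
```

In this sketch every row of Z sums to p (one 1 per variable) and every row of doubled sums to p * (maxcat - 1), which is a quick sanity check on either format.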

Returns

A list containing:

  • result: the output data matrix formatted according to the output argument
  • varnames: a list of length p containing the names of each variable
  • catnames: a list/array (of length p) containing the lists (of length Jk[i]) of category names for each variable
  • Jk: a list of length p containing the number of distinct categories for each variable
  • p: the number of variables
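
The components of the returned list might be picked apart as in the sketch below. It assumes that convert is available from a package called cabootcrs (the package name is not stated on this page, so treat it as an assumption), and the data frame is made up for illustration. The code does nothing if that package is not installed.

```r
# Assumption: convert() is exported by a package named "cabootcrs".
# The data frame below is hypothetical.
if (requireNamespace("cabootcrs", quietly = TRUE)) {
  X <- data.frame(colour = c("red", "blue", "red", "green"),
                  size   = c("S", "L", "L", "S"))

  res <- cabootcrs::convert(X, input = "nbyp", output = "indicator")

  Z <- res$result        # the output matrix, here an indicator matrix
  print(res$p)           # number of variables
  print(res$varnames)    # variable names
  print(res$Jk)          # number of distinct categories per variable
  print(res$catnames)    # category names for each variable
}
```

Keeping the whole list rather than only result is convenient when converting back from an indicator matrix, since Jk can then be passed on directly.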

Examples

dreamdataCT <- DreamData
dreamdatanbyplist <- convert(dreamdataCT, input = "CT", output = "nbyp")
dreamdatanbyp <- dreamdatanbyplist$result

## Not run:
dreamdataCTb <- table(dreamdatanbyp)
dreamdatanbypcounts <- convert(dreamdatanbyp, input = "nbyp", output = "nbypcounts")$result
dreamdataindicatorlist <- convert(dreamdatanbypcounts, input = "nbypcounts", output = "indicator")
dreamdatanbypb <- convert(dreamdataindicatorlist$result, input = "indicator",
                          output = "nbyp", Jk = dreamdataindicatorlist$Jk)$result

nishdatanbyp <- NishData
nishdataindicator <- convert(nishdatanbyp)$result
nishdataBurt <- t(nishdataindicator) %*% nishdataindicator
## End(Not run)

See Also

getBurt to obtain a Burt matrix or a subset of an existing one

getCT to obtain a contingency table (only if p=2)

getindicator to obtain an indicator matrix

getdoubled to obtain a doubled matrix if all variables are ordered categorical with numbered categories

Other conversion functions: getBurt(), getCT(), getdoubled(), getindicator()

  • Maintainer: Trevor Ringrose
  • License: GPL-3
  • Last published: 2022-03-02
