Converting a data matrix from one format into another
convert recodes a data matrix from one of the formats used by versions of correspondence analysis into another (n objects by p variables, counts for distinct combinations of p variables, indicator matrix, contingency table).
Xinput: A data matrix, in the form of a data frame or similar
input: The format of the input matrix:
"nbyp": An n individuals/objects/data points by p categorical variables matrix. Each row is a different data point and each column contains the category for that data point on that variable; the categories can be numbers, strings or factors
"nbypcounts": Similar to the above, but each row represents all of the data points that take the same combination of categories, and the first column contains the count for that combination. The name used here is therefore a bit of a misnomer, but it emphasises the similarity to an n by p matrix
"indicator": An indicator matrix. Similar to the n by p matrix, except that a variable with J_k categories is represented by J_k columns, and a data point taking the i-th category has a 1 in the i-th of these columns and a 0 in the others
"CT": A contingency table of counts
output: The format of the output matrix:
"nbyp": As above
"nbypcounts": As above
"indicator": As above
"doubled": Similar to "indicator", but each variable is represented by 2 columns: a data point taking the i-th category of a variable with J_k categories is given the value J_k-i in the first (low) column and i-1 in the second (high) column
Jk: A list containing the number of distinct categories for each variable.
Either Jk or maxcat must be specified if input is "indicator"
maxcat: The maximum category value, for use when all variables are Likert items on a scale of 1 to maxcat.
Either Jk or maxcat must be specified if input is "indicator"
varandcat: Flag for how to construct column names in an indicator matrix:
TRUE: if many variables have the same categories, e.g. Likert, column names will be varname:catname
FALSE: when variables have distinct categories, column names will just be category names
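The "indicator" and "doubled" layouts and the two column-naming schemes described above can be illustrated with a small sketch. It is written in Python purely to show the layouts and is not the package's code. The function names to_indicator and to_doubled, the default variable names V1, V2, ... and the assumption that categories are coded as integers 1 to J_k are all hypothetical choices made for this illustration.

```python
def to_indicator(rows, Jk, varnames=None, varandcat=True):
    """Recode n-by-p rows (integer categories 1..Jk[k]) as an indicator matrix.

    Illustrative only: returns (column_names, matrix).
    """
    p = len(Jk)
    if varnames is None:
        varnames = [f"V{k + 1}" for k in range(p)]  # hypothetical default names

    # Column names: "varname:catname" if varandcat, otherwise just the category.
    names = []
    for k in range(p):
        for i in range(1, Jk[k] + 1):
            names.append(f"{varnames[k]}:{i}" if varandcat else str(i))

    # Variable k contributes Jk[k] columns, with a 1 in the i-th of them.
    matrix = []
    for row in rows:
        out = []
        for k in range(p):
            out.extend(1 if row[k] == i else 0 for i in range(1, Jk[k] + 1))
        matrix.append(out)
    return names, matrix


def to_doubled(rows, Jk):
    """Recode n-by-p rows as a doubled matrix with 2 columns per variable.

    Category i of a variable with J_k categories becomes
    (J_k - i) in the low column and (i - 1) in the high column.
    """
    return [
        [value for k, i in enumerate(row) for value in (Jk[k] - i, i - 1)]
        for row in rows
    ]


rows = [(1, 3), (2, 1)]   # two data points, two variables
Jk = [2, 3]               # variable 1 has 2 categories, variable 2 has 3
names, indicator = to_indicator(rows, Jk)
doubled = to_doubled(rows, Jk)
```

In this sketch the two columns for each variable in the doubled matrix always sum to J_k - 1, which is a quick way to check a doubled coding by eye.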
Returns
A list containing:
result: the output data matrix formatted according to the output argument
varnames: a list of length p containing the names of each variable
catnames: a list/array (of length p) containing the lists (of length Jk[i]) of category names for each variable
Jk: a list of length p containing the number of distinct categories for each variable
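As a rough picture of the "nbypcounts" layout and of the kind of information carried by varnames, catnames and Jk, the following Python sketch may help. It is not the package's implementation: the helper names to_nbypcounts and describe are hypothetical, and the sorted ordering of rows and of category names is an assumption made here, not something the documentation states.

```python
from collections import Counter


def to_nbypcounts(rows):
    """Collapse n-by-p rows to one row per distinct combination of categories.

    The count goes in the first column, followed by the p categories.
    Sorting the combinations is a choice made for this sketch only.
    """
    counts = Counter(tuple(row) for row in rows)
    return [[n, *combo] for combo, n in sorted(counts.items())]


def describe(rows, varnames):
    """Collect variable names, category names and category counts per variable."""
    p = len(varnames)
    catnames = [sorted({row[k] for row in rows}) for k in range(p)]
    return {
        "varnames": list(varnames),
        "catnames": catnames,
        "Jk": [len(cats) for cats in catnames],
    }


rows = [("a", "x"), ("b", "y"), ("a", "x")]
counted = to_nbypcounts(rows)
info = describe(rows, ["first", "second"])
```

Here three data points collapse to two rows, because two of them share the combination ("a", "x").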