Transformations function

Functions for Data Transformation

Transformations for factors and numeric variables.

Usage

id_trafo(x)
rank_trafo(x, ties.method = c("mid-ranks", "random"))
normal_trafo(x, ties.method = c("mid-ranks", "average-scores"))
median_trafo(x, mid.score = c("0", "0.5", "1"))
savage_trafo(x, ties.method = c("mid-ranks", "average-scores"))
consal_trafo(x, ties.method = c("mid-ranks", "average-scores"), a = 5)
koziol_trafo(x, ties.method = c("mid-ranks", "average-scores"), j = 1)
klotz_trafo(x, ties.method = c("mid-ranks", "average-scores"))
mood_trafo(x, ties.method = c("mid-ranks", "average-scores"))
ansari_trafo(x, ties.method = c("mid-ranks", "average-scores"))
fligner_trafo(x, ties.method = c("mid-ranks", "average-scores"))
logrank_trafo(x, ties.method = c("mid-ranks", "Hothorn-Lausen", "average-scores"),
              weight = logrank_weight, ...)
logrank_weight(time, n.risk, n.event,
               type = c("logrank", "Gehan-Breslow", "Tarone-Ware",
                        "Peto-Peto", "Prentice", "Prentice-Marek",
                        "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington",
                        "Gaugler-Kim-Liao", "Self"),
               rho = NULL, gamma = NULL)
f_trafo(x)
of_trafo(x, scores = NULL)
zheng_trafo(x, increment = 0.1)
maxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
fmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
ofmaxstat_trafo(x, minprob = 0.1, maxprob = 1 - minprob)
trafo(data, numeric_trafo = id_trafo, factor_trafo = f_trafo,
      ordered_trafo = of_trafo, surv_trafo = logrank_trafo,
      var_trafo = NULL, block = NULL)
mcp_trafo(...)

Arguments

  • x: an object of class "numeric", "factor", "ordered" or "Surv".

  • ties.method: a character, the method used to handle ties. The score generating function uses either the mid-ranks ("mid-ranks", default) or, in the case of rank_trafo(), randomly broken ties ("random"). Alternatively, the average of the scores obtained by applying the score generating function to randomly broken ties is used ("average-scores"). See logrank_test() for a detailed description of the methods used in logrank_trafo().

  • mid.score: a character, the score assigned to observations exactly equal to the median: either 0 ("0", default), 0.5 ("0.5") or 1 ("1"); see median_test().

  • a: a numeric vector, the values taken as the constant a in the Conover-Salsburg scores. Defaults to 5.

  • j: a numeric, the value taken as the constant j in the Koziol-Nemec scores. Defaults to 1.

  • weight: a function where the first three arguments must correspond to time, n.risk, and n.event given below. Defaults to logrank_weight.

  • time: a numeric vector, the ordered distinct time points.

  • n.risk: a numeric vector, the number of subjects at risk at each time point specified in time.

  • n.event: a numeric vector, the number of events at each time point specified in time.

  • type: a character, one of "logrank" (default), "Gehan-Breslow", "Tarone-Ware", "Peto-Peto", "Prentice", "Prentice-Marek", "Andersen-Borgan-Gill-Keiding", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test().

  • rho: a numeric vector, the ρ constant when type is "Tarone-Ware", "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test(). Defaults to NULL, implying 0.5 for type = "Tarone-Ware" and 0 otherwise.

  • gamma: a numeric vector, the γ constant when type is "Fleming-Harrington", "Gaugler-Kim-Liao" or "Self"; see logrank_test(). Defaults to NULL, implying 0.

  • scores: a numeric vector or list, the scores corresponding to each level of an ordered factor. Defaults to NULL, implying 1:nlevels(x).

  • increment: a numeric, the score increment between the order-restricted sets of scores. A fraction greater than 0, but smaller than or equal to 1. Defaults to 0.1.

  • minprob: a numeric, a fraction between 0 and 0.5; see maxstat_test(). Defaults to 0.1.

  • maxprob: a numeric, a fraction between 0.5 and 1; see maxstat_test(). Defaults to 1 - minprob.

  • data: an object of class "data.frame".

  • numeric_trafo: a function to be applied to elements of class "numeric" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to id_trafo.

  • factor_trafo: a function to be applied to elements of class "factor" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to f_trafo.

  • ordered_trafo: a function to be applied to elements of class "ordered" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to of_trafo.

  • surv_trafo: a function to be applied to elements of class "Surv" in data, returning a matrix with nrow(data) rows and an arbitrary number of columns. Defaults to logrank_trafo.

  • var_trafo: an optional named list of functions to be applied to the corresponding variables in data. Defaults to NULL.

  • block: an optional factor whose levels are interpreted as blocks. trafo is applied to each level of block separately. Defaults to NULL.

  • ...: logrank_trafo(): further arguments to be passed to weight.

    mcp_trafo(): factor name and contrast matrix (as matrix or character) in a tag = value format for multiple comparisons based on a single unordered factor; see mcp() in package multcomp.
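The weight argument accepts any function whose first three arguments are time, n.risk and n.event. As a sketch, assuming such a function should return one weight per distinct time point, the Fleming-Harrington family S(t-)^rho (1 - S(t-))^gamma can be computed from a Kaplan-Meier estimate in base R. The name fh_weight is hypothetical; coin's own logrank_weight() remains the supported implementation and may differ in details such as the shape of its return value.

```r
## Hypothetical user-defined weight function (not part of coin).
## Assumes 'time' holds the ordered distinct time points and that one
## weight per time point should be returned.
fh_weight <- function(time, n.risk, n.event, rho = 0, gamma = 0) {
  # Kaplan-Meier survival estimate just after each time point
  surv <- cumprod(1 - n.event / n.risk)
  # Left-continuous version S(t-): equals 1 at the first time point
  surv_minus <- c(1, surv[-length(surv)])
  # Fleming-Harrington weights; rho = gamma = 0 gives unit (logrank) weights
  surv_minus^rho * (1 - surv_minus)^gamma
}

## Three distinct event times, one event each, no censoring:
## weights are 1, 2/3 and 1/3 for rho = 1
fh_weight(time = 1:3, n.risk = c(3, 2, 1), n.event = c(1, 1, 1), rho = 1)
```

Assuming coin forwards extra arguments to weight as described for ... above, a call such as logrank_trafo(x, weight = fh_weight, rho = 1) would use this function.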

Details

The utility functions documented here are used to define specialized test procedures.

id_trafo() is the identity transformation.

rank_trafo(), normal_trafo(), median_trafo(), savage_trafo(), consal_trafo() and koziol_trafo() compute rank (Wilcoxon) scores, normal (van der Waerden) scores, median (Mood-Brown) scores, Savage scores, Conover-Salsburg scores (see neuropathy) and Koziol-Nemec scores, respectively, for location problems.
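To make the location scores concrete, here is a base R sketch of the textbook formulas for Wilcoxon, van der Waerden, median and Savage scores. The helper location_scores is hypothetical, and coin's functions may centre, scale or orient these scores differently, so it is not claimed to reproduce their output. Using na.last = "keep" lets missing values pass through, in the spirit of the Note below.

```r
## Textbook location scores (a sketch, not coin's implementation).
location_scores <- function(x) {
  r <- rank(x, na.last = "keep")           # mid-ranks for ties, NAs kept
  n <- sum(!is.na(x))
  med <- median(x, na.rm = TRUE)
  data.frame(
    wilcoxon = r,                          # rank scores
    van_der_waerden = qnorm(r / (n + 1)),  # normal scores
    median = ifelse(x > med, 1, 0),        # 1 above the median, 0 otherwise
    savage = cumsum(1 / (n:1))[r] - 1      # assumes no ties (integer ranks)
  )
}

location_scores(c(3, 1, 2))
```

The Savage scores sum to zero, and the median column corresponds to assigning 0 to observations equal to the median (mid.score = "0").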

klotz_trafo(), mood_trafo(), ansari_trafo() and fligner_trafo() compute Klotz scores, Mood scores, Ansari-Bradley scores and Fligner-Killeen scores, respectively, for scale problems.
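The scale scores are likewise functions of the ranks. A minimal sketch of the textbook Klotz, Mood and Ansari-Bradley scores follows; scale_scores is a hypothetical helper, Fligner-Killeen scores are omitted, and coin may centre or orient these scores differently.

```r
## Textbook scale scores computed from ranks (a sketch).
scale_scores <- function(x) {
  r <- rank(x, na.last = "keep")
  n <- sum(!is.na(x))
  data.frame(
    klotz = qnorm(r / (n + 1))^2,   # squared normal scores
    mood = (r - (n + 1) / 2)^2,     # squared distance from the mean rank
    ansari = pmin(r, n - r + 1)     # rank counted from the nearer end
  )
}

scale_scores(c(10, 20, 30, 40, 50))
```

All three are symmetric around the middle rank, which is what makes them sensitive to differences in spread rather than location.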

logrank_trafo() computes weighted logrank scores for right-censored data, allowing for a user-defined weight function through the weight argument (see GTSG).
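For intuition, the unweighted textbook logrank score of subject i is its event indicator minus the Nelson-Aalen cumulative hazard at its observed time. The sketch below uses a hypothetical helper and pools tied times; it does not implement coin's "mid-ranks" or "Hothorn-Lausen" tie methods, and since the Note states that logrank_trafo() is increasing, coin's scores may have the opposite sign.

```r
## Textbook logrank scores: event_i - H(t_i), with H the Nelson-Aalen
## cumulative hazard (a sketch, not coin's implementation).
logrank_scores <- function(time, event) {
  ut <- sort(unique(time))
  n_risk <- vapply(ut, function(t) sum(time >= t), numeric(1))
  n_event <- vapply(ut, function(t) sum(time == t & event == 1), numeric(1))
  cumhaz <- cumsum(n_event / n_risk)   # Nelson-Aalen estimate at each distinct time
  event - cumhaz[match(time, ut)]
}

logrank_scores(time = c(1, 2, 2, 3), event = c(1, 0, 1, 0))
```

The scores sum to zero, and without censoring they equal the negated Savage scores of the observed times.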

f_trafo() computes dummy matrices for factors and of_trafo() assigns scores to ordered factors. For ordered factors with two levels, the scores are normalized to the [0, 1] range. zheng_trafo() computes a finite collection of order-restricted scores for ordered factors (see jobsatisfaction, malformations and vision).
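The dummy-matrix and ordered-score ideas can be sketched in a few lines of base R. The helpers dummy_matrix and ordered_scores are hypothetical; in particular, which column is kept for a two-level factor is an assumption here, and coin may label or order columns differently.

```r
## One indicator column per factor level; a single column for two levels.
dummy_matrix <- function(x) {
  m <- outer(as.character(x), levels(x), "==") * 1
  colnames(m) <- levels(x)
  if (nlevels(x) == 2) m <- m[, 1, drop = FALSE]  # one column suffices
  m
}

## Assign a score to each level of an ordered factor.
ordered_scores <- function(x, scores = seq_len(nlevels(x))) {
  s <- scores[as.integer(x)]                       # score of each observation's level
  if (nlevels(x) == 2)                             # two levels: rescale to [0, 1]
    s <- (s - min(scores)) / (max(scores) - min(scores))
  matrix(s, ncol = 1)
}

x <- gl(3, 2)
dummy_matrix(x)
ordered_scores(as.ordered(x), scores = c(1, 3, 4))
```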

maxstat_trafo(), fmaxstat_trafo() and ofmaxstat_trafo() compute scores for cutpoint problems (see maxstat_test()).
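For a numeric variable, cutpoint scores can be thought of as one indicator column I(x <= c) per candidate cutpoint c. The sketch below keeps cutpoints whose empirical CDF value lies in [minprob, maxprob]; the helper name and this exact inclusion rule are assumptions, not coin's implementation, and the factor versions (fmaxstat_trafo(), ofmaxstat_trafo()) are not covered.

```r
## Indicator columns I(x <= c) for admissible cutpoints c (a sketch).
cutpoint_scores <- function(x, minprob = 0.1, maxprob = 1 - minprob) {
  ux <- sort(unique(x))
  p <- ecdf(x)(ux)                        # proportion of observations <= each cutpoint
  cuts <- ux[p >= minprob & p <= maxprob] # column k corresponds to cuts[k]
  outer(x, cuts, "<=") * 1
}

cutpoint_scores(1:10, minprob = 0.25)
```

A maximally selected statistic then takes the maximum over the standardized statistics of these columns.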

trafo() applies its arguments to the elements of data according to the classes of the elements. A trafo() function with modified default arguments is usually supplied to independence_test() via the xtrafo or ytrafo arguments. Fine tuning, i.e., different transformations for different variables, is possible by supplying a named list of functions to the var_trafo argument.
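The dispatch described above (choose a transformation per column by class, let var_trafo override it by name, optionally work within blocks) can be illustrated with a simplified, hypothetical re-implementation. simple_trafo is not coin's trafo(); it handles only numeric and factor columns and omits ordered factors and Surv objects.

```r
## Simplified sketch of class-based dispatch with per-variable overrides.
simple_trafo <- function(data,
                         numeric_trafo = function(v) v,
                         factor_trafo = function(v) outer(as.character(v), levels(v), "==") * 1,
                         var_trafo = list(), block = NULL) {
  one_block <- function(d) {
    mats <- lapply(names(d), function(nm) {
      v <- d[[nm]]
      f <- if (!is.null(var_trafo[[nm]])) var_trafo[[nm]]  # per-variable override
           else if (is.factor(v)) factor_trafo              # class-based choice
           else numeric_trafo
      as.matrix(f(v))
    })
    do.call(cbind, mats)
  }
  if (is.null(block)) return(one_block(data))
  out <- NULL
  for (b in levels(block)) {
    idx <- which(block == b)
    res <- one_block(data[idx, , drop = FALSE])  # transform each block separately
    if (is.null(out)) out <- matrix(NA_real_, nrow(data), ncol(res))
    out[idx, ] <- res
  }
  out
}

## Ranks within blocks, as in the Friedman test: 1..5 repeated four times
simple_trafo(data.frame(y = 1:20), numeric_trafo = rank, block = gl(4, 5))
```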

mcp_trafo() computes contrast matrices for factors.
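As a sketch of what a contrast-based transformation amounts to, all-pairwise ("Tukey") contrasts for k levels can be built with combn() and applied to a dummy matrix. The helper tukey_contrasts is hypothetical; coin specifies contrasts through mcp() in package multcomp, as noted under Arguments. For three levels it yields the same matrix K as the one written out by hand in the Examples.

```r
## All-pairwise contrasts "j - i" for i < j (a sketch).
tukey_contrasts <- function(k) {
  pairs <- combn(k, 2)                 # columns (i, j) with i < j
  K <- t(apply(pairs, 2, function(p) {
    row <- numeric(k)
    row[p[1]] <- -1
    row[p[2]] <- 1
    row
  }))
  rownames(K) <- paste(pairs[2, ], "-", pairs[1, ])
  K
}

x <- gl(3, 2)
K <- tukey_contrasts(nlevels(x))
X <- outer(as.character(x), levels(x), "==") * 1  # dummy matrix
X %*% t(K)                                        # one column per pairwise contrast
```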

Returns

A numeric vector or matrix with nrow(x) rows and an arbitrary number of columns. For trafo(), a named matrix with nrow(data) rows and an arbitrary number of columns.

Note

Starting with coin version 1.1-0, all transformation functions pass missing values (i.e., NAs) through. Furthermore, median_trafo() and logrank_trafo() are now increasing functions (in conformity with most other transformations in this package).

Examples

library("coin")

## Dummy matrix, two-sample problem (only one column)
f_trafo(gl(2, 3))

## Dummy matrix, K-sample problem (K columns)
x <- gl(3, 2)
f_trafo(x)

## Score matrix
ox <- as.ordered(x)
of_trafo(ox)
of_trafo(ox, scores = c(1, 3:4))
of_trafo(ox, scores = list(s1 = 1:3, s2 = c(1, 3:4)))
zheng_trafo(ox, increment = 1/3)

## Normal scores
y <- runif(6)
normal_trafo(y)

## All together now
trafo(data.frame(x = x, ox = ox, y = y), numeric_trafo = normal_trafo)

## The same, but allows for fine-tuning
trafo(data.frame(x = x, ox = ox, y = y), var_trafo = list(y = normal_trafo))

## Transformations for maximally selected statistics
maxstat_trafo(y)
fmaxstat_trafo(x)
ofmaxstat_trafo(ox)

## Apply transformation blockwise (as in the Friedman test)
trafo(data.frame(y = 1:20), numeric_trafo = rank_trafo, block = gl(4, 5))

## Multiple comparisons
dta <- data.frame(x)
mcp_trafo(x = "Tukey")(dta)

## The same, but useful when specific contrasts are desired
K <- rbind("2 - 1" = c(-1,  1, 0),
           "3 - 1" = c(-1,  0, 1),
           "3 - 2" = c( 0, -1, 1))
mcp_trafo(x = K)(dta)