bystats function

Statistics by Categories

For any number of cross-classification variables, bystats returns a matrix with the sample size, the number of missing y, and fun(non-missing y), with the cross-classifications designated by rows. It uses Harrell's modification of the interaction function to produce the cross-classifications. The default fun is mean, and if y is binary, the mean is labeled as Fraction. There is a print method as well as a latex method for objects created by bystats. bystats2 handles the special case in which there are 2 classification variables, and places the first one in rows and the second in columns. The print method for bystats2 uses the print.char.matrix function to organize statistics for cells into boxes.
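The kind of table described above can be approximated in base R. The following is only a rough sketch under stated assumptions, not the Hmisc implementation: it uses the stock interaction function rather than Harrell's modification, the data are made up, and the row label ALL for the combined summary is chosen here for illustration.

```r
# Base-R sketch of the per-cell summary (not the Hmisc code).
# All data below are invented for illustration.
y    <- c(1, 0, NA, 1, 1, 0, NA, 1)
sex  <- c("f", "f", "f", "m", "m", "m", "f", "m")
race <- c("a", "b", "a", "a", "b", "b", "b", "a")

# One factor whose levels are the observed cross-classifications
cell <- interaction(sex, race, drop = TRUE, sep = " ")

n       <- tapply(y, cell, function(v) sum(!is.na(v)))  # non-missing count
missing <- tapply(y, cell, function(v) sum(is.na(v)))   # missing count
stat    <- tapply(y, cell, mean, na.rm = TRUE)          # fun on non-missing y

# y is binary here, so the mean is labeled Fraction
tab <- cbind(N = n, Missing = missing, Fraction = stat)

# Append a row computed on all observations combined
tab <- rbind(tab, ALL = c(sum(!is.na(y)), sum(is.na(y)), mean(y, na.rm = TRUE)))
print(tab)
```

Each row corresponds to one combination of the classification variables, which mirrors the layout the function is documented to return.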

bystats(y, ..., fun, nmiss, subset)

## S3 method for class 'bystats'
print(x, ...)

## S3 method for class 'bystats'
latex(object, title, caption, rowlabel, ...)

bystats2(y, v, h, fun, nmiss, subset)

## S3 method for class 'bystats2'
print(x, abbreviate.dimnames=FALSE,
      prefix.width=max(nchar(dimnames(x)[[1]])), ...)

## S3 method for class 'bystats2'
latex(object, title, caption, rowlabel, ...)

Arguments

  • y: a binary, logical, or continuous variable or a matrix or data frame of such variables. If y is a data frame it is converted to a matrix. If y is a data frame or matrix, computations are done on subsets of the rows of y, and you should specify fun so as to be able to operate on the matrix. For matrix y, any column with a missing value causes the entire row to be considered missing, and the row is not passed to fun.

  • ...: For bystats, one or more classification variables separated by commas. For print.bystats, options passed to print.default such as digits. For latex.bystats and latex.bystats2, options passed to latex.default such as digits. If you pass cdec to latex.default, keep in mind that the first one or two positions (depending on nmiss) should have zeros since these correspond with frequency counts.

  • v: vertical variable for bystats2. Will be converted to factor.

  • h: horizontal variable for bystats2. Will be converted to factor.

  • fun: a function to compute on the non-missing y for a given subset. You must specify fun= in front of the function name or definition. fun may return a single number or a vector or matrix of any length. Matrix results are rolled out into a vector, with names preserved. When y is a matrix, a common fun is function(y) apply(y, 2, ff), where ff is the name of a function which operates on one column of y.

  • nmiss: a column containing a count of missing values is included if nmiss=TRUE or if there is at least one missing value.

  • subset: a vector of subscripts or logical values indicating the subset of data to analyze

  • abbreviate.dimnames: set to TRUE to abbreviate dimnames in output

  • prefix.width: see print.char.matrix

  • title: title to pass to latex.default. Default is the first word of the character string version of the first calling argument.

  • caption: caption to pass to latex.default. Default is the heading attribute from the object produced by bystats.

  • rowlabel: rowlabel to pass to latex.default. Default is the byvarnames attribute from the object produced by bystats. For bystats2 the default is "".

  • x: an object created by bystats or bystats2

  • object: an object created by bystats or bystats2
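The treatment of a matrix y described under the y and fun arguments can be illustrated with a small base-R sketch. It is an approximation under stated assumptions rather than the package's own code: the data and the grouping variable group are invented, and ff is set to median purely as an example.

```r
# Sketch of matrix-y handling (not the Hmisc code); data are invented.
pressure    <- c(120, 130, NA, 110, 140, 125)
cholesterol <- c(200, NA, 180, 190, 210, 220)
group       <- c("young", "young", "young", "old", "old", "old")
y <- cbind(pressure, cholesterol)

ff  <- median                        # operates on one column of y
fun <- function(y) apply(y, 2, ff)   # operates on the rows of one subset

# A row with a missing value in any column is treated as missing
keep <- complete.cases(y)

# Apply fun to the complete rows of each group; one result row per group
idx <- split(seq_len(nrow(y))[keep], group[keep])
res <- t(sapply(idx, function(i) fun(y[i, , drop = FALSE])))
print(res)
```

Rows 2 and 3 each contain a missing value, so neither reaches fun; the young group is summarized from a single complete row.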

Returns

For bystats, a matrix with row names equal to the classification labels and column names N, Missing, funlab, where funlab is determined from fun. A row is added to the end with the summary statistics computed on all observations combined. The class of this matrix is bystats. For bystats2, a 3-dimensional array with the last dimension corresponding to the statistics being computed. The class of the array is bystats2.

Side Effects

latex produces a .tex file.

Author(s)

Frank Harrell

Department of Biostatistics

Vanderbilt University

fh@fharrell.com

See Also

interaction, cut, cut2, latex, print.char.matrix, translate

Examples

## Not run:
bystats(sex==2, county, city)
bystats(death, race)
bystats(death, cut2(age,g=5), race)
bystats(cholesterol, cut2(age,g=4), sex, fun=median)
bystats(cholesterol, sex, fun=quantile)
bystats(cholesterol, sex, fun=function(x)c(Mean=mean(x),Median=median(x)))
latex(bystats(death,race,nmiss=FALSE,subset=sex=="female"), digits=2)
f <- function(y) c(Hazard=sum(y[,2])/sum(y[,1]))
# f() gets the hazard estimate for right-censored data from exponential dist.
bystats(cbind(d.time, death), race, sex, fun=f)
bystats(cbind(pressure, cholesterol), age.decile,
        fun=function(y) c(Median.pressure   =median(y[,1]),
                          Median.cholesterol=median(y[,2])))
y <- cbind(pressure, cholesterol)
bystats(y, age.decile,
        fun=function(y) apply(y, 2, median))   # same result as last one
bystats(y, age.decile,
        fun=function(y) apply(y, 2, quantile, c(.25,.75)))
# The last one computes separately the 0.25 and 0.75 quantiles of 2 vars.
latex(bystats2(death, race, sex, fun=table))
## End(Not run)
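The examples above refer to variables that are not defined on this page. A version that can be run as-is might look like the following sketch. The simulated data (sample size, level names, and distributions) are assumptions made up for illustration, and the calls are guarded so the script still runs when Hmisc is not installed.

```r
# Simulated stand-ins for the variables used in the examples (all invented).
set.seed(1)
n <- 200
sex  <- factor(sample(c("female", "male"), n, replace = TRUE))
race <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
cholesterol <- rnorm(n, mean = 200, sd = 25)
cholesterol[sample(n, 10)] <- NA          # introduce some missing values
death <- rbinom(n, size = 1, prob = 0.3)  # binary outcome

# Only call into Hmisc if the package is available
if (requireNamespace("Hmisc", quietly = TRUE)) {
  b1 <- Hmisc::bystats(death, race)                     # default fun is mean
  b2 <- Hmisc::bystats(cholesterol, race, sex, fun = median)
  b3 <- Hmisc::bystats2(death, race, sex)               # race in rows, sex in columns
  print(b1)
  print(b2)
  print(b3)
}
```

Because cholesterol contains missing values, the second table would be expected to include the Missing column even though nmiss is not set.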