rowGroupMeans() R function from [jamba]

Calculate row group means, or other statistics

Calculate row group means, or other statistics, where: rowGroupMeans() calculates row summary stats; and rowGroupRmOutliers() is a convenience function to call rowGroupMeans(..., rmOutliers=TRUE, returnType="input").


rowGroupMeans(
  x,
  groups,
  na.rm = TRUE,
  useMedian = TRUE,
  rmOutliers = FALSE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("output", "input"),
  rowStatsFunc = NULL,
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  includeAttributes = FALSE,
  verbose = FALSE,
  ...
)

rowGroupRmOutliers(
  x,
  groups,
  na.rm = TRUE,
  rmOutliers = TRUE,
  crossGroupMad = TRUE,
  madFactor = 5,
  returnType = c("input"),
  groupOrder = c("same", "sort"),
  keepNULLlevels = FALSE,
  includeAttributes = FALSE,
  verbose = FALSE,
  ...
)

Arguments

x: numeric data matrix
groups: character vector or factor of group labels, one per column of x. See the parameter groupOrder for ordering of group labels in the output data matrix.
na.rm: logical, default TRUE, passed to the stats func to ignore NA values.
useMedian: logical, default TRUE, indicating whether the default stat should be the median (TRUE) or the mean (FALSE).
rmOutliers: logical, default FALSE, indicating whether to apply outlier detection and removal.
crossGroupMad: logical indicating whether to calculate row MAD values using the median across groups for each row. The median is calculated using non-NA and non-zero row group MAD values. When crossGroupMad=TRUE it also calculates the non-NA, non-zero median row MAD across all rows, which defines the minimum difference from median applied across all values to be considered an outlier.
madFactor: numeric value indicating the multiple of the MAD value to define outliers. For example madFactor=5 will take the MAD value for a group multiplied by 5, 5*MAD, as a threshold for outliers. So any points more than 5*MAD distance from the median per group are outliers.
returnType: character, default "output", the return data type:
- "output" returns one summary stat value per group, per row;
- "input" is useful when rmOutliers=TRUE in that it returns a matrix with the same dimensions as the input, except with outlier points replaced with NA.
rowStatsFunc: function, default NULL, which takes a numeric matrix as input, and returns a numeric vector equal to the number of rows of the input data matrix. When supplied, useMedian is ignored. Examples: base::rowMeans(), matrixStats::rowMedians(), matrixStats::rowMads.
groupOrder: character string indicating how character group labels are ordered in the final data matrix, when returnType="output". Note that when groups is a factor, the factor levels are kept in that order. Otherwise, "same" keeps groups in the same order they appear in the input matrix; "sort" applies jamba::mixedSort() to the labels.
keepNULLlevels: logical, default FALSE, whether to keep factor levels even when there are no corresponding columns in x. When TRUE and returnType="output" the output matrix will contain one colname for each factor level, with NA values used to fill empty factor levels. This mechanism can be helpful to ensure that output matrices have consistent colnames.
includeAttributes: logical, default FALSE, whether to include attributes with "n" number of replicates per group, and "nLabel" with replicate label in n=# form.
verbose: logical indicating whether to print verbose output.
...: additional parameters are passed to rowStatsFunc, and if rmOutliers=TRUE to jamba::rowRmMadOutliers().
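
The madFactor rule described above can be sketched in base R for a single row and a single group. This is only an illustration of the thresholding idea, not the code in jamba::rowRmMadOutliers(): the values are made up, the crossGroupMad adjustment is left out, and the use of the unscaled MAD (constant=1) is an assumption.

```r
# Hypothetical replicate values for one row within one group
vals <- c(10.1, 9.8, 10.3, 25.0)
madFactor <- 5

# Group median and unscaled MAD (constant=1 is an assumption of this sketch)
grp_median <- stats::median(vals, na.rm=TRUE)
grp_mad <- stats::mad(vals, constant=1, na.rm=TRUE)

# Points farther than madFactor * MAD from the group median are flagged
threshold <- madFactor * grp_mad
is_outlier <- abs(vals - grp_median) > threshold

# Flagged points become NA, as described for returnType="input"
vals_clean <- vals
vals_clean[is_outlier] <- NA
vals_clean
```

Here only the fourth value exceeds the threshold, so it is the only one replaced with NA.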

Returns

numeric matrix based upon returnType:

When returnType="output" the output is a numeric matrix with the same number of columns as the number of unique group labels. When groups is a factor and keepNULLlevels=TRUE, the number of columns will be the number of factor levels, otherwise it will be the number of factor levels used in groups.
When returnType="input" the output is a numeric matrix with the same dimensions as the input data. This output is intended for use with rmOutliers=TRUE which will replace outlier points with NA values. Therefore, this matrix can be used to see the location of outliers.
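
To make the column counts for returnType="output" concrete, here is a minimal base R sketch that mimics the described shape for a factor with an unused level. The helper group_row_means() and its keep_empty argument are hypothetical names for this illustration; they are not part of jamba, and the sketch only computes row means.

```r
set.seed(1)
x <- matrix(stats::rnorm(24), ncol=6)
colnames(x) <- paste0("s", 1:6)

# Factor with three levels, of which "C" has no columns in x
groups <- factor(c("A", "A", "A", "B", "B", "B"), levels=c("A", "B", "C"))

# Hypothetical helper: one column of row means per group level
group_row_means <- function(x, groups, keep_empty=FALSE) {
  lv <- levels(groups)
  if (!keep_empty) {
    lv <- lv[lv %in% as.character(groups)]
  }
  sapply(lv, function(g) {
    cols <- which(groups == g)
    if (length(cols) == 0) {
      # Empty level: fill with NA, analogous to keepNULLlevels=TRUE
      return(rep(NA_real_, nrow(x)))
    }
    rowMeans(x[, cols, drop=FALSE], na.rm=TRUE)
  })
}

dim(group_row_means(x, groups))                   # levels in use only
dim(group_row_means(x, groups, keep_empty=TRUE))  # all factor levels
```

The second call adds a column "C" filled with NA, which keeps column names consistent across matrices built from different subsets of samples.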

The function also returns attributes when includeAttributes=TRUE, although the default is FALSE. The attributes describe the number of samples per group overall:

attr(out, "n"): The attribute "n" is used to describe the number of replicates per group.
attr(out, "nLabel"): The attribute "nLabel" is a simple text label in the form "n=3".

Note that when rmOutliers=TRUE the number of replicates per group will vary depending upon the outliers removed. In that case, remember that the reported "n" is always the total possible columns available prior to outlier removal.
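
As a rough illustration of what the "n" and "nLabel" attributes convey, the sketch below derives both from a group vector in base R and attaches them to a placeholder matrix. The exact structure jamba uses (for example, whether the vectors are named) is not stated here, so the naming below is an assumption.

```r
use_groups <- rep(letters[1:3], each=3)

# Number of columns (replicates) per group, counted before any outlier removal
n <- vapply(split(use_groups, use_groups), length, integer(1))

# Text label in n=# form, one per group
nLabel <- paste0("n=", n)
names(nLabel) <- names(n)

# Placeholder standing in for a returnType="output" matrix
out <- matrix(0, nrow=2, ncol=length(n), dimnames=list(NULL, names(n)))
attr(out, "n") <- n
attr(out, "nLabel") <- nLabel

attr(out, "n")
attr(out, "nLabel")
```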

Details

This function calculates one group summary value per row in a numeric matrix: the row median when useMedian=TRUE, otherwise the row mean. The stat function can also be changed with rowStatsFunc to calculate other statistics, such as row MADs.
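
A custom stat function only needs to accept a numeric matrix and return one value per row. The sketch below defines a hypothetical row_sds() that follows that contract, accepting na.rm and ... because the argument descriptions say both are passed along. It is checked here on its own and has not been verified against rowGroupMeans() itself.

```r
# Hypothetical row-wise standard deviation, one value per row of x
row_sds <- function(x, na.rm=TRUE, ...) {
  apply(x, 1, stats::sd, na.rm=na.rm)
}

m <- matrix(c(1, 2, 3,
              2, 4, 6), nrow=2, byrow=TRUE)
row_sds(m)
```

A function of this shape would be supplied as rowStatsFunc=row_sds, in which case useMedian is ignored according to the argument description.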

An added purpose of this function is optional outlier filtering, via calculation of MAD values and applying a MAD threshold cutoff. The intention is to identify technical outliers that otherwise adversely affect the calculated group mean or median values. To inspect the data after outlier removal, use the parameter returnType="input" which will return the input data matrix with NA substituted for outlier points. Outlier detection and removal is performed by jamba::rowRmMadOutliers().

Examples


x <- matrix(ncol=9, stats::rnorm(90));
colnames(x) <- LETTERS[1:9];
use_groups <- rep(letters[1:3], each=3)
rowGroupMeans(x, groups=use_groups)

# rowGroupRmOutliers returns the input data after outlier removal
rowGroupRmOutliers(x, groups=use_groups, returnType="input")

# rowGroupMeans(..., returnType="input") also returns the input data
rowGroupMeans(x, groups=use_groups, rmOutliers=TRUE, returnType="input")

# rowGroupMeans with outlier removal
rowGroupMeans(x, groups=use_groups, rmOutliers=TRUE)
