euler function

Area-proportional Euler diagrams

Fit Euler diagrams (a generalization of Venn diagrams) using numerical optimization to find exact or approximate solutions to a specification of set relationships. The shape of the diagram may be a circle or an ellipse.

euler(combinations, ...)

## Default S3 method:
euler(
  combinations,
  input = c("disjoint", "union"),
  shape = c("circle", "ellipse"),
  loss = c("square", "abs", "region"),
  loss_aggregator = c("sum", "max"),
  control = list(),
  ...
)

## S3 method for class 'data.frame'
euler(
  combinations,
  weights = NULL,
  by = NULL,
  sep = "_",
  factor_names = TRUE,
  ...
)

## S3 method for class 'matrix'
euler(combinations, ...)

## S3 method for class 'table'
euler(combinations, ...)

## S3 method for class 'list'
euler(combinations, ...)

Arguments

  • combinations: set relationships as a named numeric vector, matrix, or data.frame (see Methods (by class))

  • ...: arguments passed down to other methods

  • input: type of input: disjoint identities ('disjoint') or unions ('union').

  • shape: geometric shape used in the diagram

  • loss: type of loss to minimize over. If "square" is used together with the value "sum" for loss_aggregator, then the resulting loss function is the sum of squared errors, which is the default.

  • loss_aggregator: how the final loss is computed. "sum" indicates that the losses computed by loss are summed up. "max" indicates that the largest of these losses is used instead.

  • control: a list of control parameters.

    • extraopt: should the more thorough optimizer (currently GenSA::GenSA()) kick in (provided extraopt_threshold is exceeded)? The default is TRUE for ellipses and three sets and FALSE otherwise.
    • extraopt_threshold: threshold, in terms of diagError, for when the extra optimizer kicks in. A value of 0 means that the extra optimizer will kick in if there is any error. A value of 1 means that it will never kick in. The default is 0.001. Running the extra optimizer will almost always slow down the process considerably.
    • extraopt_control: a list of control parameters to pass to the extra optimizer, such as max.call. See GenSA::GenSA().
  • weights: a numeric vector of weights of the same length as the number of rows in combinations.

  • by: a factor or character matrix to be used in base::by() to split the data.frame or matrix of set combinations

  • sep: a character to use to separate the dummy-coded factors if there are factor or character vectors in 'combinations'.

  • factor_names: whether to include factor names when constructing dummy codes
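The difference between the two input types can be illustrated with a small base R sketch. It assumes that "union" means each value is the full size of a set or intersection, while "disjoint" counts only the elements that belong to exactly that combination. The numbers are made up, and the conversion shown covers only the two-set case; it is not taken from the package's implementation.

```r
# Hypothetical two-set specification; values are illustrative only.
# "union"-style input: each value is the full size of the set or intersection.
union_input <- c(A = 10, B = 8, "A&B" = 3)

# Equivalent "disjoint"-style input (assumed interpretation): each value
# counts only the elements found in exactly that combination.
disjoint_input <- c(
  A     = unname(union_input["A"] - union_input["A&B"]),  # only in A
  B     = unname(union_input["B"] - union_input["A&B"]),  # only in B
  "A&B" = unname(union_input["A&B"])                      # in both A and B
)

disjoint_input
```

Under this assumption, passing union_input with input = "union" and disjoint_input with input = "disjoint" would describe the same set relationships.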

Returns

A list object of class 'euler' with the following parameters.

  • ellipses: a matrix of h and k (x and y-coordinates for the centers of the shapes), semiaxes a and b, and rotation angle phi

  • original.values: set relationships in the input

  • fitted.values: set relationships in the solution

  • residuals: residuals

  • regionError: the difference in percentage points between each disjoint subset in the input and the respective area in the output

  • diagError: the largest regionError

  • stress: normalized residual sums of squares

Details

If the input is a matrix or data frame and argument by is specified, the function returns a list of euler diagrams.

The function minimizes the residual sums of squares,

$$\sum_{i=1}^n (A_i - \omega_i)^2,$$

by default, where $\omega_i$ is the size of the ith disjoint subset and $A_i$ is the corresponding area in the diagram, that is, the unique contribution to the total area from this overlap. The loss function can, however, be controlled via the loss argument.
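As a rough illustration of the default loss, the following base R sketch evaluates the residual sum of squares for a made-up specification and a made-up fit. The vectors omega and area are hypothetical and not produced by euler(); the two alternative aggregations at the end are one plausible reading of loss = "abs" and loss_aggregator = "max", not a statement of how the package computes them.

```r
# Illustrative values only (not output from euler()).
omega <- c(A = 2, B = 2, "A&B" = 1)        # requested disjoint subset sizes
area  <- c(A = 2.1, B = 1.9, "A&B" = 0.9)  # areas in a hypothetical fitted diagram

# Default loss: squared errors aggregated by summing.
sse <- sum((area - omega)^2)

# Assumed analogues of the other settings:
sum_abs <- sum(abs(area - omega))  # absolute errors, summed
max_sq  <- max((area - omega)^2)   # squared errors, largest one only

c(sse = sse, sum_abs = sum_abs, max_sq = max_sq)
```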

euler() also returns stress (from venneuler), as well as diagError and regionError (from eulerAPE).

The stress statistic is computed as

$$\frac{\sum_{i=1}^n (A_i - \beta\omega_i)^2}{\sum_{i=1}^n A_i^2},$$

where

$$\beta = \frac{\sum_{i=1}^n A_i\omega_i}{\sum_{i=1}^n \omega_i^2}.$$
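The stress formula can be transcribed directly into base R. The helper name stress_stat and the input vectors below are hypothetical; this is a sketch of the formula as stated here, not the package's internal code.

```r
# Stress as defined above: residual sum of squares after rescaling
# omega by beta, normalized by the sum of squared areas.
stress_stat <- function(area, omega) {
  beta <- sum(area * omega) / sum(omega^2)
  sum((area - beta * omega)^2) / sum(area^2)
}

# Areas that are an exact multiple of omega give zero stress,
# because beta absorbs the difference in scale.
omega <- c(1, 2, 3)
stress_exact <- stress_stat(2 * omega, omega)

# A fit that distorts the proportions gives a positive stress.
stress_off <- stress_stat(c(2, 4, 7), omega)
```

Because of the scaling factor beta, stress measures how well the proportions are preserved rather than how close the absolute areas are to the input sizes.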

regionError is computed as

$$\left| \frac{A_i}{\sum_{i=1}^n A_i} - \frac{\omega_i}{\sum_{i=1}^n \omega_i} \right|.$$

diagError is simply the maximum of regionError.
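The two error measures can be sketched in base R as follows. The helper region_error and the input vectors are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# regionError as defined above: absolute difference between each region's
# share of the total area and its share of the total input size.
region_error <- function(area, omega) {
  abs(area / sum(area) - omega / sum(omega))
}

omega <- c(A = 2, B = 2, "A&B" = 1)  # requested disjoint subset sizes
area  <- c(A = 3, B = 2, "A&B" = 1)  # areas in a hypothetical fitted diagram

region_err <- region_error(area, omega)
diag_err   <- max(region_err)  # diagError: the largest regionError
```

Here the area shares are 1/2, 1/3, and 1/6, while the input shares are 0.4, 0.4, and 0.2, so the largest discrepancy (0.1) belongs to region A.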

Methods (by class)

  • euler(default): a named numeric vector, with combinations separated by an ampersand, for instance A&B = 10. Missing combinations are treated as being 0.
  • euler(data.frame): a data.frame of logicals, binary integers, or factors.
  • euler(matrix): a matrix that can be converted to a data.frame of logicals (as in the description above) via base::as.data.frame.matrix().
  • euler(table): a table with max(dim(x)) < 3.
  • euler(list): a list of vectors, each vector giving the contents of that set (with no duplicates). Vectors in the list must be named.
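To show how the list form relates to the named numeric vector form, the base R sketch below counts, for a made-up list of sets, how many elements fall into each disjoint combination and names the result with ampersand-separated set names. The data and variable names are hypothetical, and this is not necessarily how the list method is implemented.

```r
# Hypothetical named list of sets (no duplicates within a set).
sets <- list(
  A = c("a", "b", "c", "f"),
  B = c("b", "c", "d", "f"),
  C = c("c", "e")
)

elements <- unique(unlist(sets))

# For each element, build a label such as "A&B" from the sets containing it.
signature <- vapply(
  elements,
  function(el) {
    member <- vapply(sets, function(s) el %in% s, logical(1))
    paste(names(sets)[member], collapse = "&")
  },
  character(1)
)

# Count the elements per disjoint combination.
tab <- table(signature)
combos <- setNames(as.numeric(tab), names(tab))
combos
```

Combinations that never occur (here "A&C" and "B&C") are simply absent from combos, which matches the convention that missing combinations are treated as 0.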

Examples

# Fit a diagram with circles
combo <- c(A = 2, B = 2, C = 2, "A&B" = 1, "A&C" = 1, "B&C" = 1)
fit1 <- euler(combo)

# Investigate the fit
fit1

# Refit using ellipses instead
fit2 <- euler(combo, shape = "ellipse")

# Investigate the fit again (which is now exact)
fit2

# Plot it
plot(fit2)

# A set with no perfect solution
euler(c(
  "a" = 3491, "b" = 3409, "c" = 3503,
  "a&b" = 120, "a&c" = 114, "b&c" = 132,
  "a&b&c" = 50
))

# Using grouping via the 'by' argument through the data.frame method
euler(fruits, by = list(sex, age))

# Using the matrix method
euler(organisms)

# Using weights
euler(organisms, weights = c(10, 20, 5, 4, 8, 9, 2))

# The table method
euler(pain, factor_names = FALSE)

# A euler diagram from a list of sample spaces (the list method)
euler(plants[c("erigenia", "solanum", "cynodon")])

References

Wilkinson L. Exact and Approximate Area-Proportional Circular Venn and Euler Diagrams. IEEE Transactions on Visualization and Computer Graphics (Internet). 2012 Feb (cited 2016 Apr 9);18(2):321-31. Available from: doi:10.1109/TVCG.2011.56

Micallef L, Rodgers P. eulerAPE: Drawing Area-Proportional 3-Venn Diagrams Using Ellipses. PLOS ONE (Internet). 2014 Jul (cited 2016 Dec 10);9(7):e101717. Available from: doi:10.1371/journal.pone.0101717

See Also

plot.euler(), print.euler(), eulerr_options(), venn()