ggfreqScatter function

Frequency Scatterplot

Uses ggplot2 to plot a scatterplot or dot-like chart for the case where there is a very large number of overlapping values. This works for continuous and categorical x and y. For continuous variables it serves the same purpose as hexagonal binning. Counts for overlapping points are grouped into quantile groups, and the level of transparency and rainbow colors are used to convey count information.
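
The counting step described above can be sketched in base R. This is only an illustration of the idea (round to a grid, tabulate, then group the counts by quantile), not the code that ggfreqScatter itself runs; the helper name bin_round and the use of quantile() with cut() are assumptions made for the sketch.

```r
# Illustrative sketch only; bin_round is a hypothetical helper, not part of Hmisc
bin_round <- function(v, bins = 50) {
  # Snap each value to the nearest point on a grid of about `bins` steps
  width <- diff(range(v)) / bins
  round(v / width) * width
}

set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)

# Count how many raw points land on each (x, y) grid cell
cells <- as.data.frame(table(x = bin_round(x), y = bin_round(y)))
cells <- cells[cells$Freq > 0, ]

# Assign each occupied cell to one of at most g quantile groups of frequency
g <- 10
brks <- unique(quantile(cells$Freq, probs = seq(0, 1, length.out = g + 1)))
cells$group <- cut(cells$Freq, breaks = brks, include.lowest = TRUE)

# Cells with the highest counts
head(cells[order(-cells$Freq), ])
```

Because many cells share the same small count, some quantile boundaries coincide, so the sketch can end up with fewer than g groups.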

Alternatively, specify stick=TRUE to forgo color and instead encode cell frequencies with the height of a black line y-centered at the middle of the bins. Relative frequencies are not transformed, and the maximum cell frequency is shown in a caption. Every point with a frequency of at least one is depicted with a full-height light gray vertical line, scaled to the above overall maximum frequency. In this way the relative frequency is the proportion of each light gray line that is black, and one can still see points whose frequencies are too low for the black line to be visible.
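
As a rough illustration of the stick encoding, the black portion of each line is the cell frequency divided by the overall maximum frequency. The frequencies below are made up for the sketch and are not output from ggfreqScatter.

```r
# Hypothetical cell frequencies, chosen only for illustration
freq <- c(1, 3, 40, 200)

# Every occupied cell gets a full-height gray line; the black line covers
# the untransformed fraction freq / max(freq) of that height
rel_height <- freq / max(freq)
data.frame(freq, rel_height)
```

The cell with frequency 1 has a black fraction of only 0.005, which is why the gray line is needed to show that the cell is occupied at all.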

The result can also be passed to ggplotly. Actual cell frequencies are added to the hover text in that case using the label ggplot2 aesthetic.

ggfreqScatter(x, y, by=NULL, bins=50, g=10, cuts=NULL, xtrans = function(x) x, ytrans = function(y) y, xbreaks = pretty(x, 10), ybreaks = pretty(y, 10), xminor = NULL, yminor = NULL, xlab = as.character(substitute(x)), ylab = as.character(substitute(y)), fcolors = viridis::viridis(10), nsize=FALSE, stick=FALSE, html=FALSE, prfreq=FALSE, ...)

Arguments

  • x: x-variable

  • y: y-variable

  • by: an optional vector used to make separate plots for each distinct value using facet_wrap()

  • bins: for continuous x or y, the number of bins to create by rounding. Ignored for categorical variables. If a 2-vector, the first element corresponds to x and the second to y.

  • g: number of quantile groups to make for frequency counts. Use g=0 to use frequencies continuously for color coding. This is recommended only when using plotly.

  • cuts: instead of using g, specify cuts to provide the vector of cuts for categorizing frequencies for assignment to colors

  • xtrans,ytrans: functions specifying transformations to be made before binning and plotting

  • xbreaks,ybreaks: vectors of values to label on axis, on original scale

  • xminor,yminor: values at which to put minor tick marks, on original scale

  • xlab,ylab: axis labels. If not specified and the variable has a label, that label will be used.

  • fcolors: colors argument to pass to scale_color_gradientn to color code frequencies. Use fcolors=gray.colors(10, 0.75, 0) to show gray scale, for example. Another good choice is fcolors=hcl.colors(10, 'Blue-Red').

  • nsize: set to TRUE to not vary color or transparency but instead to size the symbols in relation to the number of points. Best when both x and y are discrete. The ggplot2 size is taken as the fourth root of the frequency. If there are 15 or fewer unique frequencies, all the unique frequencies are used; otherwise g quantile groups of frequencies are used.

  • stick: set to TRUE to not use colors but instead use varying-height black vertical lines to depict cell frequencies.

  • html: set to TRUE to use html in axis labels instead of plotmath

  • prfreq: set to TRUE to print the frequency distributions of the binned coordinate frequencies

  • ...: arguments to pass to geom_point such as shape and size
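
To see why nsize uses the fourth root, the following base R sketch applies that transformation to a few made-up frequencies. It only illustrates the arithmetic stated in the nsize entry; the values are hypothetical.

```r
# Hypothetical cell frequencies spanning four orders of magnitude
freq <- c(1, 16, 81, 10000)

# The fourth root compresses the range so large cells do not swamp small ones
size <- freq^0.25
data.frame(freq, size)
```

A 10000-fold range in frequency becomes a 10-fold range in symbol size, so rare cells stay visible next to very common ones.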

Returns

a ggplot object

Author(s)

Frank Harrell

See Also

cut2

Examples

require(Hmisc)     # provides ggfreqScatter
require(ggplot2)
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)
count <- sample(1:100, 1000, TRUE)
x <- rep(x, count)
y <- rep(y, count)

# color=alpha=NULL below makes loess smooth over all points
g <- ggfreqScatter(x, y) +   # might add g=0 if using plotly
  geom_smooth(aes(color=NULL, alpha=NULL), se=FALSE) +
  ggtitle("Using Deciles of Frequency Counts, 2500 Bins")
g
# plotly::ggplotly(g, tooltip='label')  # use plotly, hover text = freq. only
# Plotly makes it somewhat interactive, with hover text tooltips

# Instead use varying-height sticks to depict frequencies
ggfreqScatter(x, y, stick=TRUE) +
  labs(subtitle=paste('Relative height of black lines to gray lines',
                      'is proportional to cell frequency.',
                      'Note that points with even tiny frequency are visible',
                      '(gray line with no visible black line).'))

# Try with x categorical
x1 <- sample(c('cat', 'dog', 'giraffe'), length(x), TRUE)
ggfreqScatter(x1, y)

# Try with y categorical
y1 <- sample(LETTERS[1:10], length(x), TRUE)
ggfreqScatter(x, y1)

# Both categorical, larger point symbols, box instead of circle
ggfreqScatter(x1, y1, shape=15, size=7)

# Vary box size instead
ggfreqScatter(x1, y1, nsize=TRUE, shape=15)