Draw a scatter plot with associated X and Y histograms, densities and correlation
Draw an X-Y scatter plot with associated X and Y histograms and estimated densities. It will also draw density plots by group, as well as distribution ellipses by group. Partly a demonstration of the use of layout. Also includes a lowess smooth or linear model slope, as well as the correlation and Mahalanobis distances.
x: The X vector, or the first column of a data.frame or matrix. Can be specified using formula input.
y: The Y vector or, if x is a data.frame or matrix, the second column of x
smooth: if TRUE, then add a loess smooth to the plot
ab: if TRUE, then show the best-fitting linear (lm) line
correl: TRUE: Show the correlation
data: if using formula input, the data must be specified
density: TRUE: Show the estimated densities
means: TRUE: Show the means of the distributions.
ellipse: TRUE: draw 1 and 2 sigma ellipses and smooth
digits: How many digits to use if showing the correlation
method: Which method to use for the correlation ("pearson", "spearman", "kendall"); defaults to "pearson"
smoother: if TRUE, use smoothScatter instead of plot. Nice for large N.
nrpoints: If using smoothScatter, show nrpoints as dots. Defaults to 0
grid: If TRUE, show a grid for the scatter plot.
cex.cor: Adjustment for the size of the correlation text
cex.point: Adjustment for the size of the data points
xlab: Label for the x axis
ylab: Label for the y axis
xlim: Limits of the x axis. This seems to apply only to the scatter plot.
ylim: Limits of the y axis
x.breaks: Number of breaks to suggest to the x axis histogram.
y.breaks: Number of breaks to suggest to the y axis histogram.
x.space: Space between the bars of the x histogram
y.space: Space between the bars of the y histogram
freq: If TRUE, show frequency counts; otherwise show densities
x.axes: Show the x axis for the x histogram
y.axes: Show the y axis for the y histogram
size: The sizes of the ellipses (in sd units). Defaults to 1,2
col: Colors to use when showing groups
transparency: Amount of transparency in the density plots
legend: Where to put a legend c("topleft","topright","top","left","right")
pch: Base plotting character (each successive group uses the next pch value)
xlab.hist: Label for x axis histogram. Not currently available.
ylab.hist: Label for y axis histogram. Not currently available.
title: An optional title
show.d: If TRUE, show the distances between the groups
d.arrow: If TRUE, draw an arrow between the two centroids
x.arrow: Optional label for the arrow connecting the two groups on the x axis
y.arrow: Optional label for the arrow connecting the two groups on the y axis
cex.arrow: cex adjustment for the size of the arrow labels
line.col: color of the lowess or lm fit line
alpha: When drawing confidence intervals, the alpha level (the intervals cover 1 - alpha)
ci: Draw confidence intervals
ci.col: color of confidence intervals
...: Other parameters for graphics
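The ci, alpha and ci.col arguments above concern a confidence band around the fitted line. The following is a minimal base-R sketch, assuming the band is a (1 - alpha) confidence interval for a linear (lm) fit; psych's own implementation may differ (for example when the line is a lowess smooth). fit_with_ci is a hypothetical helper, not part of psych.

```r
# Hypothetical helper: linear fit of y on x with a (1 - alpha) confidence band,
# evaluated on an evenly spaced grid of x values (base R only).
fit_with_ci <- function(x, y, alpha = 0.05, n = 50) {
  fit <- lm(y ~ x)
  newx <- data.frame(x = seq(min(x), max(x), length.out = n))
  # predict.lm returns a matrix with columns fit, lwr, upr
  band <- predict(fit, newdata = newx, interval = "confidence", level = 1 - alpha)
  cbind(newx, band)  # data frame with columns x, fit, lwr, upr
}
```

For example, after plot(x, y) and b <- fit_with_ci(x, y), the calls lines(b$x, b$lwr, lty = 2) and lines(b$x, b$upr, lty = 2) overlay the band. A larger alpha gives a narrower band.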
Details
Just a straightforward application of layout and barplot, with some tricks taken from pairs.panels. The various options allow for correlation ellipses (1 and 2 sigma from the mean), lowess smooths, linear fits, density curves on the histograms, and the value of the correlation. ellipse = TRUE implies smooth = TRUE. The grid option provides a background grid to the scatterplot.
If grouping variables are used, ellipses (defaulting to 1 sd) are drawn around each group centroid. This is useful when demonstrating Mahalanobis distances.
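One common way to construct a k-sd ellipse is as the set of points whose Mahalanobis distance from the centroid equals k. The sketch below, in base R, maps a unit circle through the Cholesky factor of the sample covariance matrix to obtain such points. It is an illustration under that assumption, not psych's internal code, and psych may scale its ellipses differently. ellipse_points is a hypothetical name.

```r
# Hypothetical helper: n points on the ellipse at Mahalanobis distance k
# from the centroid of (x, y), using the sample covariance matrix.
ellipse_points <- function(x, y, k = 1, n = 100) {
  mu <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  theta <- seq(0, 2 * pi, length.out = n)   # first and last points coincide
  circle <- cbind(cos(theta), sin(theta))   # unit circle
  # If S = t(R) %*% R (Cholesky), then each row mu + k * u %*% R with |u| = 1
  # satisfies (p - mu) S^-1 (p - mu)' = k^2
  R <- chol(S)
  sweep(k * (circle %*% R), 2, mu, "+")
}
```

For example, after plot(x, y), calling lines(ellipse_points(x, y, k = 1)) and lines(ellipse_points(x, y, k = 2)) overlays 1 and 2 sd ellipses.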
Formula input allows specification of grouping variables as well.
When plotting data for two groups, the Mahalanobis distance between the groups may be shown by drawing an arrow between the two centroids. This is a bit messy, and it is useful to use pch="." in this case.
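For two groups, the separation of the centroids can be expressed as a Mahalanobis distance using the pooled within-group covariance matrix. The base-R sketch below computes that quantity. Pooling the two covariance matrices this way is an assumption; it is not a description of exactly what show.d computes, and centroid_D is a hypothetical name.

```r
# Hypothetical helper: Mahalanobis distance D between the centroids of two
# groups, using the pooled within-group covariance matrix.
centroid_D <- function(X, group) {
  g <- unique(group)
  stopifnot(length(g) == 2)
  X1 <- X[group == g[1], , drop = FALSE]
  X2 <- X[group == g[2], , drop = FALSE]
  n1 <- nrow(X1)
  n2 <- nrow(X2)
  pooled <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  # mahalanobis() returns the squared distance, so take the square root
  sqrt(mahalanobis(colMeans(X1), center = colMeans(X2), cov = pooled))
}

# Usage with two of the iris species (base R data set)
iris2 <- droplevels(subset(iris, Species != "setosa"))
centroid_D(as.matrix(iris2[, c("Sepal.Width", "Petal.Length")]), iris2$Species)
```

With a single variable this reduces to the absolute mean difference divided by the pooled standard deviation (Cohen's d).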
Author(s)
William Revelle
Note
Originally adapted from Addicted to R example 78. Modified following some nice suggestions from Jared Smith. Substantial revisions in 2021 to allow for a clearer demonstration of group differences.
See Also
pairs.panels for multiple plots, multi.hist for multiple histograms and histBy for single variables with multiple groups. Perhaps the best example is found in the psychTools::GERAS data set.
Examples
data(sat.act)
with(sat.act,scatterHist(SATV,SATQ))
scatterHist(SATV ~ SATQ,data=sat.act)  #formula input
#or for something a bit more splashy
scatter.hist(sat.act[5:6],pch=(19+sat.act$gender),col=c("blue","red")[sat.act$gender],grid=TRUE)
#better yet
scatterHist(SATV ~ SATQ + gender,data=sat.act)  #formula input with a grouping variable
#If using a factor for grouping, we must first convert it to numeric
iris.1 <- char2numeric(iris,flag=FALSE)
scatterHist(Sepal.Width ~ Petal.Length + Species,data=iris.1, show.d=FALSE, main="Fisher's Iris example")
#see pairs.panels