find_transformations function

Transformations for simple linear regression

Transformations for simple linear regression

This function takes a simple linear regression model and finds the transformation of x and y that results in the highest R2

find_transformations(M,powers=seq(from=-3,to=3,by=.25),threshold=0.02,...)

Arguments

  • M: A simple linear regression model fitted with lm
  • powers: A sequence of powers to try for x and y. By default this ranges from -3 to 3 in steps of 0.25. If 0 is a valid power, then the logarithm is used instead.
  • threshold: Report all models that have an R2 that is within threshold of the model with the highest R2
  • ...: Additional arguments to plot such as pch and cex.

Details

The relationship between y and x may not be linear. However, some transformation of y may have a linear relationship with some transformation of x. This function considers simple linear regression with x and y raised to powers between -3 and 3 (in 0.25 increments) by default. The function outputs a list of the top models as gauged by R^2 (all models within 0.02 of the highest R^2). Note: there is no guarantee that these "best" transformations are actually good, since a large R^2 can be produced by outliers created during transformations. A plot of the transformation is also provided.

It is exceedingly rare that the "best" transformation is raising x and y to the 1 power (i.e., the original variables). Transformations are typically used only when there are issues in the residuals plots, highly skewed variables, or physical/logical justifications.

Note: if a variable has 0s or negative numbers, only integer transformations are considered.

References

Introduction to Regression and Modeling

Author(s)

Adam Petrie

Examples

#Straightforward example data(BULLDOZER) M <- lm(SalePrice~YearMade,data=BULLDOZER) find_transformations(M,pch=20,cex=0.3) #Results are very misleading since selected models have high R2 due to outliers data(MOVIE) M <- lm(Total~Weekend,data=MOVIE) find_transformations(M,powers=seq(-2,2,by=0.5),threshold=0.05)
  • Maintainer: Adam Petrie
  • License: GPL (>= 2)
  • Last published: 2020-02-21

Useful links