Fit linear regressions by group, and get different output options.
This function fits linear regressions, optionally one per level of a grouping variable, and returns a data frame with one column per coefficient and per quality-of-fit statistic. Other output layouts are also available. It works with dplyr grouping functions such as group_by.
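The idea behind the default table layout (one fit per group, one row per group, one column per coefficient) can be sketched in base R. This is a minimal illustration, not lm_table's implementation: the helper fit_by_group, the built-in iris data and the column names b0, b1, Rsqr and sigma are all assumptions chosen for the example.

```r
data(iris)

# Hypothetical helper (not part of forestmangr): fit one lm() per group and
# return one row per group with one column per coefficient.
fit_by_group <- function(data, formula, group) {
  pieces <- split(data, data[[group]])
  rows <- lapply(names(pieces), function(g) {
    fit <- lm(formula, data = pieces[[g]])
    cf <- coef(fit)
    # Rename coefficients b0, b1, ... so they can be used by name or position
    names(cf) <- paste0("b", seq_along(cf) - 1)
    s <- summary(fit)
    # Illustrative quality-of-fit columns: R squared and residual standard error
    data.frame(group = g, t(cf), Rsqr = s$r.squared, sigma = s$sigma)
  })
  out <- do.call(rbind, rows)
  names(out)[1] <- group
  out
}

tab <- fit_by_group(iris, Sepal.Length ~ Petal.Length, "Species")
print(tab)
```

Each row of tab holds the coefficients of a separate regression, so a group's coefficients can be looked up by filtering on the grouping column.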
Arguments
model: A linear regression model formula, with or without quotes. The variables used in the model must exist in the provided data frame. The response (left-hand) side and the predictor (right-hand) side of the model must be separated by "~".
.groups: Optional argument. Quoted name(s) of grouping variables used to fit multiple regressions, one for each level of the provided variable(s). Default: NA.
output: Selects the output layout. Can be "table", "merge", "merge_est" or "nest". See Details for an explanation of each option. Default: "table".
est.name: Name of the column that holds the estimated y values. Used only if output = "merge_est". Default: "est".
keep_model: If TRUE, a column containing the lm object(s) is kept in the output. This is useful for extracting further information about the regression(s). Default: FALSE.
rmoutliers: If TRUE, outliers are filtered out using the IQR method. Default: FALSE.
fct_to_filter: Name of a factor or character column to be used as a filter to remove levels. Default: NA.
rmlevels: Levels of the fct_to_filter variable to be removed before the fit. Default: NA.
boolean_filter: Name of a Boolean column to be used as a filter to remove data. Default: NA.
onlyfiteddata: If TRUE, the output data will be the same as the fitted (and possibly filtered) data. Default: FALSE.
del_boolean: If TRUE, the Boolean column supplied will be deleted after use. Default: FALSE.
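The level and outlier filters described above act on the data before the model is fitted. The base-R sketch below shows that idea under stated assumptions: it treats "the IQR method" as Tukey's rule (keep values inside Q1 - 1.5*IQR and Q3 + 1.5*IQR) applied to one chosen column. This page does not say which variable(s) lm_table checks or which multiplier it uses. The helpers drop_levels and drop_iqr_outliers are hypothetical, and iris stands in for real data.

```r
data(iris)

# Hypothetical helper: drop the chosen levels of a factor/character column
drop_levels <- function(data, fct, levels_to_drop) {
  data[!(data[[fct]] %in% levels_to_drop), , drop = FALSE]
}

# Hypothetical helper using Tukey's rule (an assumption, see above):
# keep rows whose value lies inside [Q1 - k*IQR, Q3 + k*IQR]
drop_iqr_outliers <- function(data, var, k = 1.5) {
  x <- data[[var]]
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  keep <- x >= q[[1]] - k * iqr & x <= q[[2]] + k * iqr
  data[which(keep), , drop = FALSE]  # which() also drops NA rows
}

filtered <- drop_levels(iris, "Species", "setosa")
filtered <- drop_iqr_outliers(filtered, "Sepal.Width")
fit <- lm(Sepal.Length ~ Petal.Length, data = filtered)
```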
Returns
A data frame whose layout depends on the output argument.
Details
With this function there is no need to use dplyr's do function to fit a linear regression in a pipeline. It is also easy to fit multiple regressions by specifying a grouping variable. In addition, the default output stores each coefficient in its own column, which makes it easy to call coefficients by name or position when estimating values.
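With the coefficients stored as columns, estimating values becomes a single vectorized expression. The sketch below uses base R with the built-in iris data; the column names b0 and b1 are illustrative assumptions, not necessarily the names lm_table produces.

```r
data(iris)

fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# One-row data frame with one column per coefficient (illustrative names)
coefs <- as.data.frame(t(coef(fit)))
names(coefs) <- c("b0", "b1")

# Estimate by coefficient name ...
est_by_name <- coefs$b0 + coefs$b1 * iris$Petal.Length
# ... or by coefficient position
est_by_pos <- coefs[[1]] + coefs[[2]] * iris$Petal.Length
```

Both expressions reproduce the fitted values of the model.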
If output="merge", the result is a merged table that binds the original data frame and the fitted coefficients. If output="merge_est", the merged table also contains y values estimated with those coefficients. If the fit is made by group, this is taken into account, i.e. each row is estimated with the coefficients of its own group.
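The "merge" and "merge_est" layouts can be approximated in base R by joining a per-group coefficient table back onto the data and then estimating row by row. This is a sketch of the concept only; the iris data, the column names b0, b1 and SL_EST, and the use of merge() are assumptions for illustration.

```r
data(iris)

# Coefficient table: one row per group (illustrative column names)
groups <- split(iris, iris$Species)
coef_tab <- do.call(rbind, lapply(names(groups), function(g) {
  cf <- coef(lm(Sepal.Length ~ Petal.Length, data = groups[[g]]))
  data.frame(Species = g, b0 = cf[[1]], b1 = cf[[2]])
}))

# "merge"-like: attach each group's coefficients to that group's rows
merged <- merge(iris, coef_tab, by = "Species")

# "merge_est"-like: estimate y with the coefficients of each row's own group
merged$SL_EST <- merged$b0 + merged$b1 * merged$Petal.Length
head(merged)
```

Because the join is on the grouping column, every row is estimated with its own group's coefficients, which matches the by-group behavior described above.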
If output="nest", a data frame with nested columns is returned, which can be used to build a customized output.
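This page does not describe the exact layout of the "nest" output. The sketch below shows an analogous structure in base R: one row per group, with list-columns holding each group's data and fitted lm object, from which custom statistics can be pulled. The column names (data, model, adj_r2) and the iris data are assumptions for illustration, not the package's actual output.

```r
data(iris)

groups <- split(iris, iris$Species)

# One row per group; "data" and "model" are list-columns
nested <- data.frame(Species = names(groups))
nested$data <- unname(groups)
nested$model <- lapply(nested$data,
                       function(d) lm(Sepal.Length ~ Petal.Length, data = d))

# A customized summary pulled from the stored models: adjusted R2 per group
nested$adj_r2 <- vapply(nested$model,
                        function(m) summary(m)$adj.r.squared, numeric(1))
nested[, c("Species", "adj_r2")]
```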
Examples
library(forestmangr)
library(dplyr)

data("exfm19")
head(exfm19)

# Fit the Schumacher and Hall model for volume estimation, and get
# coefficient, R2 and error values:
lm_table(exfm19, log(VWB) ~ log(DBH) + log(TH))

# Fit the SH model by group:
lm_table(exfm19, log(VWB) ~ log(DBH) + log(TH), "STRATA")

# This can also be done using dplyr::group_by:
exfm19 %>%
  group_by(STRATA) %>%
  lm_table(log(VWB) ~ log(DBH) + log(TH))

# It's possible to merge the original data with the table containing the coefficients
# using the output parameter:
fit <- lm_table(exfm19, log(VWB) ~ log(DBH) + log(TH), "STRATA", output = "merge")
head(fit)

# It's possible to merge the original data with the table,
# and get the estimated values for this model:
fit <- lm_table(exfm19, log(VWB) ~ log(DBH) + log(TH), "STRATA",
                output = "merge_est", est.name = "VWB_EST")
head(fit)

# It's possible to further customize the output,
# unnesting the nested variables provided when output is defined as "nest":
lm_table(exfm19, log(VWB) ~ log(DBH) + log(TH), "STRATA", output = "nest")