Do Residual Analysis and plot the fitted values vs residuals on a test dataset. Ideally, residuals should be randomly distributed. Patterns in this plot can indicate potential problems with the model selection, e.g., using simpler model than necessary, not accounting for heteroscedasticity, autocorrelation, etc. If you notice "striped" lines of residuals, that is just an indication that your response variable was integer valued instead of real valued.
h2o.residual_analysis_plot(model, newdata)
Arguments
model: An H2OModel.
newdata: An H2OFrame. Used to calculate residuals.
Returns
A ggplot2 object
Examples
## Not run:library(h2o)h2o.init()# Import the wine dataset into H2O:f <-"https://h2o-public-test-data.s3.amazonaws.com/smalldata/wine/winequality-redwhite-no-BOM.csv"df <- h2o.importFile(f)# Set the responseresponse <-"quality"# Split the dataset into a train and test set:splits <- h2o.splitFrame(df, ratios =0.8, seed =1)train <- splits[[1]]test <- splits[[2]]# Build and train the model:gbm <- h2o.gbm(y = response, training_frame = train)# Create the residual analysis plotresidual_analysis_plot <- h2o.residual_analysis_plot(gbm, test)print(residual_analysis_plot)## End(Not run)