Exploratory Data Analysis and Data Preparation Tool-Box
Reduce cardinality in categorical variable by automatic grouping
Profiling analysis of categorical vs. target variable
Compare two data frames by keys
Concatenate 'N' variables
Convert every column in a data frame to character
Coordinate plot
Get correlation against target variable
Cross-plotting input variable vs. target variable
Data integrity
Check data integrity model
Profiling categorical variable
Profiling categorical variable (rank)
Get a summary for the given data frame (or vector)
Discretize a data frame
Get the data frame thresholds for discretization
Variable discretization by gain ratio maximization
Computes the entropy between two variables
Equal frequency binning
Export plot to jpeg file
Fibonacci series
Frequency table for categorical variables
funModeling: Exploratory data analysis, data preparation and model per...
Generates lift and cumulative gain performance table and plot
Gain ratio
Sampling training and test data
Hampel Outlier Threshold
Computes several information theory metrics between two vectors
Information gain
Plotting numerical data
Correlation plots
Outliers Data Preparation
Profiling numerical data
Transform a variable into the [0-1] range
Get a summary for the given data frame (or vector)
Tukey Outlier Threshold
Compare two vectors
Variable importance ranking based on information theory
Around 10% of almost any predictive modeling project is spent on the modeling itself. 'funModeling' and the book Data Science Live Book (<https://livebook.datascienceheroes.com/>) are intended to cover the remaining 90%: data preparation, profiling, selecting the best variables, 'dataViz', assessing model performance, and other functions.
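
To make the "Tukey Outlier Threshold" and "Hampel Outlier Threshold" entries in the index concrete, here is a minimal sketch of the two textbook rules in Python. It is not the package's own code: the function names, the multipliers `k`, the quantile method, and the MAD scaling constant are all assumptions chosen for illustration, and the package's defaults may differ.

```python
from statistics import median, quantiles


def tukey_thresholds(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the classic choice; it is an assumption here, not a
    documented default of any particular package.
    """
    # "inclusive" treats the data as the full population and
    # interpolates linearly between order statistics.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def hampel_thresholds(values, k=3.0, scale=1.4826):
    """Return (lower, upper) Hampel bounds: median -/+ k * scaled MAD.

    scale=1.4826 makes the MAD comparable to a standard deviation
    for normally distributed data; k=3 is an assumed multiplier.
    """
    values = list(values)
    center = median(values)
    # Median absolute deviation around the median, then rescaled.
    mad = scale * median([abs(v - center) for v in values])
    return center - k * mad, center + k * mad


def flag_outliers(values, bounds):
    """Return the values that fall outside the (lower, upper) bounds."""
    lower, upper = bounds
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print("Tukey outliers:", flag_outliers(data, tukey_thresholds(data)))
    print("Hampel outliers:", flag_outliers(data, hampel_thresholds(data)))
```

Both rules rely on order statistics (quartiles, median) rather than the mean and standard deviation, so a single extreme value does not shift the thresholds much.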