Leveraging Experiment Lines to Data Analytics
Action implementation for transform
Action
Adjust categorical mapping
Adjust to data frame
Adjust factors
Adjust to matrix
Aggregation by groups
Autoencoder base (encoder)
Autoencoder base (encoder + decoder)
Categorical mapping (one-hot encoding)
Bagging (ipred)
Boosting (adabag)
Decision Tree for classification
Logistic regression (GLM)
LASSO logistic regression (glmnet)
K-Nearest Neighbors (KNN) Classification
Majority baseline classifier
MLP for classification
Multinomial logistic regression
Naive Bayes Classifier
Random Forest for classification
CART (rpart)
SVM for classification
Classification tuning (k-fold CV)
XGBoost
Classification base class
Clustering tuning (intrinsic metric)
Fuzzy c-means
DBSCAN
Gaussian mixture model clustering (GMM)
Hierarchical clustering
k-means
Louvain community detection
PAM (Partitioning Around Medoids)
Cluster
Clusterer
Class dal_base
Graphics utilities
DAL Learner (base class)
DAL Transform
DAL Tune (base for hyperparameter search)
Data sampling abstractions
Discover
PCA
Evaluate
Feature generation
Feature selection by correlation
Maximum curvature analysis (elbow detection)
Minimum curvature analysis (elbow detection)
Tune hyperparameters of ML model
Fit DBSCAN model
Fit
Hierarchy mapping by cut
Simple imputation
Inverse Transform
K-fold sampling
Min-max normalization
Missing value removal
Outlier removal by boxplot (IQR rule)
Outlier removal by Gaussian 3-sigma rule
Apriori rules
cSPADE sequences
ECLAT itemsets
Pattern miner
Plot bar graph
Boxplot per class
Plot boxplot
Plot correlation
Plot dendrogram
Plot density per class
Plot density
Plot grouped bar
Plot histogram
Plot lollipop
Plot advanced scatter matrix
Plot scatter matrix
Plot parallel coordinates
Plot pie
Plot pixel visualization
Plot points
Plot radar
Scatter graph
Plot series
Plot stacked bar
Plot time series with predictions
Plot time series chart
Predictor (base for classification/regression)
Decision Tree for regression
K-Nearest Neighbors (KNN) Regression
Linear regression (lm)
MLP for regression
Random Forest for regression
SVM for regression
Regression tuning (k-fold CV)
Regression base class
Class balancing (up/down sampling)
Cluster sampling
Random sampling
Simple sampling
Stratified sampling
selection of hyperparameters
Selection of hyperparameters
Default assignment of parameters
Assign parameters
Smoothing by clustering (k-means)
Smoothing by equal frequency
Smoothing by equal interval
Smoothing (binning/quantization)
k-fold training and test partition object
Train-Test Partition
Transform
Z-score normalization
The increasing complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address these challenges in data analytics workflows. Inspired by Experiment Line concepts, it supports users in developing their data mining workflows by offering a uniform data model and method API. It integrates various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction, and also offers hyper-parameter tuning and integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
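As a minimal sketch of the kind of workflow the package standardizes (a train-test partition followed by z-score normalization, cf. the "Train-Test Partition" and "Z-score normalization" topics above), the base-R code below shows the underlying fit-then-transform computation. It deliberately uses no daltoolbox function names: in the package itself, each step is wrapped in an object with uniform fit/transform-style methods, which this sketch only approximates.

```r
set.seed(1)
data <- data.frame(x = rnorm(100, mean = 50, sd = 10))

# Train-test partition: hold out 30% for testing
idx   <- sample(nrow(data), size = 0.7 * nrow(data))
train <- data[idx, , drop = FALSE]
test  <- data[-idx, , drop = FALSE]

# "fit" step: learn normalization parameters from training data only,
# so no information from the test partition leaks into preprocessing
mu  <- mean(train$x)
sdv <- sd(train$x)

# "transform" step: apply the learned mapping to both partitions
train$z <- (train$x - mu) / sdv
test$z  <- (test$x - mu) / sdv
```

The key design point this mirrors is that preprocessing parameters are estimated once (on training data) and then reapplied unchanged, which is what a uniform fit/transform API makes systematic across preprocessing, modeling, and tuning steps.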