dataPreparation 1.1.2 package

Automated Data Preparation

aggregate_by_key

Automatic data set aggregation by key

as.POSIXct_fast

Faster date transformation

build_bins

Compute bins

build_date_factor

Date Factor

build_encoding

Compute encoding

build_scales

Compute scales

build_target_encoding

Build target encoding

compute_probability_ratio

Compute probability ratio

compute_weight_of_evidence

Compute weight of evidence

data_preparation_news

Show the NEWS file

date_format_unifier

Unify date formats

description

Describe data set

fast_discretization

Discretization

fast_filter_variables

Filtering useless variables

fast_handle_na

Handle NA values

fast_is_equal

Fast checks of equality

fast_round

Fast round

fast_scale

Scale

find_and_transform_dates

Identify date columns

find_and_transform_numerics

Identify numeric columns in a data set

generate_date_diffs

Date difference

generate_factor_from_date

Generate factor from dates

generate_from_character

Recode character

generate_from_factor

Recode factor

get_most_frequent_element

Get most frequent element

identify_dates

Identify date columns

one_hot_encoder

One hot encoder

prepare_set

Preparation pipeline

remove_percentile_outlier

Percentile outlier filtering

remove_rare_categorical

Filter rare categories

remove_sd_outlier

Standard deviation outlier filtering

same_shape

Give same shape

set_as_numeric_matrix

Numeric matrix preparation for Machine Learning

set_col_as_character

Set columns as character

set_col_as_date

Set columns as POSIXct

set_col_as_factor

Set columns as factor

set_col_as_numeric

Set columns as numeric

shape_set

Final preparation before ML algorithm

target_encode

Target encode

un_factor

Unfactor factor with too many values

which_are_bijection

Identify bijections

which_are_constant

Identify constant columns

which_are_in_double

Identify double columns

which_are_included

Identify columns that are included in others

Do most of the painful data preparation for a data science project with a minimum amount of code. The package takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

  • Maintainer: Emmanuel-Lin Toulemonde
  • License: GPL-3 | file LICENSE
  • Last published: 2025-09-02
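
To give a feel for the "minimum amount of code" claim in the description, here is a minimal sketch that chains three of the functions listed above: `fast_filter_variables`, `build_scales`, and `fast_scale`. The toy table `toy` and its column names are hypothetical, and the argument names used (`level`, `cols`, `scales`, `verbose`) are assumptions based on the package's usual interface; check each function's help page before relying on them.

```r
# Assumes the 'data.table' and 'dataPreparation' packages are installed.
library(data.table)
library(dataPreparation)

# Hypothetical toy data: one constant column and one exact duplicate column.
toy <- data.table(
  id         = 1:5,
  constant   = rep("a", 5),
  value      = c(1.5, 2.0, 3.5, 4.0, 5.5),
  value_copy = c(1.5, 2.0, 3.5, 4.0, 5.5)
)

# Drop useless columns. level = 2 is assumed to remove constant and
# duplicated columns only (higher levels also look for bijections and
# included columns, which would be too aggressive on this tiny table).
toy <- fast_filter_variables(toy, level = 2, verbose = FALSE)

# Compute the scaling parameters once, then apply them.
# Keeping the two steps separate lets the same scales be reused on a test set.
scales <- build_scales(toy, cols = "value", verbose = FALSE)
toy <- fast_scale(toy, scales = scales, verbose = FALSE)

print(names(toy))
print(toy)
```

The build/apply split (`build_scales` then `fast_scale`, and similarly `build_encoding` then `one_hot_encoder`) is the pattern to look for throughout the index: parameters are learned on one data set and can then be applied unchanged to another.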