leakr 0.1.0 package

Data Leakage Detection Tools for Machine Learning

clean_column_names

Enhanced column name cleaning with better robustness

compile_report

Enhanced report compilation with numeric severity scores

detect_and_convert_dates_enhanced

Enhanced date detection handling multiple formats and data types

detect_file_format

Detect file format from extension and content

detector_registry

Registry-based Detector System

determine_risk_level

Determine risk level and CSS class from severity counts.

.onLoad

Initialise built-in detectors

empty_snapshot_info

Helper function to return an empty snapshot info dataframe

export_data_internal

Export data with consistent messaging

format_detector_name

Format detector names for display.

generate_diagnostic_plots

Generate diagnostic plots for a leakr_report

generate_evidence_section

Generate evidence section with format-specific handling and DRY logic.

generate_executive_summary_text

Generate executive summary text for a report

generate_issues_section

Generate detailed issues section with output formatting and truncation...

generate_recommendations_section

Format recommendations for output.

generate_recommendations

Generate actionable recommendations based on report findings.

get_detector_info

Get detector information

%||%

Null-coalescing operator for clean default value handling

import_csv

Import CSV files with robust parsing

import_excel

Import Excel files with enhanced sheet support

import_json

Import JSON files with better structure handling

import_parquet

Import Parquet files

import_rds

Import RDS files with validation

import_tsv

Import TSV files with robust parsing

leakr_audit

Audit dataset for data leakage

leakr_create_snapshot

Create data snapshots with improved metadata handling

leakr_export_data

Export data in various formats

leakr_from_caret

Convert caret training objects to standard format

leakr_from_mlr3

Convert mlr3 Task objects to standard format

leakr_from_tidymodels

Convert tidymodels workflow to standard format

leakr_import

Import data from various sources for leakage analysis

leakr_list_snapshots

List available snapshots with enhanced information

leakr_load_snapshot

Load data snapshot with enhanced validation

leakr_plot

Plot leakage detection results

leakr_quick_import

Fast import with default preprocessing

leakr_summarise

Enhanced summarise with better formatting

leakr

leakr: Data Leakage Detection for Machine Learning in R

list_registered_detectors

List Registered Detectors

new_temporal_detector

Create a new temporal detector

new_train_test_detector

Create a new train-test detector

plot.detector_result

Plot a detector_result object

plot.udld_report

Plot a udld_report object

prepare_audit_data

Enhanced data preparation with robust preprocessing

preprocess_imported_data

Enhanced preprocessing with better performance and robustness

print.leakr_report

Print method for leakr_report

register_detector

Register a new detector

run_detector

Run a detector on data

run_detectors

Run multiple detectors on audit data

stratified_sample

Stratified sampling helper

validate_and_preprocess_data

Robust data validation and preprocessing

validate_imported_data

Enhanced data validation with better error messages
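Several entries above describe a registry-based detector system: register_detector, run_detector, run_detectors and list_registered_detectors, with .onLoad initialising the built-in detectors. The following is a minimal, language-agnostic sketch of that general registry pattern in Python. It is not leakr's R API. The detector signature (rows in, a dict with "severity" and "message" out) and the "duplication" detector are hypothetical, chosen only to show how registration, listing and batch execution fit together.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical detector signature (not leakr's): takes rows as a list of dicts
# and returns a dict with at least "severity" and "message" keys.
Detector = Callable[[List[dict]], dict]

_REGISTRY: Dict[str, Detector] = {}


def register_detector(name: str, fn: Detector) -> None:
    """Store a detector under a unique name; re-registering a name overwrites it."""
    _REGISTRY[name] = fn


def list_registered_detectors() -> List[str]:
    """Return the registered detector names in sorted order."""
    return sorted(_REGISTRY)


def run_detector(name: str, rows: List[dict]) -> dict:
    """Look up one detector by name and run it on the rows."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown detector: {name}")
    return _REGISTRY[name](rows)


def run_detectors(rows: List[dict], names: Optional[List[str]] = None) -> Dict[str, dict]:
    """Run the named detectors (default: all registered) and collect results by name."""
    selected = names if names is not None else list_registered_detectors()
    return {name: run_detector(name, rows) for name in selected}


def duplicate_rows(rows: List[dict]) -> dict:
    """Hypothetical built-in detector: count rows that exactly repeat an earlier row."""
    seen, dups = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))  # column order does not matter
        if key in seen:
            dups += 1
        seen.add(key)
    return {"severity": "high" if dups else "none", "message": f"{dups} duplicate row(s)"}


# Built-in detectors are registered once, at import time (the role .onLoad plays in R).
register_detector("duplication", duplicate_rows)

rows = [{"x": 1, "y": 0}, {"x": 1, "y": 0}, {"x": 2, "y": 1}]
print(run_detectors(rows))
# {'duplication': {'severity': 'high', 'message': '1 duplicate row(s)'}}
```

The registry decouples the audit loop from the individual checks. New detectors can be added without editing run_detectors, which only needs the name-to-function mapping.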

Provides utilities to detect common data leakage patterns, including train/test contamination, temporal leakage and data duplication, to improve model reliability and reproducibility in machine learning workflows. It generates diagnostic reports and visual summaries to support data validation. Methods are based on best practices from Hastie, Tibshirani, and Friedman (2009, ISBN:978-0387848570).
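Two of the leakage patterns named above can be illustrated with simple row-level checks. Train/test contamination and temporal leakage are shown here; within-dataset duplication works like the first check applied to a single table. This is a language-agnostic sketch in Python, not leakr's R implementation. The rows-as-dicts representation, the function names, the "exact match on feature columns" rule and the "training rows must be strictly earlier than every test row" rule are all assumptions for illustration.

```python
from datetime import date
from typing import List, Tuple


def row_key(row: dict, exclude: Tuple[str, ...] = ()) -> tuple:
    """Hashable fingerprint of a row, ignoring columns such as IDs or the target."""
    return tuple(sorted((k, v) for k, v in row.items() if k not in exclude))


def train_test_overlap(train: List[dict], test: List[dict],
                       exclude: Tuple[str, ...] = ()) -> int:
    """Train/test contamination: count test rows whose values also appear verbatim in train."""
    train_keys = {row_key(r, exclude) for r in train}
    return sum(row_key(r, exclude) in train_keys for r in test)


def temporal_violations(train: List[dict], test: List[dict], time_col: str) -> int:
    """Temporal leakage: count training rows not strictly before the earliest test row.

    In a forward-in-time split, every training timestamp should precede every
    test timestamp; later training rows let the model see the future.
    """
    cutoff = min(r[time_col] for r in test)
    return sum(r[time_col] >= cutoff for r in train)


train = [
    {"id": 1, "x": 0.5, "when": date(2024, 1, 1), "y": 0},
    {"id": 2, "x": 0.7, "when": date(2024, 3, 1), "y": 1},
]
test = [{"id": 3, "x": 0.5, "when": date(2024, 2, 1), "y": 1}]

print(train_test_overlap(train, test, exclude=("id", "when", "y")))  # 1
print(temporal_violations(train, test, "when"))  # 1
```

Exact matching is the simplest possible rule. It misses near-duplicates, such as rounded or rescaled values, so a real audit would likely need tolerance-based or hashed comparisons on top of it.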

  • Maintainer: Cheryl Isabella Lim
  • License: MIT + file LICENSE
  • Last published: 2025-10-26