Data Leakage Detection Tools for Machine Learning
Enhanced column name cleaning with better robustness
Enhanced report compilation with numeric severity scores
Enhanced date detection handling multiple formats and data types
Detect file format from extension and content
Registry-based Detector System
Determine risk level and CSS class from severity counts.
Initialise built-in detectors
Helper function to return an empty snapshot info dataframe
Export data with consistent messaging
Format detector names for display.
Generate diagnostic plots for a leakr_report
Generate evidence section with format-specific handling and DRY logic.
Report generator
Generate detailed issues section with output formatting and truncation...
Format recommendations for output.
Generate actionable recommendations based on report findings.
Get detector information
Null-coalescing operator for clean default value handling
Import CSV files with robust parsing
Import Excel files with enhanced sheet support
Import JSON files with better structure handling
Import Parquet files
Import RDS files with validation
Import TSV files with robust parsing
Audit dataset for data leakage
Create data snapshots with improved metadata handling
Export data in various formats
Convert caret training objects to standard format
Convert mlr3 Task objects to standard format
Convert tidymodels workflow to standard format
Import data from various sources for leakage analysis
List available snapshots with enhanced information
Load data snapshot with enhanced validation
Plot leakage detection results
Fast import with default preprocessing
Enhanced summarise with better formatting
leakr: Data Leakage Detection for Machine Learning in R
List Registered Detectors
Create a new temporal detector
Create a new train-test detector
Plot a detector_result object
Plot a udld_report object
Enhanced data preparation with robust preprocessing
Enhanced preprocessing with better performance and robustness
Print method for leakr_report
Register a new detector
Run a detector on data
Run multiple detectors on audit data
Stratified sampling helper
Robust data validation and preprocessing
Enhanced data validation with better error messages
Provides utilities to detect common data leakage patterns including train/test contamination, temporal leakage, and data duplication, enhancing model reliability and reproducibility in machine learning workflows. Generates diagnostic reports and visual summaries to support data validation. Methods based on best practices from Hastie, Tibshirani, and Friedman (2009, ISBN:978-0387848570).
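The three leakage patterns named above can be illustrated with a minimal base R sketch. It does not call any leakr function and is not the package's implementation. The data frame `dat`, its columns (`id`, `x`, `y`, `when`, `split`) and the helper `row_key` are hypothetical names chosen for this illustration only.

```r
# Hypothetical toy data: `split` marks the partition, `when` is an observation date.
dat <- data.frame(
  id    = c(1, 2, 3, 4, 5, 6),
  x     = c(0.5, 1.2, 0.5, 3.3, 1.2, 2.0),
  y     = c("a", "b", "a", "a", "b", "b"),
  when  = as.Date(c("2024-01-01", "2024-01-05", "2024-02-01",
                    "2024-03-01", "2024-01-03", "2024-03-10")),
  split = c("train", "train", "test", "train", "test", "test"),
  stringsAsFactors = FALSE
)

# Columns compared when looking for repeated observations (an assumption of this sketch).
features <- c("x", "y")
train <- dat[dat$split == "train", ]
test  <- dat[dat$split == "test", ]

# Build one string key per row from the feature columns.
row_key <- function(df) do.call(paste, c(df[features], sep = "\r"))

# Train/test contamination: test rows whose feature values also occur in train.
contaminated <- test[row_key(test) %in% row_key(train), ]

# Temporal leakage: training rows dated after the earliest test observation.
future_train <- train[train$when > min(test$when), ]

# Duplication: repeated feature rows anywhere in the data.
n_duplicates <- sum(duplicated(dat[features]))

print(contaminated$id)   # ids of test rows that also appear in train
print(future_train$id)   # ids of train rows later than the first test row
print(n_duplicates)      # count of repeated feature rows
```

Matching rows on a pasted string key is a simplification: it treats values as equal only when their printed forms match, so a real audit would likely need tolerance for numeric columns and a deliberate choice of which columns define a duplicate.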