Detecting Environmental Outliers in Data Analysis Pipelines
Adjust the boxplot's bounding fences using medcouple to flag suspicious...
Identifies the best method for outlier detection for a single species.
To implement bootstrapping procedures. Sampling with replacement.
Broad classification of outlier detection methods.
Check species names for inconsistencies
Check for packages to install and respond to user
Indicate excluded columns.
Post checks for PCA and bootstrapping
Extract final clean data using either absolute or best method generate...
Cosine similarity index based on (Gautam & Kulkarni 2014; Joy & Renumo...
Outlier detection class for multiple methods
Distribution boxplot
Check for environmental outliers using species optimal ranges.
Computes the empirical influence function for each value in the datas...
To check for a bounding box
Extract final clean data using either absolute or best method generate...
List of outlier detection methods implemented in this package.
Extract outliers for one species
Checks for geographic ranges from FishBase
Download species records from online database.
Get dataframe from the large dataframe.
Plotting to show the quality-controlled data in environmental sp...
Identify if enough methods are selected for the outlier detection.
Visualize the outliers identified by each method
Identify best outlier detection method using Hamming distance.
Flag suspicious outliers based on the Hampel filter method.
Catch errors during methods implementation.
Computes interquartile range to flag environmental outliers
Identify outliers using isolation forest model.
Identifies the best outlier detection method using Jaccard coefficient...
Identifies outliers using Reverse Jackknifing method based on Chapman ...
Log boxplot for outlier detection.
Flags outliers based on Mahalanobis distance matrix for all records.
Data harmonizing for offline data based on Darwin Core terms.
Customized match function
Median rule method
Mixed interquartile range and semi-interquartile range `Walker et al., ...
Identifies absolute outliers for multiple species.
Identify best method for outlier removal for multiple species using ma...
Ensemble multiple outlier detection methods.
Identifies absolute outliers and their proportions for a single specie...
Identify outliers using One Class Support Vector Machines
Optimize threshold for clean data extraction.
Identifies best outlier detection method using Overlap coefficient.
Implement principal component analysis for dimension reduction
To package both principal component analysis and bootstrapping.
Preliminary data cleaning including removing duplicates, records outsi...
Determine the threshold using Locally estimated or weighted Scatterplo...
Computes semi-interquartile range to flag suspicious outliers
Sequential fences method
Set method for displaying output details after outlier detection.
Identify best outlier detection method using simple matching coefficie...
Identifies best outlier detection method using Sorensen Similarity Ind...
Collates minimum, maximum, and preferable temperatures from FishBase.
Global-Local Outlier Score from Hierarchies
Flags outliers using kmeans clustering method
k-nearest neighbors for outlier detection
Flags suspicious records using the local outlier factor or Density-Based Spati...
Computes z-scores to flag environmental outliers.
A framework for detecting and handling outliers in data analysis workflows. Outlier detection is a statistical technique that highlights records with suspiciously high or low values. In species distribution modelling, outlier detection was initiated by Chapman (1991) (available at <https://www.researchgate.net/publication/332537800_Quality_control_and_validation_of_point-sourced_environmental_resource_data>), who developed the reverse jackknifing method. The concept was further developed and incorporated into several R packages, including 'flexsdm' (Velazco et al., 2022, <doi:10.1111/2041-210X.13874>) and 'biogeo' (Robertson et al., 2016, <doi:10.1111/ecog.02118>). We compiled outlier detection methods from the literature, including those elaborated in Dastjerdy et al. (2023) <doi:10.3390/geotechnics3020022> and Liu et al. (2008) <doi:10.1109/ICDM.2008.17>. This package introduces an ensembling approach, in which multiple outlier detection methods are applied together to decide whether a record is an absolute outlier. The approach can be applied in general data analysis as well as during the development of species distribution models.
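The ensembling idea described above can be illustrated with a minimal, language-agnostic sketch. It is written in Python using only the standard library and is not this package's R API. All function names, the cutoffs, and the 0.6 agreement threshold are hypothetical choices made for illustration. The sketch assumes that a record counts as an absolute outlier when the proportion of methods flagging it reaches a chosen threshold.

```python
from statistics import mean, median, quantiles, stdev


def zscore_flags(values, cutoff=2.5):
    """Flag values whose absolute z-score exceeds the cutoff (illustrative cutoff)."""
    mu, sigma = mean(values), stdev(values)
    return [abs(v - mu) / sigma > cutoff for v in values]


def iqr_flags(values, k=1.5):
    """Flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v < lower or v > upper for v in values]


def hampel_flags(values, k=3.0):
    """Flag values more than k scaled MADs away from the median."""
    med = median(values)
    mad = 1.4826 * median([abs(v - med) for v in values])
    return [abs(v - med) > k * mad for v in values]


def absolute_outliers(values, methods, threshold=0.6):
    """Return indices flagged by at least `threshold` of the methods, plus
    the per-record proportion of methods that flagged each record."""
    flags = [method(values) for method in methods]
    proportions = [sum(column) / len(methods) for column in zip(*flags)]
    indices = [i for i, p in enumerate(proportions) if p >= threshold]
    return indices, proportions


# Hypothetical temperature records (degrees Celsius) with one suspicious value.
temps = [12.1, 12.8, 13.0, 13.4, 13.9, 14.2, 14.6, 15.0, 15.3, 41.0]
outliers, proportions = absolute_outliers(
    temps, [zscore_flags, iqr_flags, hampel_flags]
)
print(outliers)  # indices of records flagged by enough methods
```

Combining several detectors this way reduces dependence on any single method's assumptions: a record is only removed when enough independent rules agree, and the agreement threshold can be tuned to be stricter or more permissive.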
Useful links