Data Splitting Algorithms for Model Developments
Check whether the sample set is full
Main function of data splitting algorithm
Initial sampling of DUPLEX
Repeat sampling of DUPLEX
'DSAM' - DUPLEX algorithm
Get the AUC value between two datasets
Get the maximum of the output column from the original data set
Get the mean and standard deviation of the output column from the orig...
Get the minimum of the output column from the original data set
Get sampling number of each SOM neuron
'DSAM' - MDUPLEX algorithm
Default parameter list
Get the remaining unsampled data after SSsample
'DSAM' - SBSS.P algorithm
Select specific split data
Self-organizing map clustering
'DSAM' - SOMPLEX algorithm
'DSAM' - SS algorithm
Core function of SS sampling
Standardized data
'DSAM' - Time-consecutive algorithm
Provides six algorithms for splitting the available data into training, test, and validation subsets with similar distributions for hydrological model development. The dataSplit() function divides the data according to the specified requirements; see the par.default() function for the parameters that control the split. The getAUC() function measures how similar the distributions of the data subsets are. For more information about the data splitting algorithms, see Chen et al. (2022) <doi:10.1016/j.jhydrol.2022.128340> and Zheng et al. (2022) <doi:10.1029/2021WR031818>.
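To make the idea of splitting and then checking distributional similarity concrete, here is a minimal sketch in Python. It is not the package's implementation or API: the function names time_consecutive_split and auc_between, the 60/20/20 default fractions, and the pairwise definition of AUC used here are all assumptions chosen for illustration. The sketch cuts an ordered series into three consecutive blocks and computes a rank-based AUC between two subsets, where a value near 0.5 suggests similar distributions and a value near 0 or 1 suggests one subset is systematically lower or higher than the other.

```python
from typing import List, Sequence, Tuple


def time_consecutive_split(
    values: Sequence[float],
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed defaults
) -> Tuple[List[float], List[float], List[float]]:
    """Cut an ordered series into consecutive training, test and validation blocks."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n = len(values)
    n_train = int(round(n * fractions[0]))
    n_test = int(round(n * fractions[1]))
    train = list(values[:n_train])
    test = list(values[n_train:n_train + n_test])
    valid = list(values[n_train + n_test:])  # whatever remains
    return train, test, valid


def auc_between(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of (x, y) pairs with x from a and y from b where x > y.

    Ties count as half a win. This equals the Mann-Whitney U statistic
    divided by len(a) * len(b); it is an illustrative similarity measure,
    assumed here rather than taken from the package.
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))


# Hypothetical usage: a monotonically rising series split consecutively
flow = [float(v) for v in range(1, 11)]
train, test, valid = time_consecutive_split(flow)
similarity = auc_between(train, test)
```

On a steadily rising series like the one in the usage lines, every training value lies below every test value, so the AUC sits at an extreme rather than near 0.5. That is the kind of mismatch between subsets that distribution-aware splitting methods are meant to avoid.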