DR() R function from funtimes

Downhill Riding (DR) Procedure

Downhill riding procedure for selecting optimal tuning parameters in clustering algorithms, using an (in)stability probe.


DR(X, method, minPts = 3, theta = 0.9, B = 500, lb = -30, ub = 10)

Arguments

X: an $n\times k$ matrix whose $k$ columns are the objects to be clustered; each object contains $n$ observations (for example, the objects can be a set of time series).
method: the clustering method to be used, currently either TRUST (Ciampi et al. 2010) or DBSCAN (Ester et al. 1996). If the method is DBSCAN, set minPts, and the optimal $\epsilon$ is selected using DR. If the method is TRUST, set theta, and the optimal $\delta$ is selected using DR.
minPts: the minimum number of samples in the $\epsilon$-neighborhood of a point for that point to be considered a core point. minPts is used only with the DBSCAN method. The default value is 3.
theta: connectivity parameter $\theta \in (0,1)$, used only with the TRUST method. The default value is 0.9.
B: number of random splits used in calculating the Average Cluster Deviation (ACD). The default value is 500.
lb, ub: endpoints of the range searched for the optimal parameter (see Details). The default values are -30 and 10.
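The sketch below illustrates two of the arguments above in base R. The first part shows the expected orientation of X: because DR() treats columns as objects, a data matrix with one row per object (such as iris) is transposed, as in Example 1. The second part is a hypothetical helper, is_core_point(), which is not part of funtimes; it only illustrates what minPts controls in DBSCAN. It takes points as rows (the orientation used by dbscan()) and assumes the convention that the $\epsilon$-neighborhood of a point includes the point itself.

```r
# Part 1: orientation of X for DR(). Each column is one object to be
# clustered and each row is one observation, so iris must be transposed.
Data <- scale(iris[, -5])   # 150 flowers (rows) x 4 measurements (columns)
X <- t(Data)                # 4 observations (rows) x 150 objects (columns)
dim(X)                      # 4 150

# Part 2: hypothetical helper (NOT a funtimes function) flagging DBSCAN
# core points. Points are rows of `pts`; a point is a core point if at
# least minPts points, counting itself, lie within distance eps.
is_core_point <- function(pts, eps, minPts) {
  d <- as.matrix(dist(pts))      # pairwise Euclidean distances
  rowSums(d <= eps) >= minPts    # diagonal is 0, so each point counts itself
}

pts <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(5, 5))
core <- is_core_point(pts, eps = 0.2, minPts = 3)
unname(core)                     # TRUE TRUE TRUE FALSE
```

Raising minPts to 4 in this toy example would leave no core points at all, which is why DBSCAN results depend jointly on minPts and the $\epsilon$ selected by DR.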

Returns

A list containing the following components:

P_opt: the value of the optimal parameter. If the method is DBSCAN, P_opt is the optimal $\epsilon$; if the method is TRUST, P_opt is the optimal $\delta$.
ACD_matrix: a matrix of ACD values for different values of the tuning parameter. If the method is DBSCAN, the tuning parameter is $\epsilon$; if the method is TRUST, the tuning parameter is $\delta$.

Details

Parameters lb and ub are the endpoints of the search for the optimal parameter. The candidate parameter values are calculated as $P := 1.1^x$, $x \in \{lb, lb+0.5, lb+1.0, \dots, ub\}$. Although the default search range is sufficiently wide, in some cases a warning message is issued; the range can then be extended by decreasing lb and/or increasing ub.
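The candidate grid described above can be reproduced in base R. This is a sketch of the formula only, assuming the step of 0.5 in the exponent as stated; candidate_grid() is a hypothetical helper, not a funtimes function, and DR() computes its candidates internally.

```r
# Hypothetical helper (NOT part of funtimes): candidate parameter values
# P = 1.1^x for x = lb, lb + 0.5, lb + 1.0, ..., ub
candidate_grid <- function(lb = -30, ub = 10) {
  x <- seq(lb, ub, by = 0.5)   # exponents on a 0.5 grid
  1.1^x                        # geometric grid of candidate eps or delta
}

P <- candidate_grid()
length(P)   # 81 candidates with the default lb = -30, ub = 10
range(P)    # roughly 0.0573 to 2.594

# Widening the search range after a warning, e.g. lb = -40, ub = 20
length(candidate_grid(lb = -40, ub = 20))   # 121 candidates
```

Because the grid is geometric, small parameter values are sampled densely and large ones sparsely, so extending lb reaches much smaller values while extending ub reaches larger ones.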

For more discussion of the properties of the considered clustering algorithms and of the DR procedure, see Huang et al. (2016) and Huang et al. (2018).

Examples


## Not run:

## example 1
## use iris data to test DR procedure

data(iris)
require(funtimes)
require(dbscan)  # provides dbscan() used below
require(clue)  # calculate NMI to compare the clustering result with the ground truth
require(scatterplot3d)

Data <- scale(iris[,-5])
ground_truth_label <- iris[,5]

# perform DR procedure to select optimal eps for DBSCAN 
# and save it in variable eps_opt
eps_opt <- DR(t(Data), method="DBSCAN", minPts = 5)$P_opt   

# apply DBSCAN with the optimal eps on iris data 
# and save the clustering result in variable res
res <- dbscan(Data, eps = eps_opt, minPts = 5)$cluster

# calculate NMI to compare the clustering result with the ground truth label
clue::cl_agreement(as.cl_partition(ground_truth_label),
                   as.cl_partition(as.numeric(res)), method = "NMI") 
# visualize the clustering result and compare it with the ground truth
# 3D visualization of the clustering result using variables Sepal.Length,
# Sepal.Width, and Petal.Length; labels are shifted by 1 so that noise
# points (cluster 0) are not drawn in the background color 0
scatterplot3d(Data[, -4], color = res + 1)
# 3D visualization of the ground truth using the same variables
scatterplot3d(Data[, -4], color = as.numeric(ground_truth_label))

## example 2
## use synthetic time series data to test DR procedure

require(funtimes)
require(clue) 
require(zoo)

# simulate 12 time series for 3 clusters, each cluster containing 4 time series
set.seed(114)
samp_Ind <- sample(12, replace = FALSE)
time_points <- 30
X <- matrix(0,nrow=time_points,ncol = 12)
cluster1 <- sapply(1:4,function(x) arima.sim(list(order = c(1, 0, 0), ar = c(0.2)),
                                             n = time_points, mean = 0, sd = 1))
cluster2 <- sapply(1:4,function(x) arima.sim(list(order = c(2 ,0, 0), ar = c(0.1, -0.2)),
                                             n = time_points, mean = 2, sd = 1))
cluster3 <- sapply(1:4,function(x) arima.sim(list(order = c(1, 0, 1), ar = c(0.3), ma = c(0.1)),
                                             n = time_points, mean = 6, sd = 1))

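# each clusterN is a time_points x 4 matrix (one series per column), which
# already matches the shape of X[, samp_Ind[...]]; no transposition is needed
X[, samp_Ind[1:4]] <- round(cluster1, 4)
X[, samp_Ind[5:8]] <- round(cluster2, 4)
X[, samp_Ind[9:12]] <- round(cluster3, 4)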

# create ground truth label of the synthetic data
ground_truth_label = matrix(1, nrow = 12, ncol = 1) 
for(k in 1:3){
    ground_truth_label[samp_Ind[(4*k - 4 + 1):(4*k)]] = k
}

# perform DR procedure to select optimal delta for TRUST
# and save it in variable delta_opt
delta_opt <- DR(X, method = "TRUST")$P_opt 

# apply TRUST with the optimal delta on the synthetic data 
# and save the clustering result in variable res
res <- CSlideCluster(X, Delta = delta_opt, Theta = 0.9)  

# calculate NMI to compare the clustering result with the ground truth label
clue::cl_agreement(as.cl_partition(as.numeric(ground_truth_label)),
                   as.cl_partition(as.numeric(res)), method = "NMI")

# visualize the clustering result and compare it with the ground truth result
# visualization of the clustering result obtained by TRUST
plot.zoo(X, type = "l", plot.type = "single", col = res, xlab = "Time index", ylab = "")
# visualization of the ground truth result 
plot.zoo(X, type = "l", plot.type = "single", col = ground_truth_label,
         xlab = "Time index", ylab = "")
## End(Not run)

References

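Ciampi A, Appice A, Malerba D (2010). "Discovering trend-based clusters in spatially distributed data streams." In International Workshop of Mining Ubiquitous and Social Environments, 107-122.

Ester M, Kriegel H, Sander J, Xu X (1996). "A density-based algorithm for discovering clusters in large spatial databases with noise." In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), 226-231.

Huang X, Iliev IR, Brenning A, Gel YR (2016). "Space-time clustering with stability probe while riding downhill." In Proceedings of the 2nd SIGKDD Workshop on Mining and Learning from Time Series (MiLeTS).

Huang X, Iliev IR, Lyubchich V, Gel YR (2018). "Riding down the bay: space-time clustering of ecological trends." Environmetrics, 29(5-6), e2455.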

Author(s)

Xin Huang, Yulia R. Gel

funtimes package

Maintainer: Vyacheslav Lyubchich
License: GPL (>= 2)
Last published: 2023-03-21
