getSepProj function

OPTIMAL PROJECTION DIRECTION AND CORRESPONDING SEPARATION INDEX FOR PAIRS OF CLUSTERS

Optimal projection direction and corresponding separation index for pairs of clusters.

Usage

    getSepProjTheory(muMat, SigmaArray,
      iniProjDirMethod = c("SL", "naive"),
      projDirMethod = c("newton", "fixedpoint"),
      alpha = 0.05, ITMAX = 20, eps = 1.0e-10, quiet = TRUE)

    getSepProjData(y, cl,
      iniProjDirMethod = c("SL", "naive"),
      projDirMethod = c("newton", "fixedpoint"),
      alpha = 0.05, ITMAX = 20, eps = 1.0e-10, quiet = TRUE)

Arguments

  • muMat: Matrix of mean vectors. Rows correspond to mean vectors for clusters.

  • SigmaArray: Array of covariance matrices. SigmaArray[,,i] records the covariance matrix of the i-th cluster.

  • y: Data matrix. Rows correspond to observations. Columns correspond to variables.

  • cl: Cluster membership vector.

  • iniProjDirMethod: Indicating the method to get initial projection direction when calculating the separation index between a pair of clusters (c.f. Qiu and Joe, 2006a, 2006b).

    iniProjDirMethod=SL indicates the initial projection direction is the sample version of the SL's projection direction (Su and Liu, 1993) $\left(\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2\right)^{-1}\left(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1\right)$

    iniProjDirMethod=naive indicates the initial projection direction is $\boldsymbol{\mu}_2-\boldsymbol{\mu}_1$

  • projDirMethod: Indicating the method to get the optimal projection direction when calculating the separation index between a pair of clusters (c.f. Qiu and Joe, 2006a, 2006b).

    projDirMethod=newton indicates that the Newton-Raphson method is used to search for the optimal projection direction (c.f. Qiu and Joe, 2006a). This requires the assumption that both covariance matrices of the pair of clusters are positive-definite. If this assumption is violated, the fixedpoint method can be used instead. The fixedpoint method iteratively searches for the optimal projection direction based on the first derivative of the separation index with respect to the projection direction (c.f. Qiu and Joe, 2006b).

  • alpha: Tuning parameter reflecting the percentage in the two tails of a projected cluster that might be outlying. The default alpha = 0.05 is chosen by analogy with the conventional significance level of 0.05 in hypothesis testing.

  • ITMAX: Maximum number of iterations allowed when iteratively calculating the optimal projection direction. The actual number of iterations is usually much smaller than the default value of 20.

  • eps: Convergence threshold. A small positive number used to check whether a quantity $q$ is equal to zero: if $|q| <$ eps, then $q$ is regarded as equal to zero. eps is used to check whether an algorithm has converged. The default value is 1.0e-10.

  • quiet: A flag to switch on/off the outputs of intermediate results and/or possible warning messages. The default value is TRUE.
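The two choices for iniProjDirMethod can be illustrated with a minimal base R sketch. It reuses the population parameters from the Examples section; the helper `unit` and the variable names `dirNaive` and `dirSL` are hypothetical, and the rescaling to unit length is an assumption made here only so the two directions are easy to compare. It is not taken from the package source.

```r
# Population parameters borrowed from the Examples section
mu1 <- c(0, 0)
Sigma1 <- matrix(c(2, 1, 1, 5), 2, 2)
mu2 <- c(10, 0)
Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)

# Hypothetical helper: rescale a vector to unit length
unit <- function(v) v / sqrt(sum(v^2))

# "naive" initial direction: difference of the two mean vectors
dirNaive <- unit(mu2 - mu1)

# "SL" initial direction: solve (Sigma1 + Sigma2) a = mu2 - mu1 for a
# (this assumes Sigma1 + Sigma2 is invertible)
dirSL <- unit(solve(Sigma1 + Sigma2, mu2 - mu1))

print(dirNaive)
print(dirSL)
```

For these particular parameters Sigma1 + Sigma2 is a multiple of the identity matrix, so the two initial directions coincide; with other covariance matrices they generally differ.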

Details

When calculating the optimal projection direction and the corresponding optimal separation index for a pair of clusters, the newton method cannot be used if one or both cluster covariance matrices are singular. In this case, the functions getSepProjTheory and getSepProjData automatically use the fixedpoint method to search for the optimal projection direction, even if the user sets the argument projDirMethod to newton. Multiple initial projection directions are also evaluated.
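The fallback described above can be sketched in base R. How the package itself tests for singularity is not stated here, so the eigenvalue check, the tolerance `tol`, and the helpers `isPosDef` and `chooseMethod` are all hypothetical illustrations rather than the package's implementation.

```r
# Hypothetical helper: TRUE if a symmetric matrix is numerically positive-definite
isPosDef <- function(Sigma, tol = 1e-10) {
  ev <- eigen(Sigma, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol * max(abs(ev)))
}

# Hypothetical helper mimicking the documented behaviour:
# "newton" is kept only when both covariance matrices are positive-definite
chooseMethod <- function(S1, S2, requested = "newton") {
  if (requested == "newton" && !(isPosDef(S1) && isPosDef(S2))) {
    "fixedpoint"
  } else {
    requested
  }
}

SigmaFull <- matrix(c(2, 1, 1, 5), 2, 2)  # full rank
SigmaSing <- matrix(c(1, 1, 1, 1), 2, 2)  # rank 1, hence singular

print(chooseMethod(SigmaFull, SigmaFull))
print(chooseMethod(SigmaFull, SigmaSing))
```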

Specifically, $2+2p$ projection directions will be evaluated, where $p$ is the number of variables. The first projection direction is the naive direction $\boldsymbol{\mu}_2-\boldsymbol{\mu}_1$. The second projection direction is the SL projection direction $\left(\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2\right)^{-1}\left(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1\right)$. The next $p$ projection directions are the $p$ eigenvectors of the covariance matrix of the first cluster. The remaining $p$ projection directions are the $p$ eigenvectors of the covariance matrix of the second cluster.

Each of these $2+2p$ projection directions is in turn used as the initial projection direction for the fixedpoint algorithm to obtain an optimal projection direction and the corresponding optimal separation index. We also obtain $2+2p$ separation indices by projecting the two clusters along each of these $2+2p$ projection directions directly.

Finally, the projection direction with the largest separation index among the resulting $2(2+2p)$ separation indices is chosen as the optimal projection direction. The corresponding separation index is chosen as the optimal separation index.
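The candidate-direction step can be sketched in base R under stated assumptions. The separation index is assumed here to take the normal-theory form $J(a) = (d - z s)/(d + z s)$ with $d = |a^T(\boldsymbol{\mu}_2-\boldsymbol{\mu}_1)|$, $s = \sqrt{a^T\boldsymbol{\Sigma}_1 a} + \sqrt{a^T\boldsymbol{\Sigma}_2 a}$ and $z$ the upper alpha/2 normal quantile, following Qiu and Joe (2006a); this formula does not appear on this page. The helpers `sepIndex` and `candidateDirs` are hypothetical, the sketch assumes $\boldsymbol{\Sigma}_1+\boldsymbol{\Sigma}_2$ is invertible, and it only evaluates the index along the raw candidates, skipping the fixedpoint refinement.

```r
# Assumed normal-theory separation index along direction a
sepIndex <- function(a, mu1, Sigma1, mu2, Sigma2, alpha = 0.05) {
  d <- abs(sum(a * (mu2 - mu1)))  # sign-invariant distance between projected means
  v1 <- max(drop(crossprod(a, Sigma1 %*% a)), 0)  # guard against tiny negatives
  v2 <- max(drop(crossprod(a, Sigma2 %*% a)), 0)
  zs <- qnorm(1 - alpha / 2) * (sqrt(v1) + sqrt(v2))
  (d - zs) / (d + zs)
}

# The 2 + 2p candidates: naive, SL, eigenvectors of Sigma1, eigenvectors of Sigma2
candidateDirs <- function(mu1, Sigma1, mu2, Sigma2) {
  delta <- mu2 - mu1
  dirs <- cbind(delta,
                solve(Sigma1 + Sigma2, delta),
                eigen(Sigma1, symmetric = TRUE)$vectors,
                eigen(Sigma2, symmetric = TRUE)$vectors)
  # Rescale every column to unit length
  sweep(dirs, 2, sqrt(colSums(dirs^2)), "/")
}

mu1 <- c(0, 0);  Sigma1 <- matrix(c(1, 1, 1, 1), 2, 2)    # singular covariance
mu2 <- c(10, 0); Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)

dirs <- candidateDirs(mu1, Sigma1, mu2, Sigma2)
J <- apply(dirs, 2, sepIndex, mu1 = mu1, Sigma1 = Sigma1,
           mu2 = mu2, Sigma2 = Sigma2)
best <- which.max(J)

print(ncol(dirs))    # number of candidate directions, 2 + 2p
print(J)
print(dirs[, best])  # candidate with the largest separation index
```

In this sketch the singular first cluster has zero variance along one of its eigenvectors, which is why such eigenvectors are worth including among the candidates: a direction other than the naive or SL one can give a larger index.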

Returns

  • sepValMat: Separation index matrix

  • projDirArray: Array of projection directions for each pair of clusters

References

Qiu, W.-L. and Joe, H. (2006a) Generation of Random Clusters with Specified Degree of Separation. Journal of Classification, 23(2), 315-334.

Qiu, W.-L. and Joe, H. (2006b) Separation Index and Partial Membership for Clustering. Computational Statistics and Data Analysis, 50, 585-603.

Su, J. Q. and Liu, J. S. (1993) Linear Combinations of Multiple Diagnostic Markers. Journal of the American Statistical Association, 88, 1350-1355.

Author(s)

Weiliang Qiu weiliang.qiu@gmail.com

Harry Joe harry@stat.ubc.ca

Examples

    n1 <- 50
    mu1 <- c(0, 0)
    Sigma1 <- matrix(c(2, 1, 1, 5), 2, 2)
    n2 <- 100
    mu2 <- c(10, 0)
    Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)
    projDir <- c(1, 0)
    muMat <- rbind(mu1, mu2)
    SigmaArray <- array(0, c(2, 2, 2))
    SigmaArray[, , 1] <- Sigma1
    SigmaArray[, , 2] <- Sigma2

    a <- getSepProjTheory(
      muMat = muMat,
      SigmaArray = SigmaArray,
      iniProjDirMethod = "SL")
    # separation index for cluster distributions 1 and 2
    a$sepValMat[1, 2]
    # projection direction for cluster distributions 1 and 2
    a$projDirArray[1, 2, ]

    library(MASS)
    y1 <- mvrnorm(n1, mu1, Sigma1)
    y2 <- mvrnorm(n2, mu2, Sigma2)
    y <- rbind(y1, y2)
    cl <- rep(1:2, c(n1, n2))

    b <- getSepProjData(
      y = y,
      cl = cl,
      iniProjDirMethod = "SL",
      projDirMethod = "newton")
    # separation index for clusters 1 and 2
    b$sepValMat[1, 2]
    # projection direction for clusters 1 and 2
    b$projDirArray[1, 2, ]

  • Maintainer: Weiliang Qiu
  • License: GPL (>= 2)
  • Last published: 2023-08-16
