plot2DProjection function

PLOT A PAIR OF CLUSTERS ALONG A 2-D PROJECTION SPACE

Plot a pair of clusters along a 2-D projection space.

plot2DProjection(
  y1,
  y2,
  projDir,
  sepValMethod = c("normal", "quantile"),
  iniProjDirMethod = c("SL", "naive"),
  projDirMethod = c("newton", "fixedpoint"),
  xlim = NULL,
  ylim = NULL,
  xlab = "1st projection direction",
  ylab = "2nd projection direction",
  title = "Scatter plot of 2-D Projected Clusters",
  font = 2,
  font.lab = 2,
  cex = 1.2,
  cex.lab = 1,
  cex.main = 1.5,
  lwd = 4,
  lty1 = 1,
  lty2 = 2,
  pch1 = 18,
  pch2 = 19,
  col1 = 2,
  col2 = 4,
  alpha = 0.05,
  ITMAX = 20,
  eps = 1.0e-10,
  quiet = TRUE)

Arguments

  • y1: Data matrix of cluster 1. Rows correspond to observations. Columns correspond to variables.

  • y2: Data matrix of cluster 2. Rows correspond to observations. Columns correspond to variables.

  • projDir: 1-D projection direction along which two clusters will be projected.

  • sepValMethod: Method to calculate the separation index for a pair of clusters projected onto a 1-D space. sepValMethod="quantile" indicates that the quantile version of the separation index will be used: $sepVal = (L_2 - U_1)/(U_2 - L_1)$, where $L_i$ and $U_i$, $i = 1, 2$, are the lower and upper $\alpha/2$ sample percentiles of projected cluster $i$. sepValMethod="normal" indicates that the normal version of the separation index will be used: $sepVal = [(\bar{x}_2 - \bar{x}_1) - z_{\alpha/2}(s_1 + s_2)] / [(\bar{x}_2 - \bar{x}_1) + z_{\alpha/2}(s_1 + s_2)]$, where $\bar{x}_i$ and $s_i$ are the sample mean and standard deviation of projected cluster $i$.

  • iniProjDirMethod: Method used to obtain the initial projection direction when calculating the separation index between a pair of clusters (cf. Qiu and Joe, 2006a, 2006b).

    iniProjDirMethod=SL indicates that the initial projection direction is the sample version of the SL projection direction (Su and Liu, 1993), $(\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2)^{-1}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)$.

    iniProjDirMethod=naive indicates that the initial projection direction is $\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1$.

  • projDirMethod: Method used to obtain the optimal projection direction when calculating the separation index between a pair of clusters (cf. Qiu and Joe, 2006a, 2006b).

    projDirMethod=newton indicates that the Newton-Raphson method is used to search for the optimal projection direction (cf. Qiu and Joe, 2006a). This requires the assumption that the covariance matrices of both clusters are positive-definite. If this assumption is violated, the fixedpoint method can be used instead. The fixedpoint method iteratively searches for the optimal projection direction based on the first derivative of the separation index with respect to the projection direction (cf. Qiu and Joe, 2006b).

  • xlim: Range of X axis.

  • ylim: Range of Y axis.

  • xlab: X axis label.

  • ylab: Y axis label.

  • title: Title of the plot.

  • font: An integer which specifies which font to use for text (see par).

  • font.lab: The font to be used for x and y labels (see par).

  • cex: A numerical value giving the amount by which plotting text and symbols should be scaled relative to the default (see par).

  • cex.lab: The magnification to be used for x and y labels relative to the current setting of 'cex' (see par).

  • cex.main: The magnification to be used for main titles relative to the current setting of 'cex' (see par).

  • lwd: The line width, a positive number (see par). par itself defaults to '1'; the default in this function is 4.

  • lty1: Line type for cluster 1 (see par).

  • lty2: Line type for cluster 2 (see par).

  • pch1: Either an integer specifying a symbol or a single character to be used as the default in plotting points for cluster 1 (see points).

  • pch2: Either an integer specifying a symbol or a single character to be used as the default in plotting points for cluster 2 (see points).

  • col1: Color used to indicate cluster 1.

  • col2: Color used to indicate cluster 2.

  • alpha: Tuning parameter reflecting the percentage in the two tails of a projected cluster that might be outlying.

  • ITMAX: Maximum number of iterations allowed when iteratively calculating the optimal projection direction. The actual number of iterations is usually far fewer than the default value of 20.

  • eps: A small positive number used to check whether a quantity $q$ is equal to zero. If $|q| <$ eps, then we regard $q$ as equal to zero. eps is used to check whether the denominator in the formula of the separation index is equal to zero. A zero-valued denominator indicates that the two clusters totally overlap, in which case the separation index is set to $-1$. The default value of eps is 1.0e-10.

  • quiet: A flag to switch on/off the outputs of intermediate results and/or possible warning messages. The default value is TRUE.
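The two versions of the separation index described under sepValMethod, together with the eps check on the denominator, can be sketched in base R as follows. This is only an illustration under stated assumptions: sepIndex1D is a hypothetical helper, not a function of the package, it takes data that have already been projected onto one dimension, and the step that orders the two clusters by their means is an assumption made here so that cluster 2 lies to the right of cluster 1.

```r
# Hypothetical helper (not part of the package): separation index of two
# 1-D projected clusters x1 and x2.
sepIndex1D <- function(x1, x2, method = c("normal", "quantile"),
                       alpha = 0.05, eps = 1.0e-10) {
  method <- match.arg(method)
  # Assumption: order the clusters so that cluster 2 has the larger mean
  if (mean(x1) > mean(x2)) {
    tmp <- x1; x1 <- x2; x2 <- tmp
  }
  if (method == "normal") {
    z <- qnorm(1 - alpha / 2)          # upper alpha/2 percentile of N(0, 1)
    L1 <- mean(x1) - z * sd(x1); U1 <- mean(x1) + z * sd(x1)
    L2 <- mean(x2) - z * sd(x2); U2 <- mean(x2) + z * sd(x2)
  } else {
    # lower and upper alpha/2 sample percentiles of each cluster
    q1 <- quantile(x1, c(alpha / 2, 1 - alpha / 2), names = FALSE)
    q2 <- quantile(x2, c(alpha / 2, 1 - alpha / 2), names = FALSE)
    L1 <- q1[1]; U1 <- q1[2]
    L2 <- q2[1]; U2 <- q2[2]
  }
  denom <- U2 - L1
  if (abs(denom) < eps) return(-1)     # zero denominator: total overlap
  (L2 - U1) / denom
}

set.seed(1)
x1 <- rnorm(200, mean = 0)
x2 <- rnorm(200, mean = 10)
sepNormal <- sepIndex1D(x1, x2, method = "normal")
sepQuantile <- sepIndex1D(x1, x2, method = "quantile")
sepDegenerate <- sepIndex1D(rep(0, 5), rep(0, 5))  # zero denominator, so -1
```

With the normal version, $(L_2 - U_1)/(U_2 - L_1)$ expands to the formula given under sepValMethod, which is why a single return expression serves both methods.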

Details

To get the second projection direction, we first construct an orthogonal matrix with first column projDir. Then we rotate the data points according to this orthogonal matrix. Next, we remove the first dimension of the rotated data points, and obtain the optimal projection direction projDir2 for the rotated data points in the remaining dimensions. Finally, we rotate the vector projDir3=(0, projDir2) back to the original space. The vector projDir3 is the second projection direction.
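The construction of the second projection direction can be sketched in base R along these lines. It is a sketch under assumptions, not the package's implementation: the data and all variable names other than projDir, projDir2, projDir3 and Q2 are made up for illustration, the orthogonal matrix is built with a QR decomposition, and the SL direction is used as a stand-in for the optimal direction in the reduced space.

```r
# Illustrative sketch only: the SL direction stands in for the optimal
# direction that is searched for in the reduced space.
set.seed(1234)
p <- 3
y1 <- matrix(rnorm(50 * p), 50, p)                      # cluster 1
y2 <- matrix(rnorm(100 * p), 100, p) +
  matrix(c(6, 2, 1), 100, p, byrow = TRUE)              # cluster 2, shifted
projDir <- c(1, 0, 0)                                   # assumed 1st direction

# 1. Orthogonal matrix whose first column is projDir (scaled to unit length)
u <- projDir / sqrt(sum(projDir^2))
Q <- qr.Q(qr(matrix(u, ncol = 1)), complete = TRUE)
if (sum(Q[, 1] * u) < 0) Q[, 1] <- -Q[, 1]              # QR may flip the sign

# 2. Rotate the data; column 1 now holds the projection onto projDir
z1 <- y1 %*% Q
z2 <- y2 %*% Q

# 3. Remove the first dimension of the rotated data
z1r <- z1[, -1, drop = FALSE]
z2r <- z2[, -1, drop = FALSE]

# 4. Direction in the remaining dimensions (SL direction as a stand-in)
projDir2 <- solve(cov(z1r) + cov(z2r), colMeans(z2r) - colMeans(z1r))
projDir2 <- projDir2 / sqrt(sum(projDir2^2))

# 5. Pad with a zero and rotate back to the original space
projDir3 <- drop(Q %*% c(0, projDir2))

# Columns: 1st and 2nd projection directions
Q2 <- cbind(u, projDir3)
```

Because the padded vector has a zero in its first coordinate and Q is orthogonal, the rotated-back vector is orthogonal to projDir by construction.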

The ticks along the X axis indicate the positions of the points of the two projected clusters. The positions of $L_i$ and $U_i$, $i = 1, 2$, are also indicated on the X axis, where $L_i$ and $U_i$ are the lower and upper $\alpha/2$ sample percentiles of cluster $i$ if sepValMethod="quantile". If sepValMethod="normal", $L_i = \bar{x}_i - z_{\alpha/2} s_i$ and $U_i = \bar{x}_i + z_{\alpha/2} s_i$, where $\bar{x}_i$ and $s_i$ are the sample mean and standard deviation of cluster $i$, and $z_{\alpha/2}$ is the upper $\alpha/2$ percentile of the standard normal distribution.

Returns

  • sepValx: value of the separation index for the projected two clusters along the 1st projection direction.

  • sepValy: value of the separation index for the projected two clusters along the 2nd projection direction.

  • Q2: 1st column is the 1st projection direction. 2nd column is the 2nd projection direction.
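As a rough illustration of how a matrix such as Q2 relates to the plot, the sketch below projects two clusters onto the columns of a 2-column matrix and draws them with base graphics. Everything here is assumed for illustration: the data are simulated, Q2 is a made-up matrix with orthonormal columns rather than the value returned by the function, and the plotting calls are a plain base R approximation using the default symbols and colors from the usage above.

```r
# Hypothetical inputs: two 3-D clusters and a 3 x 2 matrix with orthonormal
# columns standing in for the returned Q2.
set.seed(42)
y1 <- matrix(rnorm(50 * 3), 50, 3)
y2 <- matrix(rnorm(80 * 3, mean = 4), 80, 3)
Q2 <- qr.Q(qr(cbind(c(1, 1, 1), c(1, -1, 0))))

# Coordinates of each observation along the two projection directions
p1 <- y1 %*% Q2
p2 <- y2 %*% Q2

# Scatter plot of the projected clusters
plot(rbind(p1, p2), type = "n",
     xlab = "1st projection direction", ylab = "2nd projection direction",
     main = "Scatter plot of 2-D Projected Clusters")
points(p1, pch = 18, col = 2)
points(p2, pch = 19, col = 4)
rug(p1[, 1], col = 2)   # ticks on the X axis for cluster 1
rug(p2[, 1], col = 4)   # ticks on the X axis for cluster 2
```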

References

Qiu, W.-L. and Joe, H. (2006a) Generation of Random Clusters with Specified Degree of Separation. Journal of Classification, 23(2), 315-334.

Qiu, W.-L. and Joe, H. (2006b) Separation Index and Partial Membership for Clustering. Computational Statistics and Data Analysis, 50, 585-603.

Author(s)

Weiliang Qiu weiliang.qiu@gmail.com

Harry Joe harry@stat.ubc.ca

See Also

plot1DProjection

viewClusters

Examples

n1 <- 50
mu1 <- c(0, 0)
Sigma1 <- matrix(c(2, 1, 1, 5), 2, 2)
n2 <- 100
mu2 <- c(10, 0)
Sigma2 <- matrix(c(5, -1, -1, 2), 2, 2)
projDir <- c(1, 0)

library(MASS)
set.seed(1234)
y1 <- mvrnorm(n1, mu1, Sigma1)
y2 <- mvrnorm(n2, mu2, Sigma2)
y <- rbind(y1, y2)
cl <- rep(1:2, c(n1, n2))

b <- getSepProjData(
  y = y,
  cl = cl,
  iniProjDirMethod = "SL",
  projDirMethod = "newton")

# projection direction for clusters 1 and 2
projDir <- b$projDirArray[1, 2, ]

par(mfrow = c(2, 1))
plot1DProjection(y1 = y1, y2 = y2, projDir = projDir)
plot2DProjection(y1 = y1, y2 = y2, projDir = projDir)

  • Maintainer: Weiliang Qiu
  • License: GPL (>= 2)
  • Last published: 2023-08-16