kdr function

Kernel density ridge estimation

Kernel density ridge estimation

Kernel density ridge estimation for 2- to 3-dimensional data.

kdr(x, y, H, p=1, max.iter=400, tol.iter, segment=TRUE, k, kmax, min.seg.size, keep.path=FALSE, gridsize, xmin, xmax, binned, bgridsize, w, fhat, density.cutoff, pre=TRUE, verbose=FALSE) kdr.segment(x, k, kmax, min.seg.size, verbose=FALSE) ## S3 method for class 'kdr' plot(x, ...)

Arguments

  • x: matrix of data values or an object of class kdr

  • y: matrix of initial values

  • p: dimension of density ridge

  • H: bandwidth matrix/scalar bandwidth. If missing, Hpi(x,deriv,order=2) is called by default.

  • max.iter: maximum number of iterations. Default is 400.

  • tol.iter: distance under which two successive iterations are considered convergent. Default is 0.001*min marginal IQR of x.

  • segment: flag to compute segments of density ridge. Default is TRUE.

  • k: number of segments to partition density ridge

  • kmax: maximum number of segments to partition density ridge. Default is 30.

  • min.seg.size: minimum length of a segment of a density ridge. Default is round(0.001*nrow(y),0).

  • keep.path: flag to store the density gradient ascent paths. Default is FALSE.

  • gridsize: vector of number of grid points

  • xmin,xmax: vector of minimum/maximum values for grid

  • binned: flag for binned estimation.

  • bgridsize: vector of binning grid sizes

  • w: vector of weights. Default is a vector of all ones.

  • fhat: kde of x. If missing kde(x=x,w=w) is executed.

  • density.cutoff: density threshold under which the y are excluded from the density ridge estimation. Default is contourLevels(fhat, cont=99).

  • pre: flag for pre-scaling data. Default is TRUE.

  • verbose: flag to print out progress information. Default is FALSE.

  • ...: other graphics parameters

Returns

A kernel density ridge set is an object of class kdr

which is a list with fields: - x,y: data points - same as input

  • end.points: matrix of final iterates starting from y

  • H: bandwidth matrix

  • names: variable names

  • tol.iter,tol.clust,min.seg.size: tuning parameter values - same as input

  • binned: flag for binned estimation

  • names: variable names

  • w: vector of weights

  • path: list of density gradient ascent paths where path[[i]] is the path of y[i,] (only if keep.path=TRUE)

Details

Kernel density ridge estimation is based on reduced dimension kernel mean shift. See Ozertem & Erdogmus (2011).

If y is missing, then it defaults to the grid of size gridsize spanning from xmin to xmax.

If the bandwidth H is missing, then the default bandwidth is the plug-in selector for the density gradient Hpi(x,deriv.order=2). Any bandwidth that is suitable for the density Hessian is also suitable for the kernel density ridge.

kdr(, segment=TRUE) or kdr.segment() carries out the segmentation of the density ridge points in end.points. If k is set, then k segments are created. If k is not set, then the optimal number of segments is chosen from 1:kmax, with kmax=30 by default. The segments are created via a hierarchical clustering with single linkage. Experimental: following the segmentation, the points within each segment are ordered to facilitate a line plot in plot(, type="l"). The optimal ordering is not always achieved in this experimental implementation, though a scatterplot plot(, type="p") always suffices, regardless of this ordering.

References

Ozertem, U. & Erdogmus, D. (2011) Locally defined principal curves and surfaces, Journal of Machine Learning Research, 12 , 1249-1286.

Examples

data(cardio) set.seed(8192) cardio.train.ind <- sample(1:nrow(cardio), round(nrow(cardio)/4,0)) cardio2 <- cardio[cardio.train.ind,c(8,18)] cardio.dr2 <- kdr(x=cardio2, gridsize=c(21,21)) ## gridsize=c(21,21) is for illustrative purposes only plot(cardio2, pch=16, col=3) plot(cardio.dr2, cex=0.5, pch=16, col=6, add=TRUE) ## Not run: cardio3 <- cardio[cardio.train.ind,c(8,18,11)] cardio.dr3 <- kdr(x=cardio3) plot(cardio.dr3, pch=16, col=6, xlim=c(10,90), ylim=c(70,180), zlim=c(0,40)) plot3D::points3D(cardio3[,1], cardio3[,2], cardio3[,3], pch=16, col=3, add=TRUE) library(maps) data(quake) quake <- quake[quake$prof==1,] ## Pacific Ring of Fire quake$long[quake$long<0] <- quake$long[quake$long<0] + 360 quake <- quake[, c("long", "lat")] data(plate) ## tectonic plate boundaries plate <- plate[plate$long < -20 | plate$long > 20,] plate$long[plate$long<0 & !is.na(plate$long)] <- plate$long[plate$long<0 & !is.na(plate$long)] + 360 quake.dr <- kdr(x=quake, xmin=c(70,-70), xmax=c(310, 80)) map(wrap=c(0,360), lty=2) lines(plate[,1:2], col=4, lwd=2) plot(quake.dr, type="p", cex=0.5, pch=16, col=6, add=TRUE) ## End(Not run)