df: The n x d data frame that will be used for Archetypal Analysis
kappas: The number of archetypes
npr: The dimension of the projected subspaces. Either npr = 1, in which case there are d such subspaces, or npr > 1, in which case there are C(d,npr) different subspaces.
rseed: An integer used as the random seed, if random sampling turns out to be necessary
doparallel: If TRUE, parallel processing is performed. In practice this is necessary when n is very large and d > 6.
nworkers: The number of logical processors used for computing the projected convex hulls, of which there are always C(d,npr).
uniquerows: If TRUE, only unique rows are used for computing the distance matrix, so fewer resources are needed.
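To make the C(d,npr) count in the description of npr concrete, the following base R lines enumerate the coordinate subspaces for a made-up case. The values d = 4 and npr = 2 are assumptions chosen only for illustration.

```r
# Hypothetical dimension and subspace size, chosen only for illustration
d <- 4
npr <- 2

# Number of coordinate projection subspaces: C(d, npr)
n_subspaces <- choose(d, npr)

# Each column of `subspaces` lists the variable indices of one subspace
subspaces <- combn(d, npr)

n_subspaces      # 6
ncol(subspaces)  # 6, one column per subspace
choose(d, 1)     # 4, i.e. d subspaces when npr = 1
```

Since the number of subspaces grows combinatorially with d, this count is also a rough guide to how much work nworkers has to share.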
Returns
A list with members:
outmost, the first kappas most frequent outermost points, as rows of the data frame
outmostall, all the outermost points that have been found, as rows of the data frame
outmostfrequency, a matrix with frequency and cumulative frequency for outermost rows
usedrandom, the number of randomly chosen rows, if any were needed to reach kappas rows
chprojections, all the Convex Hulls of the different C(d,npr) projections, i.e. the coordinate projection subspaces
projected, a data frame whose rows are the unique points that were projected in order to create the Convex Hulls of the coordinate projection subspaces
Details
If npr = 1, the Convex Hull is identical to the range (min, max) of the relevant variable. Otherwise the function uses chull when npr = 2 and convhulln when npr > 2. See [1] and [2] respectively for more details.
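The three cases can be illustrated on a toy data set. This is only a sketch: the data frame pts is made up, and the npr > 2 case is described in a comment rather than run, because convhulln lives in the geometry package, which may not be installed.

```r
# Toy data: four corners of the unit square plus one interior point (made up)
pts <- data.frame(x = c(0, 1, 1, 0, 0.5),
                  y = c(0, 0, 1, 1, 0.5))

# npr = 1: the "hull" of one variable is its range; extreme rows attain it
range(pts$x)
hull_1d <- c(which.min(pts$x), which.max(pts$x))

# npr = 2: grDevices::chull returns the row indices of the hull vertices
hull_2d <- chull(pts$x, pts$y)
sort(hull_2d)  # rows 1 to 4; the interior row 5 is not a vertex

# npr > 2: geometry::convhulln(as.matrix(...)) would be used instead; it
# returns a matrix of vertex indices with one facet per row
```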
First, all available projections are considered and their Convex Hulls are computed. Then either the unique (if uniquerows = TRUE) or all (if uniquerows = FALSE) associated data rows form a matrix. Finally, dist is applied to that matrix to find the kappas most frequent outermost rows.
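The steps above might be sketched in base R as follows for npr = 2. This is not the package's code. In particular, the rule used in step 4 (count how often each candidate row is the farthest one from another candidate) is an assumption about how "most frequent outermost" is derived from dist, and the data are random numbers generated only for the demonstration.

```r
# A sketch under stated assumptions, not the package's implementation
set.seed(1)
df <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))  # made-up data
npr <- 2

# 1. Hull vertices of every coordinate projection (chull covers npr = 2)
subspaces <- combn(ncol(df), npr)
hull_rows <- lapply(seq_len(ncol(subspaces)), function(j) {
  cols <- subspaces[, j]
  chull(df[[cols[1]]], df[[cols[2]]])
})

# 2. Pool the row indices; unique() mirrors uniquerows = TRUE
candidates <- unique(unlist(hull_rows))

# 3. Distance matrix of the candidate rows, using all d variables
dm <- as.matrix(dist(df[candidates, ]))

# 4. Assumed rule: for each candidate, record which candidate is farthest
#    from it, then count how often each row is named
farthest <- candidates[apply(dm, 1, which.max)]
freq <- sort(table(farthest), decreasing = TRUE)

# 5. The most frequent rows, at most kappas of them, play the role of outmost
kappas <- 3
outmost <- as.integer(names(freq))[seq_len(min(kappas, length(freq)))]
```

Working only with hull vertices keeps the distance matrix small, which is presumably why the projections are computed first.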
Special care is needed if fewer than kappas rows are found. In that case random sampling is used to complete them: the output usedrandom reports the number of random rows, and rseed can be set for reproducibility.
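The role of rseed can be illustrated with a small helper. The function complete_with_random and its arguments are hypothetical names invented for this sketch; the package may complete the rows differently.

```r
# Hypothetical helper, not part of the package
complete_with_random <- function(found, n, kappas, rseed = NULL) {
  missing <- kappas - length(found)
  if (missing <= 0) {
    # Enough rows were found, so no random rows are used
    return(list(rows = found[seq_len(kappas)], usedrandom = 0L))
  }
  if (!is.null(rseed)) set.seed(rseed)
  pool <- setdiff(seq_len(n), found)               # rows not already chosen
  extra <- pool[sample.int(length(pool), missing)]  # draw without replacement
  list(rows = c(found, extra), usedrandom = length(extra))
}

res1 <- complete_with_random(c(3, 17), n = 100, kappas = 4, rseed = 2023)
res2 <- complete_with_random(c(3, 17), n = 100, kappas = 4, rseed = 2023)
identical(res1, res2)  # TRUE: same seed, same random rows
res1$usedrandom        # 2
```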
Examples
#
data("wd2") #2D demo
df = wd2
yy = find_outmost_projected_convexhull_points(df, kappas = 3)
yy$outmost #the rows of 3 outmost projected convexhull points
yy$outmostall #all outmost found
yy$outmostfrequency #frequency table for all
yy$usedrandom #No random row was used
yy$chprojections #The Convex Hull of projection (one only here)
yy$projected #the 9 unique points that created the one only CH
df[yy$outmost,] #the 3 outmost projected convexhull points
#
###
#
data("wd3") #3D demo
df = wd3
yy = find_outmost_projected_convexhull_points(df, kappas = 4)
yy$outmost #the rows of 4 outmost projected convexhull points
yy$outmostall #all outmost found
yy$outmostfrequency #frequency table for all
yy$usedrandom #No random row was used
yy$chprojections #All the Convex Hulls of projections to coordinate planes
yy$projected #the 14 unique points that created all CHs
df[yy$outmost,] #the 4 outmost projected convexhull points
#
References
[1] Eddy, W. F. (1977). Algorithm 523: CONVEX, A new convex hull algorithm for planar sets. ACM Transactions on Mathematical Software, 3, 411-412. doi: 10.1145/355759.355768.
[2] Barber, C. B., Dobkin, D. P., and Huhdanpaa, H. T. (1996). The Quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software, 22(4), 469-483. http://www.qhull.org