Finds the optimal updating parameters to be used for the PCHA algorithm
After creating a grid on the (mu_up, mu_down) space, it runs archetypal at each grid point using the given method and any other options passed via the ellipsis (...), and finally selects the pair that minimizes the SSE at the end of testing_iters iterations (default 10).
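The grid search can be illustrated with a minimal base-R sketch. It assumes a user-supplied objective function: `grid_search_sketch` and `toy_sse` are hypothetical names, and `toy_sse` only stands in for "run archetypal for testing_iters iterations and return the SSE". The default ranges mirror mup1, mup2, mdown1, mdown2 and nmup from the argument list; nmdown = 10 is an assumption, because its default is not stated.

```r
# HYPOTHETICAL sketch of a (mu_up, mu_down) grid search; not the package code.
grid_search_sketch <- function(sse_fun,
                               mup1 = 1.1, mup2 = 2.5,
                               mdown1 = 0.1, mdown2 = 0.5,
                               nmup = 10, nmdown = 10) {  # nmdown default assumed
  # Evenly spaced candidate values on each axis
  mu_up   <- seq(mup1, mup2, length.out = nmup)
  mu_down <- seq(mdown1, mdown2, length.out = nmdown)
  # One row per (mu_up, mu_down) pair
  grid <- expand.grid(mu_up = mu_up, mu_down = mu_down)
  # Evaluate the objective at every pair
  grid$sse <- mapply(sse_fun, grid$mu_up, grid$mu_down)
  # Keep the pair with the smallest SSE (first one on ties)
  best <- which.min(grid$sse)
  list(mu_up_opt   = grid$mu_up[best],
       mu_down_opt = grid$mu_down[best],
       min_sse     = grid$sse[best],
       grid        = grid)
}

# HYPOTHETICAL smooth objective with its minimum near (2.2, 0.1)
toy_sse <- function(mu_up, mu_down) (mu_up - 2.2)^2 + (mu_down - 0.1)^2 + 4.5

res <- grid_search_sketch(toy_sse)
res[c("mu_up_opt", "mu_down_opt", "min_sse")]
```

The real function evaluates each grid point with archetypal (in parallel over nworkers) rather than with a closed-form objective, so its cost grows with nmup * nmdown full, if short, runs.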
Arguments
method: The method that will be used for computing the initial approximation:
projected_convexhull, see find_outmost_projected_convexhull_points
convexhull, see find_outmost_convexhull_points
partitioned_convexhull, see find_outmost_partitioned_convexhull_points
furthestsum, see find_furthestsum_points
outmost, see find_outmost_points
random, a random set of kappas points will be used
testing_iters: The maximum number of iterations to run for every pair (mu_up, mu_down) of parameters
nworkers: The number of logical processors that will be used for parallel computing (usually twice the number of available physical cores)
nprojected: The dimension of the projected subspace for find_outmost_projected_convexhull_points
npartition: The number of partitions for find_outmost_partitioned_convexhull_points
nfurthest: The number of times that the FurthestSum algorithm will be applied
sortrows: If it is TRUE, then rows will be sorted in find_furthestsum_points
mup1: The minimum value of mu_up, default is 1.1
mup2: The maximum value of mu_up, default is 2.5
mdown1: The minimum value of mu_down, default is 0.1
mdown2: The maximum value of mu_down, default is 0.5
nmup: The number of points to be taken for [mup1,mup2], default is 10
nmdown: The number of points to be taken for [mdown1,mdown2]
rseed: The random seed that will be used for setting the initial A matrix; useful for reproducible results
plot: If it is TRUE, then a 3D plot for (mu_up, mu_down, SSE) is created
...: Other arguments to be passed to function archetypal
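To pick a value for nworkers, the logical and physical core counts can be queried with `detectCores()` from the `parallel` package bundled with R. Leaving one logical core free is an assumption made here for a responsive system, not a documented default of archetypal; note that `detectCores(logical = FALSE)` may return NA on some platforms.

```r
library(parallel)  # bundled with base R

logical_cores  <- detectCores(logical = TRUE)   # counts hyperthreads
physical_cores <- detectCores(logical = FALSE)  # may be NA on some platforms

cat("logical:", logical_cores, " physical:", physical_cores, "\n")

# Assumption: keep one logical core free; fall back to 1 if detection fails
nworkers <- max(1L, logical_cores - 1L, na.rm = TRUE)
```

The result can then be passed as `nworkers = nworkers` in the call to find_pcha_optimal_parameters.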
Returns
A list with members:
mu_up_opt, the optimal value found for muAup and muBup
mu_down_opt, the optimal value found for muAdown and muBdown
min_sse, the minimum SSE which corresponds to (mu_up_opt,mu_down_opt)
seed_used, the random seed that was used; required for reproducing the optimal results
method_used, the method that was used for creating the initial solution
sol_initial, the initial solution that was used for all grid computations
testing_iters, the maximum number of iterations done by every grid computation
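Because one optimum is reported for both the A and B updates, re-running archetypal means copying mu_up_opt into muAup and muBup, and mu_down_opt into muAdown and muBdown, exactly as in the example below. A small helper can make that mapping explicit; `pcha_rerun_args` is a hypothetical name, and `out_mock` is a placeholder list that only reuses the documented member names (its values are made up, not results).

```r
# HYPOTHETICAL helper: map the returned list onto archetypal arguments,
# reusing the same optimum for both the A and B matrices.
pcha_rerun_args <- function(out) {
  list(initialrows = out$sol_initial,
       rseed       = out$seed_used,
       muAup       = out$mu_up_opt,
       muAdown     = out$mu_down_opt,
       muBup       = out$mu_up_opt,
       muBdown     = out$mu_down_opt)
}

# Placeholder with the documented member names; values are illustrative only
out_mock <- list(mu_up_opt = 2, mu_down_opt = 0.2, min_sse = 1,
                 seed_used = 1, method_used = "projected_convexhull",
                 sol_initial = c(3, 7, 11), testing_iters = 10)

rerun_args <- pcha_rerun_args(out_mock)
str(rerun_args)

# With the archetypal package loaded, a re-run could then be written as:
# aa <- do.call(archetypal, c(list(df = wd25, kappas = 5), rerun_args))
```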
Examples
{
data("wd25")
out = find_pcha_optimal_parameters(df = wd25, kappas = 5, rseed = 2020)
# Time difference of 30.91101 secs
#  mu_up_opt mu_down_opt  min_sse
#   2.188889    0.100000 4.490980
# Now run archetypal with the optimal parameters found above:
aa = archetypal(df = wd25, kappas = 5, initialrows = out$sol_initial,
                rseed = out$seed_used,
                muAup = out$mu_up_opt, muAdown = out$mu_down_opt,
                muBup = out$mu_up_opt, muBdown = out$mu_down_opt)
aa[c("SSE", "varexpl", "iterations", "time")]
# $SSE
# [1] 3.629542
#
# $varexpl
# [1] 0.9998924
#
# $iterations
# [1] 146
#
# $time
# [1] 21.96
# Compare it with a simple solution (time may vary)
aa2 = archetypal(df = wd25, kappas = 5, rseed = 2020)
aa2[c("SSE", "varexpl", "iterations", "time")]
# $SSE
# [1] 3.629503
#
# $varexpl
# [1] 0.9998924
#
# $iterations
# [1] 164
#
# $time
# [1] 23.55
## Of course the above was a "toy example"; if your data has thousands or millions of rows,
## then the time reduction is much more conspicuous.
# Close plot device:
dev.off()
}