mainVEM function

Adaptative VEM algorithm

Adaptative VEM algorithm

Principal adaptative VEM algorithm for histogram with model selection or for kernel method.

mainVEM(data, n, Qmin, Qmax = Qmin, directed = TRUE, sparse = FALSE, method = c("hist", "kernel"), init.tau = NULL, cores = 1, d_part = 5, n_perturb = 10, perc_perturb = 0.2, n_random = 0, nb.iter = 50, fix.iter = 10, epsilon = 1e-06, filename = NULL)

Arguments

  • data: Data format depends on the estimation method used!!

    1. Data with hist method - list with 2 components:

      • **dataTime:[0,dataTime**: [0,dataTime] is the total time interval of observation

      • **dataNijk:DatamatrixwithcountsperprocessNijk**: Data matrix with counts per process N_{ij}andsubintervals;matrixofsizeand sub-intervals ; matrix of sizeN*DmaxwherewhereN = n(n-1)ororn(n-1)/2isthenumberofpossiblenodepairsinthegraphandis the number of possible node pairs in the graph andDmax = 2^{dmax}$ is the size of the finest partition in the histrogram approach

         Counts are pre-computed - Obtained through function 'statistics' (auxiliary.R) on data with second format
        
    2. Data with kernel method - list with 3 components:

      • data$time.seq: Sequence of observed time points of the m-th event (M-vector)
      • data$type.seq: Sequence of observed values convertNodePair(i,j,n,directed) (auxiliary.R) of process that produced the mth event (M-vector).
      • **dataTime:[0,dataTime**: [0,dataTime] is the total time interval of observation
  • n: Total number of nodes

  • Qmin: Minimum number of groups

  • Qmax: Maximum number of groups

  • directed: Boolean for directed (TRUE) or undirected (FALSE) case

  • sparse: Boolean for sparse (TRUE) or not sparse (FALSE) case

  • method: List of string. Can be "hist" for histogram method or "kernel" for kernel method

  • init.tau: List of initial values of τ\tau - all tau's are matrices with size Q×nQ\times n (might be with different values of Q)

  • cores: Number of cores for parallel execution

    If set to 1 it does sequential execution

    Beware: parallelization with fork (multicore) : doesn't work on Windows!

  • d_part: Maximal level for finest partition of time interval [0,T] used for k-means initializations.

    • Algorithm takes partition up to depth 2d2^d with d=1,...,dpartd=1,...,d_{part}
    • Explore partitions [0,T],[0,T/2],[T/2,T],...[0,T/2d],...[(2d1)T/2d,T][0,T], [0,T/2], [T/2,T], ... [0,T/2^d], ...[(2^d-1)T/2^d,T]
    • Total number of partitions npart=2(dpart+1)1npart= 2^{(d_{part} +1)} -1
  • n_perturb: Number of different perturbations on k-means result

    When Qmin<QmaxQmin < Qmax, number of perturbations on the result with Q1Q-1 or Q+1Q+1 groups

  • perc_perturb: Percentage of labels that are to be perturbed (= randomly switched)

  • n_random: Number of completely random initial points. The total number of initializations for the VEM is npart(1+nperturb)+nrandomnpart*(1+n_{perturb}) +n_{random}

  • nb.iter: Number of iterations of the VEM algorithm

  • fix.iter: Maximum number of iterations of the fixed point into the VE step

  • epsilon: Threshold for the stopping criterion of VEM and fixed point iterations

  • filename: Name of the file where to save the results along the computation (increasing steps for QQ, these are the longest).

    The file will contain a list of 'best' results.

Details

The sparse version works only for the histogram approach.

Examples

# load data of a synthetic graph with 50 individuals and 3 clusters n <- 20 Q <- 3 Time <- generated_Q3_n20$data$Time data <- generated_Q3_n20$data z <- generated_Q3_n20$z step <- .001 x0 <- seq(0,Time,by=step) intens <- generated_Q3_n20$intens # VEM-algo kernel sol.kernel <- mainVEM(data,n,Q,directed=FALSE,method='kernel', d_part=0, n_perturb=0)[[1]] # compute smooth intensity estimators sol.kernel.intensities <- kernelIntensities(data,sol.kernel$tau,Q,n,directed=FALSE) # eliminate label switching intensities.kernel <- sortIntensities(sol.kernel.intensities,z,sol.kernel$tau, directed=FALSE) # VEM-algo hist # compute data matrix with precision d_max=3 Dmax <- 2^3 Nijk <- statistics(data,n,Dmax,directed=FALSE) sol.hist <- mainVEM(list(Nijk=Nijk,Time=Time),n,Q,directed=FALSE, method='hist', d_part=0,n_perturb=0,n_random=0)[[1]] log.intensities.hist <- sortIntensities(sol.hist$logintensities.ql,z,sol.hist$tau, directed=FALSE) # plot estimators par(mfrow=c(2,3)) ind.ql <- 0 for (q in 1:Q){ for (l in q:Q){ ind.ql <- ind.ql + 1 true.val <- intens[[ind.ql]]$intens(x0) values <- c(intensities.kernel[ind.ql,],exp(log.intensities.hist[ind.ql,]),true.val) plot(x0,true.val,type='l',xlab=paste0("(q,l)=(",q,",",l,")"),ylab='', ylim=c(0,max(values)+.1)) lines(seq(0,1,by=1/Dmax),c(exp(log.intensities.hist[ind.ql,]), exp(log.intensities.hist[ind.ql,Dmax])),type='s',col=2,lty=2) lines(seq(0,1,by=.001),intensities.kernel[ind.ql,],col=4,lty=3) } }

References

DAUDIN, J.-J., PICARD, F. & ROBIN, S. (2008). A mixture model for random graphs. Statist. Comput. 18, 173–183.

DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.

JORDAN, M., GHAHRAMANI, Z., JAAKKOLA, T. & SAUL, L. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37, 183–233.

MATIAS, C., REBAFKA, T. & VILLERS, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika.

MATIAS, C. & ROBIN, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review. Esaim Proc. & Surveys 47, 55–74.