overfitRR() R function from RRphylo

Testing RRphylo overfit

Testing the robustness of RRphylo results to sampling effects and phylogenetic uncertainty.


overfitRR(RR, y, phylo.list, aces = NULL, x1 = NULL, aces.x1 = NULL, cov = NULL,
  rootV = NULL, clus = 0.5, s = NULL, swap.args = NULL, nsim = NULL,
  trend.args = NULL, shift.args = NULL, conv.args = NULL, pgls.args = NULL)

Arguments

RR: an object produced by RRphylo.
y: a named vector of phenotypes.
phylo.list: a list (or multiPhylo) of alternative topologies (i.e. having the same species as the original tree arranged differently) to be tested.
aces: if used to produce the RR object, the named vector of ancestral character values known in advance at nodes must be specified. Names correspond to the nodes in the tree.
x1: the additional predictor to be specified if the RR object has been created using an additional predictor (i.e. multiple version of RRphylo). 'x1' vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running RRphylo on the predictor as well, and taking the vector of ancestral states and tip values to form the x1.
aces.x1: a named vector of ancestral character values at nodes for x1. It must be indicated if the RR object has been created using both aces and x1. Names correspond to the nodes in the tree.
cov: if used to produce the RR object, the covariate must be specified. As in RRphylo, the covariate vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running RRphylo on the covariate as well, and taking the vector of ancestral states and tip values to form the covariate.
rootV: if used to produce the RR object, the phenotypic value at the tree root must be specified.
clus: the proportion of clusters to be used in parallel computing. To run the single-threaded version of overfitRR set clus = 0.
s, swap.args, nsim: deprecated. See the function resampleTree to generate alternative phylogenies.
trend.args: deprecated. See the function overfitST to test search.trend robustness.
shift.args: deprecated. See the function overfitSS to test search.shift robustness.
conv.args: deprecated. See the function overfitSC to test search.conv robustness.
pgls.args: deprecated. See the function overfitPGLS to test PGLS_fossil robustness.
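
The clus argument is described above as a proportion rather than a count. As a rough illustration of what that could mean in practice, the sketch below turns a proportion into a number of workers. The helper workers_from_clus is hypothetical, and the rounding rule is an assumption made for illustration; it is not taken from the RRphylo source.

```r
# Hypothetical helper (not part of RRphylo): translate a 'clus' proportion
# into a worker count. Assumption: workers = round(available cores * clus).
workers_from_clus <- function(clus, cores = parallel::detectCores()) {
  if (is.na(cores)) cores <- 1L          # detectCores() may return NA
  if (clus == 0) return(1L)              # clus = 0 means single-threaded
  max(1L, as.integer(round(cores * clus)))
}

workers_from_clus(0.5, cores = 8)        # half of 8 cores
workers_from_clus(0, cores = 8)          # single-threaded
workers_from_clus(2 / 8, cores = 8)      # same idea as cc <- 2/detectCores()
```

Under this assumption, setting clus to 2/parallel::detectCores(), as the examples on this page do, would request two workers regardless of the machine.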

Returns

The function returns a 'RRphyloList' object containing:

$RR.list a 'RRphyloList' including the results of each RRphylo performed within overfitRR.

$root.est the estimated root value per simulation.

$rootCI the 95% confidence interval around the root value.

$ace.regressions a 'RRphyloList' including the results of linear regression between ancestral state estimates before and after the subsampling.

The output always has an attribute "Call" which returns an unevaluated call to the function.
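
To make the $rootCI and $ace.regressions elements more concrete, here is a minimal base R sketch of the kind of summaries they describe. All values are simulated stand-ins, the quantile-based interval is an assumption about how a 95% interval could be derived from per-tree root estimates, and the direction of the regression is chosen for illustration only. None of this is copied from overfitRR itself.

```r
set.seed(1)

# Hypothetical stand-in for $root.est: one root estimate per alternative tree
root.est <- rnorm(100, mean = 2, sd = 0.1)

# Assumption: 95% interval taken as the 2.5% and 97.5% quantiles
rootCI <- quantile(root.est, probs = c(0.025, 0.975))

# Hypothetical ancestral state estimates at shared nodes,
# from the original tree and from one resampled tree
ace.original  <- rnorm(30)
ace.resampled <- ace.original + rnorm(30, sd = 0.05)

# Regress resampled estimates on the original ones;
# a slope near 1 suggests the estimates are stable
fit   <- lm(ace.resampled ~ ace.original)
slope <- unname(coef(fit)[2])
```

With a real overfitRR result, the corresponding values would be read from the returned object (for example its $rootCI element) rather than simulated.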

Details

Methods using a large number of parameters risk being overfit. This usually translates into poor fit with data and trees other than those originally used. With RRphylo methods this risk is usually very low. However, the user can assess how robust the results of RRphylo are by running resampleTree and overfitRR. The former subsamples the tree according to the s parameter (the proportion of tips to be removed from the tree) and alters tree topology by means of swapONE. The list of altered topologies is then fed to overfitRR, which cross-references each tree with the phenotypic data and performs RRphylo on each of them. In this way, both the potential for overfit and phylogenetic uncertainty are accounted for at once.
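
The subsampling and cross-referencing steps described above can be sketched in base R. This is only an illustration on made-up tip labels: resampleTree may select tips differently, and with a real 'phylo' object the tips would be pruned with a function such as ape::drop.tip rather than setdiff.

```r
set.seed(42)

# Hypothetical tip labels and a named phenotype vector (not RRphylo data);
# the phenotypes are deliberately stored in a different order than the tips
tips <- paste0("sp", 1:20)
y    <- setNames(rnorm(20), sample(tips))

s       <- 0.25                          # proportion of tips to remove
n.drop  <- round(length(tips) * s)       # number of tips to drop
dropped <- sample(tips, n.drop)
kept    <- setdiff(tips, dropped)        # tips retained in the subsampled tree

# Cross-reference: keep phenotypes for retained tips only, in tip order
y.sub <- y[match(kept, names(y))]
```

After this step y.sub has one value per retained tip, in the same order as the tip labels, which is the form a phenotype vector needs before being analysed against the reduced tree.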

Otherwise, a list of alternative phylogenies can be supplied to overfitRR. In this case subsampling and swapping arguments are ignored, and robustness testing is performed on the alternative topologies as they are.
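
When supplying alternative topologies directly, a user may want to confirm beforehand that each tree carries the same species as the phenotype vector. The sketch below is one possible pre-flight check. The helper same_species is hypothetical and not part of RRphylo, and plain lists with a tip.label element stand in for 'phylo' objects, which store their tip names in an element of the same name.

```r
# Hypothetical helper (not part of RRphylo): for each tree, report whether
# its tip labels are exactly the species named in y, in any order
same_species <- function(phylo.list, y) {
  vapply(phylo.list,
         function(tr) setequal(tr$tip.label, names(y)),
         logical(1))
}

# Made-up phenotype vector and stand-in trees
y <- c(a = 1.2, b = 0.7, c = 2.1)
trees <- list(
  list(tip.label = c("c", "a", "b")),    # same species, different order
  list(tip.label = c("a", "b", "d"))     # contains a species missing from y
)

ok <- same_species(trees, y)
```

Trees flagged FALSE could then be inspected or pruned before being passed to overfitRR.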

Examples


## Not run:

cc<- 2/parallel::detectCores()
library(ape)
library(RRphylo)

## overfitRR routine
# load the RRphylo example dataset including Ornithodirans tree and data
data("DataOrnithodirans")
DataOrnithodirans$treedino->treedino
DataOrnithodirans$massdino->massdino
DataOrnithodirans$statedino->statedino

# extract Pterosaurs tree and data
extract.clade(treedino,746)->treeptero
massdino[match(treeptero$tip.label,names(massdino))]->massptero
massptero[match(treeptero$tip.label,names(massptero))]->massptero

# perform RRphylo on body mass
RRphylo(tree=treeptero,y=log(massptero),clus=cc)->RRptero

# generate a list of subsampled and swapped phylogenies to test
treeptero.list<-resampleTree(RRptero$tree,s = 0.25,swap.si = 0.1,swap.si2 = 0.1,nsim=10)

# test the robustness of RRphylo
ofRRptero<-overfitRR(RR = RRptero,y=log(massptero),phylo.list=treeptero.list,clus=cc)

## overfitRR routine on multiple RRphylo
# load the RRphylo example dataset including Cetaceans tree and data
data("DataCetaceans")
DataCetaceans$treecet->treecet
DataCetaceans$masscet->masscet
DataCetaceans$brainmasscet->brainmasscet
DataCetaceans$aceMyst->aceMyst

# cross-reference the phylogenetic tree and body and brain mass data. Remove from
# both the tree and vector of body sizes the species whose brain size is missing
drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi

# perform RRphylo on the variable (body mass) to be used as additional predictor
RRphylo(tree=treecet.multi,y=masscet.multi,clus=cc)->RRmass.multi
RRmass.multi$aces[,1]->acemass.multi

# create the predictor vector: retrieve the ancestral character estimates
# of body size at internal nodes from the RR object ($aces) and collate them
# to the vector of species' body sizes to create x1
c(acemass.multi,masscet.multi)->x1.mass

# perform RRphylo on brain mass by using body mass as additional predictor
RRphylo(tree=treecet.multi,y=brainmasscet,x1=x1.mass,clus=cc)->RRmulti

# generate a list of subsampled and swapped phylogenies to test
treecet.list<-resampleTree(RRmulti$tree,s = 0.25,swap.si=0.1,swap.si2=0.1,nsim=10)

# test the robustness of multiple RRphylo
ofRRcet<-overfitRR(RR = RRmulti,y=brainmasscet,phylo.list=treecet.list,clus=cc,x1 =x1.mass)
## End(Not run)

References

Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., Di Febbraro, M., & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9: 974-983. doi:10.1111/2041-210X.12954

Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019a). Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PLoS ONE, 14: e0210101. https://doi.org/10.1371/journal.pone.0210101

Author(s)

Silvia Castiglione, Carmela Serio, Giorgia Girardi, Pasquale Raia

RRphylo package

Maintainer: Silvia Castiglione
License: GPL-2
Last published: 2025-03-23