I don't have a lot of opportunities to work with phylogenies in my own line of work, though my training in Paris was heavily phylogenetically oriented, so it was fun to finally make use of it.
One particular thing I did for this paper was to try to take track polymorphism into account in the ancestral state reconstruction (ASR). For some ichnospecies, the data we used included a large number of measured specimens, and the distribution of the measurements showed quite a spread (ranging from juveniles to adult forms), with the occasional bimodal set-up. The idea I had to account for this was simply to bootstrap these data: instead of using the mean or median measurement for each ichnospecies, I randomly picked a single specimen to represent each ichnospecies, and repeated the operation a large number of times. Here is the code in R (full code for the entire study is available here):
#Bootstrap based on polymorphism
n_trials <- 1e4
n_char <- length(all_char) #all_char contains the list of characters for which we want to run the ASR
# This is fairly time consuming so to make things smoother I used doSNOW to parallelize it.
# The following 7 lines are only there to set it up.
library(doSNOW)
cl <- parallel::makeCluster(2)
registerDoSNOW(cl)
pb <- txtProgressBar(max = n_trials, style = 3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress = progress)
set.seed(20180822) # note: this seeds the master process only; the snow workers are not seeded by this call
aas <- foreach(i = seq_len(n_trials), .options.snow = opts) %dopar% {
  R <- list() # results of the n_char x n_tree ASRs for this trial (aas collects all 1e4 trials)
  for(j in 1:n_char){
    R[[j]] <- list()
    # Pick a random specimen for each run and get the value for the specific measurement.
    # First get the list of specimens for which that particular character was measured:
    sp <- dat$specimens[!is.na(dat$specimens[,colnames(dat$specimens)==all_char[j]]),]
    # Then randomly select a single specimen for each ichnospecies:
    specimen_set <- sapply(split(sp$`No Specimen`,sp$`No Species`),function(x)ifelse(length(x)>1,sample(x,1),x))
    # Then pick its corresponding measurement:
    values <- sapply(specimen_set,function(x)sp[sp$`No Specimen`==x,colnames(sp)==all_char[j]])
    # We tried several phylogenetic hypotheses here
    for(k in 1:n_tree){
      # Maximum likelihood ASR
      R[[j]][[k]] <- ape::reconstruct(values,dat$trees[[k]],method="ML")
    }
  }
  R
}
close(pb)
stopCluster(cl)
It gives a nice idea of how much uncertainty the choice of measured specimens introduces into the ASR.
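To make the resampling step more concrete, here is a minimal base-R sketch on made-up data. It is not the code from the paper: the `specimens` data frame, its `species` and `stride` columns, and the `draw_one` helper are all hypothetical names invented for this illustration, and no tree or ASR is involved. It only shows how drawing one specimen per species, many times over, turns within-species spread into a distribution of tip values.

```r
# Toy data (made up for illustration): three ichnospecies, unequal sample sizes.
set.seed(1)
specimens <- data.frame(
  species = rep(c("A", "B", "C"), times = c(30, 1, 12)),
  stride  = c(c(rnorm(15, 10, 1), rnorm(15, 20, 1)),  # bimodal: juveniles and adults
              15,                                     # a single specimen
              rnorm(12, 12, 0.5))                     # tight, unimodal
)

# Draw one measurement per species. Indexing with sample.int() avoids the
# sample(x, 1) pitfall, where a length-one numeric x is treated as 1:x.
draw_one <- function(d) {
  vapply(split(d$stride, d$species),
         function(x) x[sample.int(length(x), 1)],
         numeric(1))
}

n_trials <- 1000
draws <- t(replicate(n_trials, draw_one(specimens)))  # n_trials x n_species matrix

# Spread of the resampled tip values, per species
spread <- apply(draws, 2, quantile, probs = c(0.025, 0.5, 0.975))
print(round(spread, 1))
```

In this toy example the bimodal species ends up with a much wider interval than the tightly clustered one, and the species with a single specimen always returns the same value. That is the kind of variation that then propagates into each reconstruction when the draws are fed to the ASR.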
Anyway, the paper is naturally about way more than just this (in fact it is a very minor point of that paper) but I thought it was a cool solution to that particular issue.
Reference:
Buchwitz M., Jansen M., Renaudie J., Marchetti L., Voigt S. (2021). Evolutionary Change in Locomotion Close to the Origin of Amniotes Inferred From Trackway Data in an Ancestral State Reconstruction Approach. Frontiers in Ecology and Evolution, 9:674779.