Better parallel than serial
In one of my past projects, I implemented the use of multiple CPUs for time-intensive phylogenetic computations, employing the R library Rmpi to coordinate the parallel processes. Rmpi is an R wrapper for the MPI communication protocol. The speed gains that my code achieved through this process parallelization were considerable and may therefore be of interest to a wider audience. Hence, I present here a small example of using Rmpi in phylogenetic investigations.
First, let us set up a set of calculations that warrants process parallelization. In a maximum parsimony (MP) tree search, for example, one may wish to evaluate multiple starting trees. A parsimony ratchet can be executed in parallel on these starting trees, as the individual runs are independent of each other.
library(ape)
library(phangorn)
data = read.phyDat("alignment.phy") # Alignment of 47 taxa with 1,000,000 bp length
trees = read.tree("trees.tre") # Set of 500 phylogenetic trees
myFunc = function(tree,data) {
    return(phangorn::pratchet(data,start=tree))
}
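Readers without the files alignment.phy and trees.tre at hand can simulate stand-in input to follow along; the sketch below (with made-up sizes, and a deliberately small number of starting trees) is merely one way to generate comparable objects:

```r
library(ape)
library(phangorn)

# Simulate stand-in input data (sizes here are illustrative, not from my project)
set.seed(1)
true_tree = rtree(47)                   # random 47-taxon tree
data      = simSeq(true_tree, l=1000)   # simulated 1,000 bp alignment (phyDat object)
trees     = rmtree(10, 47)              # 10 random starting trees (multiPhylo object)
```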
Second, let us execute the parsimony ratchet serially (where an MP analysis is executed with the first starting tree, then the second, then the third, etc.). As we can see, MP inference performed serially on my data set takes roughly 10 min.
tme=proc.time()
sout=lapply(trees,myFunc,data)
proc.time()-tme
user system elapsed
584.513 1.070 584.977
Third, let us execute the parsimony ratchet in parallel, using the maximum number of CPUs available on my system (nproc = 8). As we can see, the MP inference performed in parallel on my data set takes roughly 4 min.
# Load Rmpi and infer the number of processors (i.e. CPUs) on the system
library(Rmpi)
nproc = as.numeric(system("nproc",intern=T))
# Specify one master and nproc-1 slaves
mpi.spawn.Rslaves(nslaves=nproc-1)
# Pass various functions to the Rmpi slaves
mpi.bcast.Robj2slave(is.rooted)
mpi.bcast.Robj2slave(is.binary.tree)
mpi.bcast.Robj2slave(unroot)
mpi.bcast.Robj2slave(stree)
tme=proc.time()
# Parallel apply the parsimony ratchet function on all starting trees;
# use load balancing for CPU assignment
pout=mpi.parLapply(trees,myFunc,data)
proc.time()-tme
# Close slaves
mpi.close.Rslaves()
user.self sys.self elapsed user.child sys.child
129.244 93.736 224.178 0.000 0.000
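To check that both approaches yield results of equivalent quality, one can compare the parsimony scores of the serial (sout) and parallel (pout) result lists. This sketch assumes both lists and the alignment are still in the workspace; note that the ratchet is a heuristic, so individual scores need not be identical:

```r
library(phangorn)

# Compute the parsimony score of each resulting tree under the alignment
scores_serial   = sapply(sout, function(tr) parsimony(tr, data))
scores_parallel = sapply(pout, function(tr) parsimony(tr, data))

# Summarize the score differences; values near zero indicate equivalent results
summary(scores_serial - scores_parallel)
```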
Utilizing all 8 CPUs of my computer (one for the master process, the rest for the slave processes) is not necessarily the most efficient parallelization. Spawning each slave process consumes a certain amount of CPU resources, and this overhead may exceed the benefit of the added computation speed (i.e. the law of diminishing returns).
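To find the sweet spot for a given data set, one can simply time the parallel run for several slave counts. The loop below is a rough sketch under the setup above (it assumes trees, data, and myFunc are defined, an MPI environment is available, and that repeatedly spawning and closing slave pools is acceptable for benchmarking purposes):

```r
library(Rmpi)

# Time the parallel ratchet for varying numbers of slaves
for (ns in c(1, 2, 4, 7)) {
    mpi.spawn.Rslaves(nslaves=ns)
    tme = proc.time()
    pout = mpi.parLapply(trees, myFunc, data)
    print(c(nslaves=ns, elapsed=(proc.time()-tme)["elapsed"]))
    # Close this slave pool before spawning the next one
    mpi.close.Rslaves()
}
```

Plotting elapsed time against the number of slaves typically reveals where the curve flattens out, i.e. where adding further slaves no longer pays off.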