Friday, 18 December 2015

Lessons learned in BEAST

At the moment, I am correcting an introduction to one of my chapters, analysing some ecology data, and running BEAST analyses in the hope that I will get nice, convergent trees. If you do lots of things at once, you do them slowly, especially if you are staying with your parents for a bit. But I have finally managed to sort out my CO1 tree so that the analysis converges and I don't get negative branch lengths (which look like trapdoor spiders are evolving backwards - try removing partitioning or using a random clock to remedy it). Now onto CytB.

I have kept a diary of hints and tricks about BEAST that I have learned along the way. It is now illustrated with many and varied trace plots.


Running BEAST takes a long time, and you don't want a power cut during the process (though sometimes putting your computer to sleep pauses the run). To start with, I run only 10,000,000 generations and try to make that work. You can tell if an analysis has worked because your trace plot for the posterior prior looks like a fuzzy caterpillar that has been straightened out (i.e. it isn't moving in any general direction other than left to right). The tighter the fuzziness, the better. The ESS values (estimated sample size) are >200 for every prior. And the tree looks like a phylogenetic tree, rather than a straight line, an invisible tree, or a weeping willow.

The first thing to do to a set of sequences that have been aligned (and put in the correct reading frame, if needed) is to test for appropriate substitution models using JModeltest. There are many different model testers, but JModeltest is very good, versatile and thorough, and I have investigated Beast 2 (which can model test using Bayesian algorithms rather than maximum likelihood) but ultimately came back to JModeltest because of its simplicity and suitability for answering my particular questions. Then, one takes one's sequence alignment and tests for partitioning (if the alignment has no gaps) using PartitionFinder (which is dead easy to use as long as you can work Java, which in my experience likes to play up a lot). If an alignment requires partitioning, it means that one or more bases in each codon evolves under different rules and times than the other bases. You can edit your alignment in a NEXUS file using the charset command to specify partitioning for Beauti.

The program BEAST is only for analysis. You set up the XML file in Beauti to analyse in BEAST. Of course, PartitionFinder will tell you that your dataset evolves under different models than what JModeltest says, because it is testing each of the three bases in the codons separately. I have wracked my brains over how to deal with this, because JModeltest acts as though all the base pairs evolve under the same model, so it averages the models out. There are lots of solutions to try: select the model output by JModeltest for all partitions; select the models output by PartitionFinder for their respective partitions, or select one of each. You can always try all of these options, but the difference is usually slight because the models are usually very similar and BEAST is very clever. I find that selecting the models output by PartitionFinder works best first, but if there are important parameters estimated by JModeltest, you can put those into Beauti under the Priors tab (but don't at first; you may not need to).

When first running the analysis, set only the model that you are using (under "sites"). Do not input any of the values for parameters estimated by JModeltest. You can limit the obscurity of the model by restricting the number of models that JModeltest tests for to 40 (select 5 under "number of substitution schemes"). Then you will only get models that can be implemented easily in BEAST (this is lazy though - you should really test more thoroughly and edit the BEAST XML file, but it doesn't make a huge difference and it takes people like me a very very long time to work out what needs changing, though it's easy enough when you work out how, but then BEAST runs for a few generations and crashes and you can't work out where you made a mistake).

When first running the analysis, set the clock to lognormal relaxed.

Set the MCMC chain to run for 10,000,000 generations.

If you have an outgroup, enforce it under the Taxa tab.

Then run it in BEAST!

When the run is complete, check the trace file. First check the ucld.stdev prior - if the mean of the estimates is more than one, then a strict clock is not appropriate (in other words, continue with the relaxed clock for now, or try a random clock). If the mean is <1, try a strict clock. If you need to try another clock, do it now and don't change anything else. See what difference (if any) it makes to your ESS.

The ESSs are good and the posterior caterpillar is heading in the right direction, but it appears to be melting in parts which isn't ideal.
This ucld.stdev mean is <1, so I tried random and strict clocks (neither of which improved the ESS, so I stuck with a lognormal relaxed clock - for now).


If you still have low ESS, change the offending priors to normal distributions around the mean, and the mean value should be the one estimated by JModeltest. Run it again. Change one thing at a time only, each time you run the analysis. Checking or unchecking the "Estimate" box next to where you set the clock also makes a huge difference to ESS.

This is a summary of what I have learned so far, in the last few weeks, about BEAST. My learning is a work in progress, so the above is hardly a substitute for advice from people who know what they are talking about. Try it at your own risk! It is a tricky program and help is not easy to find unless you can interpret computer speak (a useful skill in evolutionary biology). But it is very clever and very useful. Also, while an analysis is running, it's like waiting for Father Christmas to come.

If you're reading this because you need some tips with BEAST, message me if you want me to clarify anything (I tried to keep the post short!). If you're reading this for fun...well, surely not.

Good luck, merry Christmas, and I will see you in the new year!