Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

Loading BEAST node statistics in R

The issue with hardcoding data formats

Various phylogenetic analyses in R operate on the ultrametric trees generated by the popular software BEAST. To load such trees, including their node metadata, into R, the packages phyloch and, more recently, ips provide tree and data parsers. Specifically, the input trees are parsed by the function read.beast(), which reads NEXUS files as saved by BEAST components (e.g., TreeAnnotator) or related software.

Recently, I realized that read.beast() loads node metadata only if the tree name in the NEXUS tree definition line is “TREE1”. This requirement is hard-coded in the dependent function extractBEASTstats(), which parses the node metadata from each tree.

See lines 2 and 3 of the function extractBEASTstats():

X <- X[grep("tree TREE1[[:space:]]+=", X)]
X <- gsub("tree TREE1[[:space:]]+= \\[&R\\] ", "", X)
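To illustrate the consequence, the two lines above can be applied to a few tree definition lines. The tree names and the shortened Newick strings below are made-up examples, not real TreeAnnotator output, and grepl() is used in place of grep() only to obtain a readable logical vector (the same regular expression is applied).

```r
# Made-up NEXUS tree definition lines (hypothetical names and topologies)
tree_lines <- c(
  "tree TREE1 = [&R] ((A:1,B:1):1,C:2);",
  "tree TREE10 = [&R] ((A:1,B:1):1,C:2);",
  "tree MCC_TREE = [&R] ((A:1,B:1):1,C:2);"
)

# Same pattern as line 2 of extractBEASTstats()
matches <- grepl("tree TREE1[[:space:]]+=", tree_lines)
print(matches)
# [1]  TRUE FALSE FALSE

# Same pattern as line 3: strip the prefix from the matching line only
newick <- gsub("tree TREE1[[:space:]]+= \\[&R\\] ", "", tree_lines[matches])
print(newick)
# [1] "((A:1,B:1):1,C:2);"
```

Even “TREE10” is rejected, because the pattern requires whitespace directly after “TREE1”; the tree named “MCC_TREE” is silently dropped, so its node metadata would never be parsed.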

I am not certain whether there is a convention that all trees in BEAST-generated NEXUS files must be named “TREE1”. However, future BEAST data parsers should probably avoid this condition, because successful parsing should not depend on a specific tree name.
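A name-independent alternative could match any tree name instead of the literal “TREE1”. The following is a minimal sketch, not code from phyloch or ips: strip_tree_prefix() is a hypothetical helper, and it assumes that each tree definition sits on a single line and that the tree name is unquoted.

```r
# Hypothetical helper, not part of phyloch or ips. Assumes each tree
# definition is on one line and the tree name is unquoted.
strip_tree_prefix <- function(lines) {
  # "tree", then any name without whitespace or "=", then "=";
  # NEXUS keywords are case-insensitive, hence ignore.case = TRUE
  def_pattern <- "^[[:space:]]*tree[[:space:]]+[^[:space:]=]+[[:space:]]*="
  tree_lines <- lines[grepl(def_pattern, lines, ignore.case = TRUE)]
  # Remove the definition prefix plus an optional [&R] or [&U] rooting comment
  prefix_pattern <- paste0(def_pattern, "[[:space:]]*(\\[&[RU]\\][[:space:]]*)?")
  sub(prefix_pattern, "", tree_lines, ignore.case = TRUE)
}

# Made-up NEXUS content with two differently named trees
nexus <- c(
  "#NEXUS",
  "begin trees;",
  "tree TREE1 = [&R] ((A:1,B:1):1,C:2);",
  "tree MCC_TREE = [&R] ((A:1,B:1):1,C:2);",
  "end;"
)
print(strip_tree_prefix(nexus))
# [1] "((A:1,B:1):1,C:2);" "((A:1,B:1):1,C:2);"
```

Both trees are now kept regardless of their names. Quoted tree names containing spaces (e.g. 'my tree') would need an additional alternative in the pattern.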

This entry was published by Michael Grünstäudl on Tuesday, 9 August 2016, at 12:33 and filed under bioinformatics.
