Alumnus of Freie Universität Berlin – Michael Grünstäudl, PhD

Setting burn-in and combining posterior tree distributions using awk and sed

Efficiency on the UNIX shell

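I often find myself manually removing a set of phylogenetic trees from a posterior tree distribution in order to set a burn-in, and then combining the post-burn-in trees of the individual runs. Both steps can be done very efficiently with tac, awk, grep and sed on a UNIX shell. The following script keeps the last 500 trees of each of two MrBayes runs and merges them into a single tree file: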

inf1=Mrbayes_test.run1.t
inf2=Mrbayes_test.run2.t

tmpf1=${inf1%.t*}_postBurnin.tre
tmpf2=${inf2%.t*}_postBurnin.tre
outf=${inf1%.run1.t*}_combined_postBurnin.tre

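# Keep only the last 500 trees of each run: reverse the file, print the
# first 500 "tree" lines plus every non-tree line, then reverse it back.
# The action block must start on the same line as its pattern.
tac "$inf1" | awk '$1 == "tree" && ++counter <= 500 {print; next}
  $1 != "tree"' | tac > "$tmpf1"
tac "$inf2" | awk '$1 == "tree" && ++counter <= 500 {print; next}
  $1 != "tree"' | tac > "$tmpf2"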
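# Read the lines of run 2 that do not also occur in run 1 (its trees, without
# the shared header) in after the "end;" of run 1, then move "end;" to the end.
sed '/end;/r'<(grep -vFf "$tmpf1" "$tmpf2") "$tmpf1" |
  grep -v "end;" | sed '$ a end;' > "$outf"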
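
The burn-in step can also be wrapped in a small function so that the number of retained trees becomes a parameter instead of a hard-coded 500. What follows is only a minimal sketch: the function names keep_last_trees and count_trees and the five-tree toy file are assumptions made for illustration and are not part of the script above. Like the script, it relies on tac from GNU coreutils.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a tree file with only its last N trees,
# leaving every non-tree line (header, translate block, "end;") untouched.
keep_last_trees() {
  local infile=$1 ntrees=$2
  tac "$infile" |
    awk -v n="$ntrees" '$1 == "tree" && ++counter <= n {print; next}
      $1 != "tree"' |
    tac
}

# Hypothetical helper: count the lines whose first field is "tree".
count_trees() {
  awk '$1 == "tree" {n++} END {print n + 0}' "$1"
}

# Toy input with five trees (made-up data, for illustration only).
demo=$(mktemp)
{
  echo "#NEXUS"
  echo "begin trees;"
  for i in 1 2 3 4 5; do
    echo "   tree gen.$i = [&U] (1,2,3);"
  done
  echo "end;"
} > "$demo"

# Keep the last two trees and check how many tree lines remain.
keep_last_trees "$demo" 2 > "$demo.post"
count_trees "$demo.post"    # prints 2
```

Running count_trees on the combined output file is a quick sanity check as well: with two runs and 500 retained trees each, it should report at most 1000 trees.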