Alignment file format conversion for the efficient – Part II
Today, I needed to convert a series of alignments stored in phylip format into the common nexus format. The output DNA alignments had to be in sequential (i.e., non-interleaved) format.
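For readers unsure about the difference between the two layouts, here is a minimal sketch in plain Python (no Biopython). The function name `nexus_matrix` and its `block_width` parameter are made up for this illustration and are not part of any library; the function only lays out the matrix lines, not a complete nexus file.

```python
def nexus_matrix(records, block_width=None):
    """Lay out alignment rows the way a nexus MATRIX block does.

    records: list of (taxon_name, sequence) tuples of equal length.
    block_width: None for the sequential layout (one full line per taxon),
    or a column count for the interleaved layout (blocks of that width).
    Hypothetical helper, for illustration only.
    """
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences must have the same length")
    pad = max(len(name) for name, _ in records) + 1
    # Sequential layout is simply one block spanning the whole alignment.
    width = block_width or max(length, 1)
    blocks = []
    for start in range(0, length, width):
        lines = [name.ljust(pad) + seq[start:start + width]
                 for name, seq in records]
        blocks.append("\n".join(lines))
    # Interleaved blocks are separated by a blank line.
    return "\n\n".join(blocks)
```

Calling `nexus_matrix(records)` gives each taxon exactly one line, whereas `nexus_matrix(records, block_width=4)` repeats the taxon names for every block of four columns.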
In February 2017, I wrote a few lines of code for the inverse conversion (nexus to phylip) and was therefore surprised to find that converting from phylip to non-interleaved nexus did not work out of the box. A few more lines, and a little trick using StringIO(), were needed: the alignment is first written as nexus into an in-memory buffer, which Bio.Nexus then re-reads and writes out with interleaving switched off.
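#!/usr/bin/env python2.7
# Convert a phylip alignment into a sequential (non-interleaved) nexus file.
# Usage: python2 phy2nex.py <alignment.phy>
# Note: relies on Bio.Alphabet, which was removed in Biopython 1.78.
import os
import sys
from Bio import AlignIO
from Bio.Alphabet import IUPAC, Gapped
from Bio.Nexus import Nexus
from StringIO import StringIO

inFn = sys.argv[1]
outFn = os.path.splitext(inFn)[0] + ".nex"
inp = open(inFn, 'rU')
outp = open(outFn, 'w')

# The nexus writer needs to know the sequence type, hence the explicit alphabet.
alphabet = Gapped(IUPAC.ambiguous_dna)
aln = AlignIO.parse(inp, 'phylip-relaxed', alphabet=alphabet)

# Write the alignment as nexus into an in-memory buffer rather than to disk.
out_handle = StringIO()
AlignIO.write(aln, out_handle, 'nexus')

# Re-read the buffer with Bio.Nexus, which can switch interleaving off.
p = Nexus.Nexus()
p.read(out_handle.getvalue())
p.write_nexus_data(outp, interleave=False)

outp.close()
inp.close()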
And for those who wish to apply the above Python code (saved as "phy2nex.py") to a collection of directories that each contain one phylip file:
for dir in */; do python2 phy2nex.py "$dir"*.phy; done
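The same batch step could also be driven from Python instead of the shell. The following is only a sketch: the function name `plan_conversions` is hypothetical, and it merely pairs each phylip file one directory level down with the .nex name that phy2nex.py would derive from it, without running the conversion itself.

```python
import glob
import os


def plan_conversions(root="."):
    """Pair every <root>/*/*.phy file with its expected .nex output path.

    Hypothetical helper: it mirrors the shell glob */*.phy and the output
    naming used in phy2nex.py (same path, extension swapped for .nex).
    """
    pattern = os.path.join(root, "*", "*.phy")
    return [(phy, os.path.splitext(phy)[0] + ".nex")
            for phy in sorted(glob.glob(pattern))]
```

Each input path in the returned list could then be handed to the script, for instance with `subprocess.call(["python2", "phy2nex.py", phy])`, and the second element of each pair used to check that the output file was actually written.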