Michael Grünstäudl (Gruenstaeudl), PhD

Researcher at the Freie Universität Berlin

Bioinformatic spring cleaning – Part II

An improved few-liner to keep the data compressed

If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files larger than 1 GB (skipping files that are already gzipped), the following few-liner is for you:

LANG=C find . -type f -size +1G ! -name '*.gz' -print0 |
while IFS= read -r -d '' file; do
    gzip "$file"
done
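For readers who prefer to stay within Python, the same task can be sketched with the standard library alone. This is a minimal sketch, not a drop-in replacement: the function name `gzip_large_files` and its `min_size` parameter are hypothetical, and, like `find -type f`, it skips symbolic links.

```python
import gzip
import os
import shutil

ONE_GIB = 1024 ** 3  # find's "+1G" counts in units of 1024^3 bytes


def gzip_large_files(root=".", min_size=ONE_GIB):
    """Gzip every regular file below `root` that is larger than `min_size`
    bytes, skipping files that already end in .gz. Returns the new paths."""
    created = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip existing archives, symlinks and anything that is not a file
            if name.endswith(".gz") or os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) <= min_size:
                continue
            target = path + ".gz"
            if os.path.exists(target):
                continue  # gzip would also refuse to overwrite an existing .gz
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)  # streams in chunks, not all at once
            os.remove(path)  # like gzip, replace the original by the .gz file
            created.append(target)
    return created
```

A smaller `min_size` makes it easy to try the function out on a folder of small test files before pointing it at real data.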

Bioinformatic spring cleaning – Part I

A short one-liner to keep the data compressed

One of the bash one-liners that I use after every successful project, yet can never remember when needed, handles the simple task of looping through all subfolders, tar-gzipping each of them and then removing the original folders.

for i in */; do
    tar czf "${i%/}.tar.gz" "$i" && rm -r "$i"
done
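The same loop can be sketched in Python with the standard `tarfile` and `shutil` modules. This is a minimal sketch under the assumption that only direct, non-hidden subfolders should be archived (mirroring the `*/` glob); the function name `tar_and_remove_subfolders` is hypothetical. As with `&&` in the shell version, a folder is deleted only if writing its archive did not raise an error.

```python
import os
import shutil
import tarfile


def tar_and_remove_subfolders(parent="."):
    """Pack each direct subfolder of `parent` into <name>.tar.gz and delete
    the folder afterwards. Returns the list of archives written."""
    with os.scandir(parent) as it:
        # Materialize the listing first, since new archives appear in `parent`
        entries = sorted(it, key=lambda e: e.name)
    archives = []
    for entry in entries:
        # Like the */ glob: skip hidden folders; also skip symlinked folders
        if entry.name.startswith(".") or not entry.is_dir(follow_symlinks=False):
            continue
        archive = os.path.join(parent, entry.name + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            # arcname keeps member paths relative, as in `tar czf name.tar.gz name/`
            tar.add(entry.path, arcname=entry.name)
        shutil.rmtree(entry.path)  # only reached if the archive was written
        archives.append(archive)
    return archives
```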

First signs of spring 2018

Galanthus nivalis (Amaryllidaceae) in Berlin in March 2018

Workshop at the GfBS 2018

Talking about efficient data partitioning strategies

Today, I held a workshop at the 19th annual meeting of the Gesellschaft für Biologische Systematik (GfBS). I introduced the participants to computational strategies to automate the selection of data partitioning schemes and nucleotide substitution models for phylogenomic datasets.

Happy first class day 2018

Michael Gruenstaeudl - First Lecture in 2018

Alignment Phy2Nex few-liner

Alignment file format conversion for the efficient – Part II

Today, I needed to convert a series of alignments stored in phylip format into the common nexus format. The output DNA alignments had to be in sequential (i.e., non-interleaved) format.

In February 2017, I had already written a few-liner to conduct the inverse conversion (nexus to phylip) and was, thus, surprised to find that the conversion from phylip to non-interleaved nexus did not work out of the box. Instead, a few more lines (and a little trick using StringIO()) were necessary to get this specific conversion to work.
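To make the target format concrete, the following dependency-free sketch parses a sequential relaxed phylip file and writes a non-interleaved nexus DATA block, i.e., one line per taxon. It is not the Biopython-based approach used in this post; the function names are hypothetical, and it assumes that the phylip file is not interleaved, that taxon names contain no whitespace and need no nexus quoting, and that the data are DNA.

```python
def parse_relaxed_phylip(text):
    """Parse a sequential (non-interleaved) relaxed phylip alignment.
    Assumes a header 'ntax nchar' followed by one 'name sequence' line per
    taxon, with no whitespace inside taxon names."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    seqs = {}
    for ln in lines[1:1 + ntax]:
        name, *chunks = ln.split()
        seqs[name] = "".join(chunks)  # tolerate sequences split into blocks
    if len(seqs) != ntax or any(len(s) != nchar for s in seqs.values()):
        raise ValueError("header does not match the parsed alignment")
    return seqs


def to_sequential_nexus(seqs, datatype="DNA"):
    """Write a nexus DATA block with each sequence on a single line."""
    nchar = len(next(iter(seqs.values())))
    width = max(len(name) for name in seqs) + 2  # pad names into one column
    out = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={len(seqs)} NCHAR={nchar};",
        f"    FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "MATRIX",
    ]
    for name, seq in seqs.items():
        out.append(f"{name.ljust(width)}{seq}")
    out += [";", "END;", ""]
    return "\n".join(out)
```

Because every taxon occupies a single line in the MATRIX, the layout is sequential in the same sense as the output requested via `interleave=False`, although the exact header lines written by Biopython will differ.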

#!/usr/bin/env python2.7

import os
import sys
from StringIO import StringIO
from Bio import AlignIO
from Bio.Alphabet import IUPAC, Gapped
from Bio.Nexus import Nexus

inFn = sys.argv[1]
outFn = os.path.splitext(inFn)[0] + ".nex"

# Read the relaxed phylip alignment as gapped, ambiguous DNA
alphabet = Gapped(IUPAC.ambiguous_dna)
with open(inFn, 'rU') as inp:
    aln = AlignIO.read(inp, 'phylip-relaxed', alphabet=alphabet)

# AlignIO cannot write non-interleaved nexus directly, so write an
# intermediate nexus string into memory first ...
out_handle = StringIO()
AlignIO.write(aln, out_handle, 'nexus')

# ... and re-read it with Bio.Nexus, whose writer supports interleave=False
p = Nexus.Nexus()
p.read(out_handle.getvalue())
with open(outFn, 'w') as outp:
    p.write_nexus_data(outp, interleave=False)

And for those who wish to apply the above Python code (saved as „phy2nex.py“) to a collection of directories that each contain one phylip file:

for dir in */; do python2 phy2nex.py "$dir"*.phy; done

Standardizing tRNA anticodon abbreviations in genome files

Better U than T (or the other way around)

Incongruence exists among the annotations of tRNAs in genome files: most authors use „U“ (for uracil) in the three-letter anticodon abbreviations of tRNA annotations, but some use „T“ (for thymine). While I consider both letters justified, I believe that their usage should be consistent within a single genome file.

To assist with such a standardization, I have written a small Python script (directly executable from the bash shell) that replaces all „U“s of an anticodon abbreviation with „T“s (or vice versa, if so desired). Simply change the file name from „tmp“ to the actual file name in the script below.

python - <<'EOF'
import re

# Depending on file format, keyword can also be "=trn" (e.g., gff-files)
kw = "\ttrn"

with open("tmp", "r+") as f:
    lines = f.read().splitlines()
    outL = []
    for l in lines:
        if kw in l:
            # The anticodon is assumed to start two characters after the
            # keyword, e.g. "trnA-UGC" (one-letter amino acid plus hyphen)
            pos = [m.end()+2 for m in re.finditer(re.escape(kw), l)]
            for p in pos:
                # Replacement from U to T; switch letters if desired
                l = l[:p] + l[p:p+3].replace("U", "T") + l[p+3:]
        outL.append(l)
    # Overwrite the file in place, keeping the final newline
    f.seek(0)
    f.write('\n'.join(outL) + '\n')
    f.truncate()
EOF
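The script above locates each anticodon at a fixed offset of two characters after the keyword, which fits names like trnA-UGC but misses the anticodon in names with a longer amino-acid part, such as trnfM-CAU. A regex-based alternative is sketched below; the pattern and the function name `standardize_anticodons` are assumptions about how tRNA names are written, not a general standard.

```python
import re

# Assumed naming scheme: "trn", the amino-acid part (e.g. "A" or "fM"),
# a hyphen, then the three-letter anticodon, e.g. "trnA-UGC", "trnfM-CAU".
TRNA_NAME = re.compile(r"(?<![A-Za-z])(trn[A-Za-z]+-)([ACGTU]{3})(?![A-Za-z])")


def standardize_anticodons(text, to="T"):
    """Replace U with T (to="T") or T with U (to="U") inside anticodons only;
    all other text, e.g. product descriptions, is left untouched."""
    src = "U" if to == "T" else "T"

    def fix(match):
        return match.group(1) + match.group(2).replace(src, to)

    return TRNA_NAME.sub(fix, text)
```

Applied to a whole file, this would simply be `standardize_anticodons(f.read())`, with the result written back as in the script above.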

Note:

P.S. Is indenting in HTML supposed to be that clumsy, requiring separate <p> </p> definitions for every indentation level?

P.P.S. Indenting works nicely when using the tag <pre> </pre>, as the display mode seems to switch into verbatim. A big thank you to Chris B. for pointing this out to me.

Teaching award 2017

Awarded annually to a lecturer in biology, chemistry and pharmacy

I recently received the teaching award for biology at the FU Berlin for 2017. Many thanks to those students who recommended me. I appreciate your praise!

New paper – Conserved plastid genome structure in early-diverging angiosperms

Not so different after all.

Together with two other scientists, I just published a new paper on the plastid genome structure of early-diverging angiosperms. We demonstrate that the plastid genomes of early-diverging angiosperms are much more conserved than previously thought.

Teaching in spring

In the lab with teachers-to-be and high-school students

One of the master-level courses I teach at the Freie Universität Berlin during the spring semester, the NatLab Evolution, is geared towards a comprehensive teaching education for upcoming high-school teachers. During this course, we invite high-school students from across Berlin to spend a day in the lab and work with the Master’s students on biological experiments. Judging by the feedback, both sides seem to enjoy this arrangement a lot!

Michael Gruenstaeudl – Teaching – June 2017