Michael Grünstäudl (Gruenstaeudl), PhD

Researcher at the Freie Universität Berlin

Teaching in spring 2018 – Part II

Teaching in spring 2018 – Part I

In the lab with teachers-to-be

One of the master-level courses I teach at the Freie Universität Berlin during the spring semester, the NatLab Evolution, is geared towards the comprehensive training of prospective high-school teachers.

One-liner: Interleaved to deinterleaved FASTA

Quick, de-interleave it!
There are one-liners that never get old. This is one of them.

$ perl -MBio::SeqIO -e \
'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta");
 while (my $seq = $seqin->next_seq)
 { print ">", $seq->id, "\n", $seq->seq, "\n"; }' \
< interleaved.fasta > deinterleaved.fasta

Update 28-Jan-2019:
Over time, I came to find working with BioPerl uncomfortable, as its clean installation is just not well-supported on Linux. Thus, I have found myself relying on this method more and more:

$ awk '/^>/ {printf("\n%s\n",$0);next; }
{ printf("%s",$0);}  END {printf("\n");}' \
< interleaved.fasta | tail -n +2 \
> deinterleaved.fasta
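
For readers who would rather avoid both BioPerl and awk, the same idea can be sketched with the Python standard library alone. This is a minimal sketch, not a replacement for a full FASTA parser: the function names are my own, the full header line is kept (as in the awk version, whereas the Perl version keeps only the sequence ID), and surrounding whitespace on each line is stripped.

```python
def deinterleave_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_deinterleaved(in_handle, out_handle):
    """Write each record as one header line followed by one sequence line."""
    for header, seq in deinterleave_fasta(in_handle):
        out_handle.write(f"{header}\n{seq}\n")
```

Calling write_deinterleaved(sys.stdin, sys.stdout) from a small script (after importing sys) allows the same redirection as above: python3 deinterleave.py < interleaved.fasta > deinterleaved.fasta, where deinterleave.py is a hypothetical file name.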

Few-liner: Batch download of DNA sequences from NCBI

The wonders of entrez

Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were listed in the file input.txt, one name and accession number per line.

$ cat input.txt
  Liriope_muscari_USACult,JX080424
  Dracaena_adamii_IVORYCOAST,JX080436
  ...

Here is how I did it:

$ INF=input.txt
$ for line in $(cat "$INF"); do
    # Each record has the form "name,accession"
    SEQNAME=$(echo "$line" | awk -F',' '{print $1}')
    ACCNUM=$(echo "$line" | awk -F',' '{print $2}')
    FULLNAM=">${SEQNAME}_${ACCNUM}"
    # Fetch the record and drop NCBI's original header line
    SEQ=$(esearch -db nucleotide -query "$ACCNUM" | efetch -format fasta | tail -n +2)
    printf '%s\n%s\n' "$FULLNAM" "$SEQ" >> out.txt
  done
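
The same workflow can be sketched in Python without the EDirect command-line tools. This is a minimal sketch under a few assumptions: the helper names are my own, the download goes through the public E-utilities efetch endpoint rather than through esearch and efetch, and lines without a comma (such as a blank line) are simply skipped.

```python
import urllib.parse
import urllib.request

# Public NCBI E-utilities endpoint for fetching records
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_records(lines):
    """Yield (name, accession) pairs from 'name,accession' lines."""
    for raw in lines:
        line = raw.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        name, accession = (part.strip() for part in line.split(",", 1))
        yield name, accession


def relabel_fasta(fasta_text, name, accession):
    """Replace the NCBI header line with '>name_accession'."""
    # Drop the first (header) line and any blank lines
    body = [ln for ln in fasta_text.splitlines()[1:] if ln.strip()]
    return "\n".join([f">{name}_{accession}"] + body) + "\n"


def fetch_fasta(accession):
    """Download one record as FASTA text (requires network access)."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")
```

A driver loop would open input.txt, call fetch_fasta() for each pair returned by parse_records(), and append the result of relabel_fasta() to out.txt. When fetching dozens of records, a short pause between requests (for example time.sleep(0.5)) helps to stay within NCBI's request limits.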

Bioinformatic spring cleaning – Part II

An improved few-liner to keep the data compressed

If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files greater than 1 GB, the following few-liner is for you:

for file in $(LANG=C find . -size +1G -type f -print); do
    # gzip names its output *.gz, so skip files that already carry that suffix
    if [[ ! $file == *.gz ]]; then
        gzip "$file"
    fi
done
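
File names that contain spaces trip up a for loop over the output of find. As a minimal sketch of the same cleanup that is indifferent to such names, here is a Python version using only the standard library. The function name and the min_bytes parameter are my own, and the default threshold of 1024 ** 3 bytes is an assumption about what "1 GB" should mean.

```python
import gzip
import os
import shutil


def gzip_large_files(root, min_bytes=1024 ** 3):
    """Compress every file under root larger than min_bytes; return new paths."""
    compressed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if filename.endswith(".gz") or os.path.islink(path):
                continue  # already compressed, or a symlink
            if os.path.getsize(path) <= min_bytes:
                continue  # not large enough
            target = path + ".gz"
            if os.path.exists(target):
                continue  # never overwrite an existing archive
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)  # like gzip, replace the original file
            compressed.append(target)
    return compressed
```

Calling gzip_large_files(".") from the top of a project folder mirrors the few-liner above; passing a smaller min_bytes makes it easy to try out on test data first.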

Bioinformatic spring cleaning – Part I

A short one-liner to keep the data compressed

One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders.

for i in */; do
    tar czf "${i%/}.tar.gz" "$i" && rm -r "$i"
done
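
For those who can never remember the shell syntax either, the same task can be sketched in Python with the standard library. This is a minimal sketch: the function name is my own, hidden folders are skipped (as the */ glob does), and symlinked folders are skipped as a precaution, which the shell loop does not do.

```python
import os
import shutil
import tarfile


def archive_subfolders(parent="."):
    """Pack each direct subfolder of parent into <name>.tar.gz, then delete it."""
    archives = []
    for entry in sorted(os.listdir(parent)):
        folder = os.path.join(parent, entry)
        if entry.startswith(".") or os.path.islink(folder):
            continue  # skip hidden folders and symlinks
        if not os.path.isdir(folder):
            continue  # skip regular files
        archive = folder + ".tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(folder, arcname=entry)
        # Only reached if archiving raised no exception
        shutil.rmtree(folder)
        archives.append(archive)
    return archives
```

As with the shell version, the original folder is removed only after its archive has been written without an error.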

First signs of spring 2018

Galanthus nivalis (Amaryllidaceae) in Berlin in March 2018

Workshop at the GfBS 2018

Talking about efficient data partitioning strategies

Today, I held a workshop at the 19th annual meeting of the Gesellschaft für Biologische Systematik (GfBS). I introduced the participants to computational strategies to automate the selection of data partitioning schemes and nucleotide substitution models for phylogenomic datasets.

Happy first class day 2018

Michael Gruenstaeudl - First Lecture in 2018

Alignment Phy2Nex few-liner

Alignment file format conversion for the efficient – Part II

Today, I needed to convert a series of alignments, which were stored in the phylip format, into the common nexus format. The output DNA alignments needed to be in sequential (i.e., non-interleaved) format.

In February 2017, I had already written a few-liner to conduct the inverse conversion (nexus to phylip) and was, thus, surprised to find that the conversion from phylip to non-interleaved nexus did not work out of the box. Instead, a few more lines (and a little trick using StringIO()) were necessary to get this specific conversion to work.

#!/usr/bin/env python2.7

import os
import sys
from Bio import AlignIO
from Bio.Alphabet import IUPAC, Gapped
from Bio.Nexus import Nexus
from StringIO import StringIO

inFn = sys.argv[1]
outFn= os.path.splitext(inFn)[0]+".nex"

inp = open(inFn, 'rU')
outp = open(outFn, 'w')

alphabet = Gapped(IUPAC.ambiguous_dna)
aln = AlignIO.parse(inp, 'phylip-relaxed', alphabet=alphabet)

out_handle = StringIO()
AlignIO.write(aln, out_handle, 'nexus')

p = Nexus.Nexus()
p.read(out_handle.getvalue())
p.write_nexus_data(outp, interleave=False)

outp.close()
inp.close()

And for those who wish to apply the above Python code (saved as "phy2nex.py") to a collection of directories which contain a phylip-file each:

for dir in */; do python2 phy2nex.py "$dir"*.phy; done
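
The script above targets Python 2.7 and relies on Bio.Alphabet, which Biopython removed in release 1.78. For the simplest case, the conversion can also be sketched in Python 3 without any dependency. This is a hand-written sketch, not the Biopython route used above. It assumes a sequential relaxed-phylip file with one "name sequence" pair per line, and the function names and the FORMAT line (DNA, with ? for missing data and - for gaps) are my own choices.

```python
def read_sequential_phylip(lines):
    """Parse relaxed PHYLIP with one 'name sequence' pair per line."""
    rows = [ln.split() for ln in lines if ln.strip()]
    # The first row holds the number of taxa and the alignment length
    ntax, nchar = int(rows[0][0]), int(rows[0][1])
    # Sequences may be split into space-separated blocks on their line
    records = [(row[0], "".join(row[1:])) for row in rows[1:]]
    if len(records) != ntax or any(len(seq) != nchar for _, seq in records):
        raise ValueError("alignment does not match the PHYLIP header")
    return records


def to_sequential_nexus(records):
    """Return a non-interleaved NEXUS DATA block as a string."""
    ntax, nchar = len(records), len(records[0][1])
    width = max(len(name) for name, _ in records)
    # Pad the names so that all sequences start in the same column
    matrix = "\n".join(f"{name.ljust(width)} {seq}" for name, seq in records)
    return (
        "#NEXUS\n"
        "BEGIN DATA;\n"
        f"DIMENSIONS NTAX={ntax} NCHAR={nchar};\n"
        "FORMAT DATATYPE=DNA MISSING=? GAP=-;\n"
        "MATRIX\n"
        f"{matrix}\n"
        ";\n"
        "END;\n"
    )
```

Interleaved phylip input would raise the ValueError here, since the lengths no longer match the header; for such files the Biopython-based script remains the better tool.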