Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

Talk at evolutionary plant biology conference

Talking about novel bioinformatic tools for DNA sequence submissions

This Thursday, I gave a talk at the 24th International Symposium on Biodiversity and Evolutionary Biology of the German Botanical Society. I introduced the participants to some of my newly developed tools for streamlining and automating the submission of plant DNA barcoding sequences to public sequence repositories. This conference was a wonderful example of how small conferences can both meet high scientific standards and be an enjoyable reprieve for the participants. Lots of interesting talks and a great social programme amid the gorgeous Carinthian scenery!

Talk at DBG Sektionstagung 2018 in Klagenfurt, Kärnten


New paper – Bioinformatic Workflows for Generating Complete Plastid Genome Sequences

In science, standardization and repeatability are a must.

Together with two other scientists, I just published a new paper on bioinformatic workflows for generating complete plastid genome sequences in the context of plastid phylogenomics of the water-lily clade. We demonstrate that standardization and repeatability are essential elements of modern plant phylogenomics, and we show how both can be achieved efficiently during plastid genome assembly, annotation and alignment.

 

One-liner: Splitting multi-sequence FASTA into single-sequence FASTA

Quick, split it!
There are one-liners that never get old. Here is another one of them.

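# csplit writes the pieces to xx00, xx01, ...; find then removes the empty ones
$ csplit multisequence.fasta '/^>/' '{*}' && \
find . -maxdepth 1 -name 'xx*' -size 0 -print0 | xargs -0 rm --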
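
Where csplit is not available, or where more predictable file names are preferred, the same split can be sketched with awk. This is only a minimal sketch: the function name split_fasta and the seq_N.fasta naming scheme are assumptions made for illustration, not part of the one-liner above.

```shell
# Hypothetical helper: write each record of a multi-sequence FASTA file
# to its own file (seq_1.fasta, seq_2.fasta, ...) in an output directory.
split_fasta() {
    local infile=$1 outdir=${2:-.}
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        # A header line starts a new record: close the previous file, open the next
        /^>/ { if (out) close(out); n++; out = dir "/seq_" n ".fasta" }
        # Lines before the first header are skipped because out is still empty
        out  { print > out }
    ' "$infile"
}
```

Usage: split_fasta multisequence.fasta split_out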

Teaching in spring 2018 – Part II

Teaching in spring 2018

One-liner: Interleaved to deinterleaved FASTA

Quick, de-interleave it!
There are one-liners that never get old. This is one of them.

$ perl -MBio::SeqIO -e \
'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta");
 # Print each record as one header line and one sequence line
 while (my $seq = $seqin->next_seq)
 { print ">", $seq->id, "\n", $seq->seq, "\n"; }' \
< interleaved.fasta > deinterleaved.fasta

 

Update 28-Jan-2019:
Over time, I came to find working with BioPerl uncomfortable, as a clean installation is not well supported on Linux. I therefore rely more and more on the following awk-based method, which assumes that the line breaks of the input file are LF (and not CRLF):

$ INF=interleaved.fasta
$ awk '/^>/ { printf("\n%s\n", $0); next }
       { printf("%s", $0) }
       END { printf("\n") }' "$INF" \
  | tail -n +2 > "${INF%.fasta*}_deint.fasta"
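
If the input may contain CRLF line breaks, one option is to strip the carriage returns before awk sees the data. The following is a minimal sketch under that assumption; the function name deinterleave_fasta is hypothetical, and the sketch avoids the leading empty line rather than trimming it with tail.

```shell
# Hypothetical helper: de-interleave a FASTA file read from stdin,
# dropping carriage returns first so that CRLF input is handled as well.
deinterleave_fasta() {
    tr -d '\r' | awk '
        # Header: end the previous sequence line (if any), then print the header
        /^>/ { if (NR > 1) printf("\n"); print; next }
        # Sequence line: append without a line break
             { printf("%s", $0) }
        # Terminate the last sequence line
        END  { if (NR > 0) printf("\n") }
    '
}
```

Usage: deinterleave_fasta < interleaved.fasta > deinterleaved.fasta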

Few-liner: Batch download of DNA sequences from NCBI

The wonders of entrez

Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequence names and accession numbers in question were stored in the file input.txt.

$ cat input.txt
  Liriope_muscari_USACult,JX080424
  Dracaena_adamii_IVORYCOAST,JX080436
  ...

Here is how I did it:

$ INF=input.txt
$ for line in $(cat "$INF"); do
    # Each line has the form NAME,ACCESSION
    SEQNAME=$(echo "$line" | awk -F',' '{print $1}')
    ACCNUM=$(echo "$line" | awk -F',' '{print $2}')
    FULLNAM=">${SEQNAME}_${ACCNUM}"
    # Fetch the record and drop its original FASTA header
    SEQ=$(esearch -db nucleotide -query "$ACCNUM" | efetch -format fasta | tail -n +2)
    printf '%s\n%s\n' "$FULLNAM" "$SEQ" >> out.txt
  done
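
Before sending dozens of queries to NCBI, it can help to preview the FASTA headers that the loop would write. The sketch below works offline and only parses the input file; the function name preview_headers is hypothetical, and the only check applied is that both the name and the accession field are non-empty.

```shell
# Hypothetical pre-flight check: print ">NAME_ACCESSION" for every
# well-formed "NAME,ACCESSION" line and report malformed lines on stderr.
preview_headers() {
    local name acc
    while IFS=',' read -r name acc; do
        # Trim leading whitespace from the name field
        name=${name#"${name%%[![:space:]]*}"}
        if [ -z "$name" ] || [ -z "$acc" ]; then
            echo "skipping malformed line: ${name}${acc:+,$acc}" >&2
            continue
        fi
        printf '>%s_%s\n' "$name" "$acc"
    done < "$1"
}
```

Usage: preview_headers input.txt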

Bioinformatic spring cleaning – Part II

An improved few-liner to keep the data compressed

If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files larger than 1 GB, the following few-liner is for you:

LANG=C find . -type f -size +1G -print0 |
while IFS= read -r -d '' file; do
    # Skip files that gzip has already compressed (suffix .gz)
    if [[ $file != *.gz ]]; then
        gzip "$file"
    fi
done
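
Since gzip replaces the original files, a dry run that only lists the candidates may be a useful first step. This is a minimal sketch: the function name list_gzip_candidates and its two optional arguments (start folder and a threshold in find's -size syntax) are assumptions made for illustration.

```shell
# Hypothetical dry run: list uncompressed files above a size threshold
# without modifying anything. The threshold uses the -size syntax of find.
list_gzip_candidates() {
    local dir=${1:-.} threshold=${2:-+1G}
    LANG=C find "$dir" -type f -size "$threshold" ! -name '*.gz' -print
}
```

Usage: list_gzip_candidates . +1G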

Bioinformatic spring cleaning – Part I

A short one-liner to keep the data compressed

One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders.

for i in */; do
    # ${i%/} strips the trailing slash from the folder name
    tar czf "${i%/}.tar.gz" "$i" && rm -r "$i"
done
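
A slightly more cautious variant could check that the new archive is readable before the original folder is removed. The sketch below is one way to do that under this assumption; the function name archive_and_remove_dirs is hypothetical, and listing the archive with tar tzf is only a basic integrity check, not a full comparison of contents.

```shell
# Hypothetical variant: archive each subfolder of the current directory,
# then remove the folder only if the new archive can be listed completely.
archive_and_remove_dirs() {
    local d name
    for d in */; do
        # Without subfolders the glob stays literal, so skip non-directories
        [ -d "$d" ] || continue
        name=${d%/}
        tar czf "$name.tar.gz" "$name" \
            && tar tzf "$name.tar.gz" > /dev/null \
            && rm -r "$name"
    done
}
```

Usage: change into the project's parent folder and call archive_and_remove_dirs.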

First signs of spring 2018

Galanthus nivalis (Amaryllidaceae) in Berlin in March 2018

Workshop at the GfBS 2018

Talking about efficient data partitioning strategies

Today, I held a workshop at the 19th annual meeting of the Gesellschaft für Biologische Systematik (GfBS). I introduced the participants to computational strategies to automate the selection of data partitioning schemes and nucleotide substitution models for phylogenomic datasets.
