Michael Grünstäudl (Gruenstaeudl), PhD

Talk at evolutionary plant biology conference

23 September 2018 by Michael Grünstäudl

Talking about novel bioinformatic tools for DNA sequence submissions

This Thursday, I gave a conference talk at the 24th International Symposium on Biodiversity and Evolutionary Biology of the German Botanical Society. I introduced the participants to some of my newly developed tools for streamlining and automating the submission of plant DNA barcoding sequences to public sequence repositories. This conference was a wonderful example of how small conferences can both meet high scientific standards and be an enjoyable reprieve for the participants. Lots of interesting talks and a great social programme amid the gorgeous Carinthian scenery!

Talk at the DBG Sektionstagung 2018 in Klagenfurt, Carinthia

Category: General | 0 comments

New paper – Bioinformatic Workflows for Generating Complete Plastid Genome Sequences

21 June 2018 by Michael Grünstäudl

In science, standardization and repeatability are a must.

Together with two other scientists, I just published a new paper on bioinformatic workflows for generating complete plastid genome sequences in the context of plastid phylogenomics of the water-lily clade. We demonstrate that standardization and repeatability are essential elements of modern plant phylogenomics, and we show how both can be achieved efficiently during plastid genome assembly, annotation and alignment.

Category: bioinformatics, scientific papers | 0 comments

One-liner: Splitting multi-sequence FASTA into single-sequence FASTA

15 June 2018 by Michael Grünstäudl

Quick, split it!
There are one-liners that never get old. Here is another one of them.

$ csplit multisequence.fasta '/>/' '{*}' &&
find . -type f -size 0 -print0 | xargs -0 rm --
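Where csplit is unavailable or behaves differently, the same split can be sketched with awk. This is a minimal sketch and not the method used above: the function name split_fasta, the default prefix seq and the numbered output names are assumptions made for illustration.

```shell
# Hypothetical helper: write each record of a multi-sequence FASTA file
# to its own numbered file (PREFIX_0001.fasta, PREFIX_0002.fasta, ...).
split_fasta() {
    infile=$1
    prefix=${2:-seq}    # assumed default prefix for the output files
    awk -v prefix="$prefix" '
        /^>/ {
            if (out != "") close(out)     # release the previous file handle
            out = sprintf("%s_%04d.fasta", prefix, ++n)
        }
        out != "" { print > out }         # ignore lines before the first header
    ' "$infile"
}
```

Calling split_fasta multisequence.fasta rec would then write rec_0001.fasta, rec_0002.fasta and so on, without leaving an empty file behind that needs to be removed.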

Category: General | 0 comments

Teaching in spring 2018 – Part II

11 June 2018 by Michael Grünstäudl

Teaching in spring 2018

Category: General | 0 comments

One-liner: Interleaved to deinterleaved FASTA

7 May 2018 by Michael Grünstäudl

Quick, de-interleave it!
There are one-liners that never get old. This is one of them.

$ perl -MBio::SeqIO -e \
'my $seqin = Bio::SeqIO->
 new(-fh => \*STDIN, -format => "fasta");
 while (my $seq = $seqin->next_seq)
 { print ">", $seq->id, "\n", $seq->seq, "\n"; }' \
< interleaved.fasta > deinterleaved.fasta

Update 28-Jan-2019:
Over time, I came to find working with BioPerl uncomfortable, as its clean installation is just not well-supported on Linux. Thus, I have found myself relying on this method more and more, assuming that the line breaks of the input file are LF (and not CRLF):

$ INF=interleaved.fasta
$ awk '/^>/ {printf("\n%s\n",$0);next; } \
{ printf("%s",$0);}  END {printf("\n");}' \
< "$INF" | tail -n +2 \
> "${INF%.fasta*}_deint.fasta"
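If the input file may carry CRLF line breaks, the carriage returns can be stripped before the sequence lines are joined. The following is a minimal sketch of that idea, a variant of the awk approach rather than the exact command above; the function name deinterleave_fasta is an assumption. It tracks whether sequence data has been printed, so no leading blank line has to be trimmed with tail afterwards.

```shell
# Hypothetical helper: print a de-interleaved copy of a FASTA file to
# standard output, tolerating CRLF as well as LF line endings.
deinterleave_fasta() {
    tr -d '\r' < "$1" |                     # drop carriage returns first
    awk '
        /^>/ { if (inseq) printf("\n")      # terminate the previous sequence
               inseq = 0; print; next }
        { printf("%s", $0); inseq = 1 }     # join sequence lines
        END { if (inseq) printf("\n") }
    '
}
```

A call such as deinterleave_fasta interleaved.fasta > deinterleaved.fasta writes one header line and one sequence line per record.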

Category: bioinformatics, one-liners | 0 comments

Few-liner: Batch download of DNA sequences from NCBI

24 April 2018 by Michael Grünstäudl

The wonders of entrez

Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were listed in the file input.txt, one name and accession number per line.

$ cat input.txt
  Liriope_muscari_USACult,JX080424
  Dracaena_adamii_IVORYCOAST,JX080436
  ...

Here is how I did it:

$ INF=input.txt
$ for line in $(cat "$INF"); do
    # Each entry has the form NAME,ACCESSION and contains no whitespace
    SEQNAME=$(echo "$line" | awk -F',' '{print $1}')
    ACCNUM=$(echo "$line" | awk -F',' '{print $2}')
    FULLNAM=">${SEQNAME}_${ACCNUM}"
    # Fetch the record and drop its original FASTA header line
    SEQ=$(esearch -db nucleotide -query "$ACCNUM" | efetch -format fasta | tail -n +2)
    printf '%s\n%s\n' "$FULLNAM" "$SEQ" >> out.txt
  done

Category: bioinformatics, one-liners | 0 comments

Bioinformatic spring cleaning – Part II

5 April 2018 by Michael Grünstäudl

An improved few-liner to keep the data compressed

If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files larger than 1 GB, the following few-liner is for you:

for file in $(LANG=C find . -size +1G -type f -print); do
    # Skip files that are already compressed (gzip appends .gz)
    if [[ ! $file == *.gz ]]; then
        gzip "$file"
    fi
done
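The loop above splits the output of find on whitespace, so it relies on file names without spaces. A minimal sketch of a variant that lets find pass the file names to gzip directly is given below; the function name compress_large_files and its two parameters are assumptions made for illustration.

```shell
# Hypothetical helper: gzip every regular file below a directory that is
# larger than a size threshold and does not already end in .gz.
compress_large_files() {
    dir=${1:-.}
    threshold=${2:-+1G}   # handed to find -size; the G suffix is an extension of GNU and BSD find
    # -exec ... {} + hands the names to gzip unchanged, spaces included
    find "$dir" -type f -size "$threshold" ! -name '*.gz' -exec gzip {} +
}
```

Running compress_large_files . +1G corresponds to the few-liner above, while a smaller threshold such as +100M can be passed as the second argument.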

Category: bioinformatics, one-liners | 0 comments

Bioinformatic spring cleaning – Part I

30 March 2018 by Michael Grünstäudl

A short one-liner to keep the data compressed

One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders.

for i in */; do
    tar czf "${i%/}.tar.gz" "$i" && rm -r "$i";
done
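Because the original folders are deleted, it can be reassuring to confirm that each archive is readable before removing anything. Here is a minimal sketch of that extra check, assuming the loop is run from the parent directory of the project folders; the function name archive_dirs is hypothetical.

```shell
# Hypothetical helper: tar-zip every subfolder of the current directory
# and remove the original only if the new archive can be listed.
archive_dirs() {
    for d in */; do
        [ -d "$d" ] || continue                     # no subfolders: the glob stays literal
        name=${d%/}
        tar czf "$name.tar.gz" "$name" &&
            tar tzf "$name.tar.gz" > /dev/null &&   # archive must be readable
            rm -r "$name"
    done
}
```

Listing the archive with tar tzf only confirms that it can be read from start to end; it does not compare the archived files against the originals.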

Category: bioinformatics, one-liners | 0 comments

First signs of spring 2018

25 March 2018 by Michael Grünstäudl

Galanthus nivalis (Amaryllidaceae) in Berlin in March 2018

Category: audiovisual | 0 comments

Workshop at the GfBS 2018

14 February 2018 by Michael Grünstäudl

Talking about efficient data partitioning strategies

Today, I held a workshop at the 19th annual meeting of the Gesellschaft für Biologische Systematik (GfBS). I introduced the participants to computational strategies for automating the selection of data partitioning schemes and nucleotide substitution models for phylogenomic datasets.

Workshop at the GfBS 2018

Category: General | 0 comments

Freie Universität Berlin
