Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

Category archive: 'bioinformatics'

Quick info parsing from GenBank accessions

Taking the essence. Have you ever found yourself browsing through individual sequence records of the NCBI GenBank database and wishing that you could extract only the metadata of a record (e.g., authors, publication status, taxonomy), but not its feature table or the sequence itself? With the help of Entrez Direct and […]
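
The teaser names Entrez Direct as the tool; the full command is not shown here. As a rough illustration of the idea only, here is a minimal Python sketch that assumes the record has already been downloaded as GenBank flat-file text and treats everything before the FEATURES line as the metadata. The function name and the example record are hypothetical.

```python
def genbank_header(record_text):
    """Return the part of a GenBank flat-file record that precedes the feature table."""
    header = []
    for line in record_text.splitlines():
        # In the flat-file format the feature table opens with a line starting "FEATURES".
        if line.startswith("FEATURES"):
            break
        header.append(line)
    return "\n".join(header)


# Hypothetical, heavily shortened record used only for illustration.
example = """LOCUS       EXAMPLE01   12 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//"""

print(genbank_header(example))
```

The sketch keeps LOCUS, DEFINITION, and any other header fields, and drops both the feature table and the sequence.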

Read more...

New paper – Bioinformatic Workflows for Generating Complete Plastid Genome Sequences

In science, standardization and repeatability are a must. Together with two other scientists, I just published a new paper on bioinformatic workflows for generating complete plastid genome sequences in the context of plastid phylogenomics of the water-lily clade. We demonstrate that standardization and repeatability are essential elements for modern plant phylogenomics and how such standardization […]

Read more...

One-liner: Interleaved to deinterleaved FASTA

Quick, de-interleave it! There are one-liners that never get old. This is one of them.

$ perl -MBio::SeqIO -e 'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta"); while (my $seq = $seqin->next_seq) { print ">", $seq->id, "\n", $seq->seq, "\n"; }' < interleaved.fasta > deinterleaved.fasta

Update 28-Jan-2019: Over time, I […]
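
For readers without BioPerl at hand, the same operation can be sketched in plain Python. This is an illustration, not the author's code: the function name is hypothetical, and unlike the Perl one-liner (which prints only the sequence ID) it keeps the full header line.

```python
def deinterleave_fasta(lines):
    """Yield (header, sequence) pairs, joining each multi-line sequence into one string."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


interleaved = [">seq1 example", "ACGT", "TTGA", "", ">seq2", "GG", "CC"]
for header, sequence in deinterleave_fasta(interleaved):
    print(">" + header)
    print(sequence)
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the in-memory list.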

Read more...

Few-liner: Batch download of DNA sequences from NCBI

The wonders of Entrez. Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were listed in the file input.txt.

$ cat input.txt
Liriope_muscari_USACult,JX080424
Dracaena_adamii_IVORYCOAST,JX080436
…

Here is how I did it:

$ INF=input.txt
$ for line in $(cat $INF); do SEQNAME=$(echo "$line" […]
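
The shell loop is cut off in this teaser, so the author's exact commands are not reproduced here. The following Python sketch shows one possible way to handle the same input: it splits each "name,accession" line and builds a request URL for the NCBI E-utilities efetch endpoint. The function names are hypothetical, the choice of efetch over HTTP is an assumption, and the sketch only constructs URLs without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_entries(text):
    """Split 'name,accession' lines into (name, accession) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, accession = line.split(",", 1)
        entries.append((name.strip(), accession.strip()))
    return entries


def efetch_fasta_url(accession):
    """Build an efetch URL that requests one nucleotide record in FASTA format."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


listing = "Liriope_muscari_USACult,JX080424\nDracaena_adamii_IVORYCOAST,JX080436\n"
for name, accession in parse_entries(listing):
    print(name, efetch_fasta_url(accession))
```

Each URL could then be retrieved with urllib.request.urlopen and the response saved under the sequence name from the first column.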

Read more...

Bioinformatic spring cleaning – Part II

An improved few-liner to keep the data compressed. If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files greater than 1 GB, the following few-liner is for you:

for file in $(LANG=C find . -size +1G -type f -print); do if [[ ! $file == *.gzip ]]; […]
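
Since the few-liner is truncated here, the following is only a Python sketch of the task as described (walk a folder tree, gzip every file above a size threshold), not a transcription of the author's loop. The function name and the `min_bytes` parameter are hypothetical, and files already ending in ".gz" are assumed to be compressed.

```python
import gzip
import os
import shutil


def gzip_large_files(root, min_bytes=1024 ** 3):
    """Compress every regular file under root that is larger than min_bytes.

    Returns the paths of the .gz files that were written.
    """
    compressed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and files that look compressed already.
            if os.path.islink(path) or name.endswith(".gz"):
                continue
            if os.path.getsize(path) <= min_bytes:
                continue
            # Stream the data so that large files never sit in memory as a whole.
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)  # reached only if compression raised no error
            compressed.append(path + ".gz")
    return compressed


# Example call (commented out because it modifies files in place):
# gzip_large_files(".")
```

The default threshold of 1024 ** 3 bytes corresponds to the 1 GB limit mentioned in the text.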

Read more...

Bioinformatic spring cleaning – Part I

A short one-liner to keep the data compressed. One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders.

for i in $(ls -d */); do tar czf ${i%%/}.tar.gz $i && […]
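
The task described above (archive each subfolder, then delete the original) can also be sketched in Python. This is an illustration under assumptions rather than the author's one-liner: the function name is hypothetical, hidden folders are skipped as `ls -d */` would skip them, and a folder is removed only after its archive has been written without error.

```python
import os
import shutil
import tarfile


def archive_subfolders(parent):
    """Create <folder>.tar.gz for each direct subfolder of parent, then remove the folder."""
    archives = []
    for entry in sorted(os.listdir(parent)):
        path = os.path.join(parent, entry)
        # Only real, non-hidden directories are archived.
        if entry.startswith(".") or os.path.islink(path) or not os.path.isdir(path):
            continue
        archive = path + ".tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(path, arcname=entry)
        # An exception above would prevent this line from running,
        # which mirrors the "&&" in the shell version.
        shutil.rmtree(path)
        archives.append(archive)
    return archives


# Example call (commented out because it deletes the original folders):
# archive_subfolders(".")
```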

Read more...

Alignment Phy2Nex few-liner

Alignment file format conversion for the efficient – Part II. Today, I needed to convert a series of alignments, which were stored in PHYLIP format, into the common NEXUS format. The output DNA alignment needed to be in sequential (i.e., non-interleaved) format. In February 2017, I had already written a few-liner to conduct […]
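
The few-liner itself is not part of this teaser. As a rough, dependency-free sketch of the conversion described, the following Python function reads a sequential PHYLIP alignment and writes a sequential NEXUS DATA block. It assumes the relaxed PHYLIP layout (name and sequence separated by whitespace, one taxon per line) and DNA data; the function name is hypothetical.

```python
def phylip_to_nexus(phylip_text):
    """Convert a sequential, relaxed-PHYLIP DNA alignment into sequential NEXUS text."""
    lines = [ln for ln in phylip_text.splitlines() if ln.strip()]
    # First line of a PHYLIP file: number of taxa and number of characters.
    ntax, nchar = (int(x) for x in lines[0].split())
    records = []
    for ln in lines[1:]:
        name, seq = ln.split(None, 1)
        records.append((name, "".join(seq.split())))  # drop spaces inside the sequence
    if len(records) != ntax or any(len(seq) != nchar for _, seq in records):
        raise ValueError("alignment does not match the PHYLIP header")
    width = max(len(name) for name, _ in records)
    out = [
        "#NEXUS",
        "BEGIN DATA;",
        f"    DIMENSIONS NTAX={ntax} NCHAR={nchar};",
        "    FORMAT DATATYPE=DNA MISSING=? GAP=-;",
        "    MATRIX",
    ]
    out += [f"    {name.ljust(width)}  {seq}" for name, seq in records]
    out += ["    ;", "END;"]
    return "\n".join(out) + "\n"


print(phylip_to_nexus("2 4\ntaxA ACGT\ntaxB AC-T\n"))
```

Each taxon is written on a single line, which is what makes the output sequential rather than interleaved.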

Read more...

Standardizing tRNA anticodon abbreviations in genome files

Better U than T (or the other way around). Incongruence exists among the annotations of tRNAs in genomic files: Most authors use “U” (for uracil) in the three-letter anticodon abbreviations of tRNA annotations, but some use “T” (for thymine). While I consider the usage of both letters as justified, I believe that this usage should […]
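
To make the idea concrete, here is a minimal Python sketch of such a standardization. It rests on assumptions the teaser does not confirm: that tRNA genes are written in the form "trnK-UUU" (gene name, hyphen, three-letter anticodon), and that the target letter should be selectable. The function name and the `use` parameter are hypothetical.

```python
import re

# Matches names such as trnK-UUU or trnfM-CAT: "trn", amino-acid code, hyphen, anticodon.
ANTICODON = re.compile(r"\b(trn[A-Za-z]+-)([ACGTU]{3})\b")


def standardize_anticodons(text, use="U"):
    """Rewrite tRNA anticodon abbreviations so that they consistently use 'U' or 'T'."""
    if use not in ("U", "T"):
        raise ValueError("use must be 'U' or 'T'")
    old = "T" if use == "U" else "U"
    # Only the anticodon part (group 2) is changed; the gene name stays as it is.
    return ANTICODON.sub(lambda m: m.group(1) + m.group(2).replace(old, use), text)


print(standardize_anticodons('/gene="trnK-TTT"'))
print(standardize_anticodons('/gene="trnfM-CAU"', use="T"))
```

Text that does not match the assumed pattern, including the nucleotide sequence itself, is left unchanged.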

Read more...

Ordering charsets within NEXUS files

Character sets for the orderly. Defining character sets (charset) in NEXUS files can be an efficient way to annotate specific regions of a DNA or protein alignment. However, many software packages able to write NEXUS files (e.g., Biopython) do not save charsets in an ordered fashion, if multiple ones are present (i.e., that charset at […]
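
A small Python sketch of what such an ordering step could look like follows. It assumes that "ordered" means sorted by start position in the alignment and that each charset is a single range of the form "charset name = start-end;"; both are assumptions, since the teaser is cut off. The function name and the example charsets are hypothetical.

```python
import re

# Matches simple statements such as "charset matK = 1-500;" (case-insensitive).
CHARSET = re.compile(r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;\s*$", re.IGNORECASE)


def sort_charsets(lines):
    """Return charset statements sorted by start, then end, position."""
    parsed = []
    for line in lines:
        match = CHARSET.match(line)
        if match is None:
            raise ValueError(f"not a simple charset statement: {line!r}")
        parsed.append((int(match.group(2)), int(match.group(3)), line))
    return [line for _start, _end, line in sorted(parsed)]


unordered = [
    "charset rbcL = 501-900;",
    "charset matK = 1-500;",
    "charset ndhF = 901-1200;",
]
for statement in sort_charsets(unordered):
    print(statement)
```

Charsets with several ranges or with step sizes would need a more complete parser than this regular expression.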

Read more...

Alignment Nex2Phy few-liner

Alignment file format conversion for the efficient. Today, I needed to convert a series of alignments, which were stored in the common NEXUS format, into PHYLIP format. In order to do this efficiently, I wrote the following few-liner.

#!/usr/bin/env python2.7
import sys
from Bio import AlignIO
inFn = sys.argv[1]
inp = open(inFn, 'rU')
outp = […]

Read more...