Michael Grünstäudl (Gruenstaeudl), PhD

Researcher at the Freie Universität Berlin

Category archive: 'bioinformatics'

New paper – Bioinformatic Workflows for Generating Complete Plastid Genome Sequences

In science, standardization and repeatability are a must. Together with two other scientists, I just published a new paper on bioinformatic workflows for generating complete plastid genome sequences in the context of plastid phylogenomics of the water-lily clade. We demonstrate that standardization and repeatability are essential elements of modern plant phylogenomics and how such standardization […]

Read more...

One-liner: Interleaved to deinterleaved FASTA

Quick, de-interleave it! There are one-liners that never get old. This is one of them.

$ perl -MBio::SeqIO -e 'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta"); while (my $seq = $seqin->next_seq) { print ">", $seq->id, "\n", $seq->seq, "\n"; }' < interleaved.fasta > deinterleaved.fasta

Read more...
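For readers without BioPerl at hand, the same idea can be sketched in plain Python. This is an illustrative alternative, not the one-liner above: the function name `deinterleave_fasta` is hypothetical, and unlike the Perl version (which prints only the sequence ID) it keeps the full header line.

```python
def deinterleave_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # another wrapped piece of the current sequence
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Small in-memory demonstration (made-up sequences).
example = [">seq1 demo", "ACGT", "TTAA", ">seq2", "GGCC"]
for name, seq in deinterleave_fasta(example):
    print(f">{name}\n{seq}")
```

To process a real file, pass an open file handle instead of the list; the function only needs an iterable of lines.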

Few-liner: Batch download of DNA sequences from NCBI

The wonders of Entrez. Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were stored in the file input.txt.

$ cat input.txt
Liriope_muscari_USACult,JX080424
Dracaena_adamii_IVORYCOAST,JX080436
…

Here is how I did it:

$ INF=input.txt
$ for line in $(cat $INF); do SEQNAME=$(echo "$line" […]

Read more...
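Since the shell loop is cut off in this excerpt, here is a separate, hedged sketch of the same task in Python. It calls the public NCBI E-utilities `efetch` endpoint directly; the helper names and the choice to write one `<name>.fasta` file per input line are assumptions for illustration, not the method of the original post.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_line(line):
    """Split a 'SeqName,Accession' line into its two fields."""
    name, accession = line.strip().split(",", 1)
    return name, accession


def efetch_url(accession):
    """Build an E-utilities URL that returns one nucleotide record as FASTA."""
    query = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(query)


def download_all(input_path, pause=0.4):
    """Fetch every accession in input_path; write <SeqName>.fasta for each."""
    with open(input_path) as handle:
        for line in handle:
            if "," not in line:
                continue  # skip blank or malformed lines
            name, accession = parse_line(line)
            with urlopen(efetch_url(accession)) as response:
                fasta = response.read().decode("utf-8")
            with open(name + ".fasta", "w") as out:  # output naming is an assumption
                out.write(fasta)
            time.sleep(pause)  # stay below NCBI's request-rate limit
```

Calling `download_all("input.txt")` would need network access; `parse_line` and `efetch_url` can be checked offline.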

Bioinformatic spring cleaning – Part II

An improved few-liner to keep the data compressed. If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files greater than 1 GB, the following few-liner is for you:

for file in $(LANG=C find . -size +1G -type f -print); do if [[ ! $file == *.gzip ]]; […]

Read more...
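The same housekeeping step can be sketched with Python's standard library. This is a rough equivalent under stated assumptions, not a port of the truncated bash loop: the function name `gzip_large_files` is hypothetical, the threshold is taken as 1 GiB, and files already ending in `.gz` are skipped.

```python
import gzip
import os
import shutil


def gzip_large_files(root, min_bytes=1024 ** 3):
    """Gzip every regular file under root that is larger than min_bytes.

    Returns the list of newly created .gz paths. Originals are removed
    only after the compressed copy has been written.
    """
    compressed = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if name.endswith(".gz") or os.path.islink(path):
                continue  # already compressed, or not a regular file
            if os.path.getsize(path) <= min_bytes:
                continue  # below the size threshold
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream, so large files fit in memory
            os.remove(path)
            compressed.append(path + ".gz")
    return compressed
```

Passing a small `min_bytes` on a scratch directory is a safe way to try it out before pointing it at real project data.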

Bioinformatic spring cleaning – Part I

A short one-liner to keep the data compressed. One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders.

for i in $(ls -d */); do tar czf ${i%%/}.tar.gz $i && […]

Read more...
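As a hedged illustration of the archive-then-remove pattern described above, the following Python sketch uses only the standard library. The function name `archive_subfolders` is hypothetical; like `ls -d */`, it skips hidden folders, and it deletes a folder only after its archive has been written without error.

```python
import os
import shutil
import tarfile


def archive_subfolders(parent):
    """Turn each direct subfolder of parent into <name>.tar.gz, then delete it."""
    archives = []
    for entry in sorted(os.listdir(parent)):
        folder = os.path.join(parent, entry)
        if entry.startswith(".") or os.path.islink(folder):
            continue  # skip hidden entries and symlinks
        if not os.path.isdir(folder):
            continue  # plain files are left alone
        archive = folder + ".tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(folder, arcname=entry)  # store paths relative to parent
        shutil.rmtree(folder)  # reached only if archiving raised no exception
        archives.append(archive)
    return archives
```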

Alignment Phy2Nex few-liner

Alignment file format conversion for the efficient – Part II. Today, I needed to convert a series of alignments, which were stored in the phylip format, into the common nexus format. The output DNA alignment needed to be in sequential (i.e., non-interleaved) format. In February 2017, I had already written a few-liner to conduct […]

Read more...
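To make the conversion concrete, here is a minimal standard-library sketch. It is not the few-liner from the post, which is not shown in this excerpt. It assumes relaxed, sequential phylip input with one whitespace-separated name and sequence per line; both function names are hypothetical.

```python
def read_sequential_phylip(text):
    """Parse relaxed sequential PHYLIP: a 'ntax nchar' header, then 'name seq' lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split())
    records = []
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        records.append((name, seq.replace(" ", "")))
    if len(records) != ntax or any(len(s) != nchar for _, s in records):
        raise ValueError("header does not match the alignment")
    return records


def write_sequential_nexus(records, datatype="dna"):
    """Return a non-interleaved NEXUS data block for (name, sequence) pairs."""
    ntax, nchar = len(records), len(records[0][1])
    width = max(len(name) for name, _ in records)
    out = ["#NEXUS", "begin data;",
           f"  dimensions ntax={ntax} nchar={nchar};",
           f"  format datatype={datatype} missing=? gap=-;",
           "  matrix"]
    out += [f"  {name.ljust(width)}  {seq}" for name, seq in records]
    out += ["  ;", "end;"]
    return "\n".join(out) + "\n"


phy = "2 4\nt1 ACGT\nt2 AC-T\n"
print(write_sequential_nexus(read_sequential_phylip(phy)))
```

Strict phylip files with fixed ten-character names, or interleaved blocks, would need a fuller parser such as the one in Biopython.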

Standardizing tRNA anticodon abbreviations in genome files

Better U than T (or the other way around). Incongruence exists among the annotations of tRNAs in genomic files: Most authors use "U" (for uracil) in the three-letter anticodon abbreviations of tRNA annotations, but some use "T" (for thymine). While I consider the usage of both letters justified, I believe that this usage should […]

Read more...
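One way such a standardization could be automated is sketched below. The label pattern (for example `trnK-UUU`) and the function name `standardize_anticodons` are assumptions for illustration; the direction of the replacement is left as a parameter, since either letter can be chosen as the standard.

```python
import re

# Assumed label style: "trn" + optional "f" + amino-acid letter + "-" + anticodon.
TRNA_LABEL = re.compile(r"\b(trnf?[A-Z])-([ACGTU]{3})\b")


def standardize_anticodons(text, letter="U"):
    """Rewrite tRNA anticodons in text so they consistently use `letter` (U or T)."""
    if letter not in ("U", "T"):
        raise ValueError("letter must be 'U' or 'T'")
    other = "T" if letter == "U" else "U"

    def fix(match):
        # Only the anticodon part is touched; the gene name stays unchanged.
        return match.group(1) + "-" + match.group(2).replace(other, letter)

    return TRNA_LABEL.sub(fix, text)


print(standardize_anticodons('/gene="trnK-TTT"'))
print(standardize_anticodons("trnfM-CAU", letter="T"))
```

Because the pattern requires the `trn` prefix, ordinary sequence data containing T or U is left alone.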

Ordering charsets within NEXUS files

Character sets for the orderly. Defining character sets (charset) in NEXUS files can be an efficient way to annotate specific regions of a DNA or protein alignment. However, many software packages able to write NEXUS files (e.g., BioPython) do not save charsets in an ordered fashion if multiple ones are present (i.e., that charset at […]

Read more...
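A possible post-processing step is sketched here, assuming that "ordered" means sorted by start position and that each charset is a single `charset name = start-end;` line. Both are assumptions, because the excerpt is truncated; the function name `sort_charsets` is hypothetical.

```python
import re

# Matches simple one-range definitions such as "charset matK = 1-100;".
CHARSET = re.compile(r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;\s*$",
                     re.IGNORECASE)


def sort_charsets(nexus_text):
    """Sort simple charset lines by start position, reusing their original line slots."""
    lines = nexus_text.splitlines()
    slots = [i for i, ln in enumerate(lines) if CHARSET.match(ln)]
    ordered = sorted((lines[i] for i in slots),
                     key=lambda ln: int(CHARSET.match(ln).group(2)))
    for i, line in zip(slots, ordered):
        lines[i] = line  # all non-charset lines stay where they were
    return "\n".join(lines) + "\n"


sets_block = "begin sets;\ncharset rbcL = 101-200;\ncharset matK = 1-100;\nend;\n"
print(sort_charsets(sets_block))
```

Charsets with several ranges or step notation are not matched by this pattern and would simply be left in place.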

Alignment Nex2Phy few-liner

Alignment file format conversion for the efficient. Today, I needed to convert a series of alignments, which were stored in the common nexus format, into phylip format. In order to do this efficiently, I wrote the following few-liner.

#!/usr/bin/env python2.7
import sys
from Bio import AlignIO
inFn = sys.argv[1]
inp = open(inFn, 'rU')
outp = […]

Read more...
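Because the script above is cut off, the following is an independent, dependency-free sketch of the nexus-to-phylip direction rather than a completion of it. It assumes a simple, non-interleaved `matrix` block whose closing `;` sits on its own line, and it writes relaxed phylip; the function names are hypothetical.

```python
def read_nexus_matrix(text):
    """Collect (name, sequence) pairs from a simple, non-interleaved MATRIX block."""
    records, in_matrix = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_matrix:
            in_matrix = line.lower() == "matrix"  # start reading after this line
            continue
        if line.startswith(";"):
            break  # end of the matrix
        if line:
            name, seq = line.split(None, 1)
            records.append((name, seq.replace(" ", "")))
    return records


def write_relaxed_phylip(records):
    """Return sequential, relaxed PHYLIP text for (name, sequence) pairs."""
    ntax, nchar = len(records), len(records[0][1])
    width = max(len(name) for name, _ in records)
    rows = [f"{name.ljust(width)} {seq}" for name, seq in records]
    return f" {ntax} {nchar}\n" + "\n".join(rows) + "\n"


nex = ("#NEXUS\nbegin data;\ndimensions ntax=2 nchar=4;\n"
       "format datatype=dna;\nmatrix\nt1 ACGT\ntaxon2 AC-T\n;\nend;\n")
print(write_relaxed_phylip(read_nexus_matrix(nex)))
```

For real-world NEXUS files with interleaved matrices, quoted names, or comments, a dedicated parser such as Biopython's `AlignIO` is the safer choice.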

Using rPython to call Python in R

A simple example. Calling Python from within R via the R package rPython is fairly easy. However, very little documentation exists on this package, and some of the commands may appear quirky at first. Also, don’t confuse rPython with RPython (note the capitalization)! The paucity of written documentation on this package seems to scare away many biologists, […]

Read more...