Michael Grünstäudl (Gruenstaeudl), PhD

Quick info parsing from GenBank accessions

Thursday, 1 November 2018, by Michael Grünstäudl

Taking the essence. Have you ever found yourself browsing through individual sequence records of the NCBI GenBank database and wishing that you could extract only the metadata information of a record (e.g., authors, publication status, taxonomy), but not the feature table of a record or the sequence itself? With the help of Entrez Direct and […]
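
As a rough illustration of the idea (not the Entrez Direct approach the post goes on to describe, which is elided here), the metadata portion of a GenBank flat-file record can be isolated by keeping everything before the `FEATURES` keyword. The function name and the abridged example record below are hypothetical and made up for this sketch; it assumes the record text has already been downloaded.

```python
def genbank_metadata(record_text):
    """Return the header lines of a GenBank flat-file record.

    Everything from the FEATURES keyword onwards (feature table and
    sequence) is dropped; only the metadata block is kept.
    """
    header = []
    for line in record_text.splitlines():
        # In the flat-file format, the feature table starts with FEATURES in column 1.
        if line.startswith("FEATURES"):
            break
        header.append(line)
    return "\n".join(header)


# Abridged, made-up record used only for illustration (not a real accession).
EXAMPLE_RECORD = """LOCUS       XX000000    12 bp    DNA     linear   PLN 01-JAN-2018
DEFINITION  Hypothetical example sequence.
ACCESSION   XX000000
SOURCE      Example organism
  AUTHORS   Doe,J.
  TITLE     Direct Submission
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//
"""

if __name__ == "__main__":
    print(genbank_metadata(EXAMPLE_RECORD))
```

Cutting at `FEATURES` keeps authors, publication status and taxonomy lines together without having to parse each field individually.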

Read more...

Category: bioinformatics, one-liners | 0 comments »

New paper – Bioinformatic Workflows for Generating Complete Plastid Genome Sequences

Thursday, 21 June 2018, by Michael Grünstäudl

In science, standardization and repeatability are a must. Together with two other scientists, I just published a new paper on bioinformatic workflows for generating complete plastid genome sequences in the context of plastid phylogenomics of the water-lily clade. We demonstrate that standardization and repeatability are essential elements for modern plant phylogenomics and how such standardization […]

Read more...

Category: bioinformatics, scientific papers | 0 comments »

One-liner: Interleaved to deinterleaved FASTA

Monday, 7 May 2018, by Michael Grünstäudl

Quick, de-interleave it! There are one-liners that never get old. This is one of them. $ perl -MBio::SeqIO -e 'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta"); while (my $seq = $seqin->next_seq) { print ">", $seq->id, "\n", $seq->seq, "\n"; }' < interleaved.fasta > deinterleaved.fasta Update 28-Jan-2019: Over time, I […]
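
For readers without BioPerl at hand, the same de-interleaving step can be sketched in plain Python. This is a stand-in for the Perl one-liner, not the author's code: the function names are hypothetical, and unlike `$seq->id` (which prints only the first word of the header) this version keeps the full header line.

```python
def deinterleave_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(records):
    """Write records as two-line FASTA: one header line, one sequence line."""
    return "".join(">{}\n{}\n".format(h, s) for h, s in records)
```

A typical call would read `interleaved.fasta` line by line, pass the lines to `deinterleave_fasta`, and write the result of `format_fasta` to `deinterleaved.fasta`.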

Read more...

Category: bioinformatics, one-liners | 0 comments »

Few-liner: Batch download of DNA sequences from NCBI

Tuesday, 24 April 2018, by Michael Grünstäudl

The wonders of Entrez. Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were listed in the file input.txt. $ cat input.txt Liriope_muscari_USACult,JX080424 Dracaena_adamii_IVORYCOAST,JX080436 … Here is how I did it: $ INF=input.txt $ for line in $(cat $INF); do SEQNAME=$(echo "$line" […]
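
The shell loop above is cut off, so the following is only a sketch of the same task in Python rather than a reconstruction of it. It parses `name,accession` lines as shown in `input.txt` and builds request URLs for the NCBI E-utilities `efetch` endpoint. The function names are hypothetical, and `download_fasta` needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_accession_list(lines):
    """Turn 'name,accession' lines into (name, accession) tuples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "," not in line:
            continue  # skip blank or malformed lines
        # Split on the last comma so that the accession is always the final field.
        name, accession = line.rsplit(",", 1)
        pairs.append((name.strip(), accession.strip()))
    return pairs


def efetch_fasta_url(accession):
    """Build an efetch URL that requests one nucleotide record as FASTA text."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return EFETCH_URL + "?" + query


def download_fasta(accession, timeout=30):
    """Fetch one record from NCBI (requires network access)."""
    with urlopen(efetch_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

When looping over many accessions, it is advisable to pause between requests, since NCBI limits the request rate of the E-utilities.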

Read more...

Category: bioinformatics, one-liners | 0 comments »

Bioinformatic spring cleaning – Part II

Thursday, 5 April 2018, by Michael Grünstäudl

An improved few-liner to keep the data compressed. If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files greater than 1 GB, the following few-liner is for you: for file in $(LANG=C find . -size +1G -type f -print); do if [[ ! $file == *.gzip ]]; […]
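
Because the bash few-liner is truncated here, the sketch below shows the same idea in Python under stated assumptions: it walks a folder tree, compresses every regular file larger than a size threshold, and skips files that already end in `.gz`. The function name and the `min_size` parameter are hypothetical; the default threshold of 1 GiB mirrors `-size +1G`.

```python
import gzip
import os
import shutil

ONE_GIB = 1024 ** 3


def gzip_large_files(root, min_size=ONE_GIB):
    """Gzip every file under root that is larger than min_size bytes.

    Returns the list of compressed files that were written. The original
    file is removed only after its compressed copy has been written.
    """
    compressed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip files that are already compressed, and symbolic links.
            if name.endswith(".gz") or os.path.islink(path):
                continue
            if os.path.getsize(path) <= min_size:
                continue
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream, so large files fit in memory
            os.remove(path)
            compressed.append(path + ".gz")
    return compressed
```

Calling `gzip_large_files(".")` would process the current folder and all nested subfolders. Iterating with `os.walk` also avoids the word-splitting problems that a `for file in $(find ...)` loop has with file names containing spaces.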

Read more...

Category: bioinformatics, one-liners | 0 comments »

Bioinformatic spring cleaning – Part I

Friday, 30 March 2018, by Michael Grünstäudl

A short one-liner to keep the data compressed. One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders. for i in $(ls -d */); do tar czf ${i%%/}.tar.gz $i && […]
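
The one-liner is cut off after the `&&`, so the removal step is not visible here. As a hedged Python sketch of the described task (archive each subfolder as `.tar.gz`, then remove the original folder), one could write something like the following. The function name is hypothetical, and hidden folders are skipped to mirror the behaviour of `ls -d */`.

```python
import os
import shutil
import tarfile


def archive_subfolders(parent):
    """Tar-gzip each direct subfolder of parent, then delete the folder.

    A folder is removed only if its archive was written without an
    exception, which mirrors the '&&' in the shell version.
    """
    archives = []
    for entry in sorted(os.listdir(parent)):
        folder = os.path.join(parent, entry)
        # Only real, non-hidden directories are archived.
        if entry.startswith(".") or os.path.islink(folder) or not os.path.isdir(folder):
            continue
        archive = folder + ".tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            # arcname keeps paths inside the archive relative to the folder name.
            tar.add(folder, arcname=entry)
        shutil.rmtree(folder)
        archives.append(archive)
    return archives
```

Since the original folders are deleted, it is worth trying this on a copy of the data first.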

Read more...

Category: bioinformatics, one-liners | 0 comments »

Alignment Phy2Nex few-liner

Friday, 27 October 2017, by Michael Grünstäudl

Alignment file format conversion for the efficient – Part II. Today, I needed to convert a series of alignments, which were stored in the PHYLIP format, into the common NEXUS format. The output DNA alignment needed to be in sequential (i.e., non-interleaved) format. In February 2017, I had already written a few-liner to conduct […]
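
The author's few-liner is not shown in this teaser. Purely as an illustration of the conversion, here is a dependency-free sketch that reads a relaxed sequential PHYLIP alignment (one taxon per line, name and sequence separated by whitespace, which is an assumption of this sketch) and writes a sequential NEXUS data block. The function names are hypothetical.

```python
def parse_sequential_phylip(text):
    """Parse a relaxed sequential PHYLIP alignment into (name, sequence) pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The first line holds the number of taxa and the alignment length.
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    records = []
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        records.append((name, "".join(seq.split())))  # drop spacing inside the sequence
    if len(records) != ntax or any(len(s) != nchar for _, s in records):
        raise ValueError("alignment does not match the PHYLIP header")
    return records


def to_sequential_nexus(records, datatype="dna"):
    """Format (name, sequence) pairs as a non-interleaved NEXUS data block."""
    width = max(len(name) for name, _ in records)
    out = [
        "#NEXUS",
        "",
        "begin data;",
        "    dimensions ntax={} nchar={};".format(len(records), len(records[0][1])),
        "    format datatype={} missing=? gap=-;".format(datatype),
        "    matrix",
    ]
    for name, seq in records:
        out.append("    {} {}".format(name.ljust(width), seq))
    out.extend(["    ;", "end;"])
    return "\n".join(out) + "\n"
```

Strict PHYLIP files with fixed ten-character names or interleaved blocks would need a more careful parser, for example the one in Biopython's `AlignIO` module.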

Read more...

Category: bioinformatics, one-liners | 0 comments »

Standardizing tRNA anticodon abbreviations in genome files

Friday, 8 September 2017, by Michael Grünstäudl

Better U than T (or the other way around). Incongruence exists among the annotations of tRNAs in genomic files: Most authors use "U" (for uracil) in the three-letter anticodon abbreviations of tRNA annotations, but some use "T" (for thymine). While I consider the usage of both letters as justified, I believe that this usage should […]
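
One way such a standardization could be automated is a regular-expression substitution over the annotation text. The sketch below assumes gene names of the form `trnK-UUU` (a common convention, but an assumption here) and lets the caller choose the target letter, since the teaser does not reveal which direction the post recommends. The function name is hypothetical.

```python
import re

# Matches names such as trnK-UUU or trnfM-CAT: prefix, amino-acid code, anticodon.
ANTICODON = re.compile(r"\b(trn[A-Za-z]+-)([ACGTU]{3})\b")


def standardize_anticodons(text, target="U"):
    """Rewrite tRNA anticodon abbreviations to use only 'U' or only 'T'."""
    if target not in ("U", "T"):
        raise ValueError("target must be 'U' or 'T'")
    source = "T" if target == "U" else "U"

    def swap(match):
        # Only the three-letter anticodon is changed, never the gene prefix.
        return match.group(1) + match.group(2).replace(source, target)

    return ANTICODON.sub(swap, text)
```

For example, `standardize_anticodons('/gene="trnK-TTT"')` yields `/gene="trnK-UUU"`, while plain sequence text such as `ATTTGC` is left untouched because it does not follow the `trn...-` pattern.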

Read more...

Category: bioinformatics | 0 comments »

Ordering charsets within NEXUS files

Friday, 5 May 2017, by Michael Grünstäudl

Character sets for the orderly. Defining character sets (charset) in NEXUS files can be an efficient way to annotate specific regions of a DNA or protein alignment. However, many software packages able to write NEXUS files (e.g., Biopython) do not save charsets in an ordered fashion if multiple ones are present (i.e., that charset at […]
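
A minimal sketch of one possible fix, assuming that "ordered" means sorted by start position in the alignment and that every charset is a single `start-end` range on its own line (both are assumptions, since the teaser is truncated): the charset lines are sorted in place and all other lines keep their positions. The function name is hypothetical.

```python
import re

# Matches simple definitions such as: charset matK = 1-500;
CHARSET = re.compile(r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;\s*$",
                     re.IGNORECASE)


def sort_charsets(nexus_text):
    """Reorder simple charset lines by start (then end) position."""
    lines = nexus_text.splitlines()
    # Remember which line slots hold charset definitions.
    slots = [i for i, ln in enumerate(lines) if CHARSET.match(ln)]

    def position(line):
        match = CHARSET.match(line)
        return int(match.group(2)), int(match.group(3))

    ordered = sorted((lines[i] for i in slots), key=position)
    # Put the sorted definitions back into the same slots.
    for i, line in zip(slots, ordered):
        lines[i] = line
    return "\n".join(lines)
```

Charsets with several ranges or step notation (such as `1-300\3`) do not match the pattern and are therefore left where they are.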

Read more...

Category: bioinformatics | 0 comments »

Alignment Nex2Phy few-liner

Wednesday, 8 February 2017, by Michael Grünstäudl

Alignment file format conversion for the efficient. Today, I needed to convert a series of alignments, which were stored in the common NEXUS format, into PHYLIP format. In order to do this efficiently, I wrote the following few-liner. #!/usr/bin/env python2.7 import sys from Bio import AlignIO inFn = sys.argv[1] inp = open(inFn, 'rU') outp = […]

Read more...

Category: bioinformatics | 0 comments »
