Michael Grünstäudl (Gruenstaeudl), PhD

Life can be so easy if you speak grep

Thursday, 4 March 2021, by Michael Grünstäudl

Do you speak grep? In the phylogenetic analyses of a manuscript draft, I accidentally named a species with the specific epithet “violaceae” where it should have been “violacea”. I had consistently used the wrong epithet for years, and countless subdirectories of subdirectories of subdirectories now contain analysis files that include the incorrect name. How can this […]
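
The post is truncated before the author's solution, so the following is only a minimal sketch of one common approach, not necessarily the one described there: `grep` lists every file below a directory that contains the wrong epithet, and `sed` rewrites those files in place. It assumes GNU grep and GNU sed (`-Z`, `-I` and `sed -i` without a backup suffix behave differently on BSD/macOS); the function name `replace_epithet` and the directory `./analyses` are hypothetical.

```shell
# Hypothetical helper: recursively replace OLD with NEW in every text file
# below DIR. Assumes GNU grep (-Z, -I) and GNU sed (-i without suffix).
replace_epithet() {
  local old=$1 new=$2 dir=${3:-.}
  # -r: recurse into subdirectories, -l: print names of matching files only,
  # -I: skip binary files, -Z: NUL-terminate names for xargs -0
  grep -rlIZ -- "$old" "$dir" |
    xargs -0 -r sed -i "s/${old}/${new}/g"   # -r: do nothing if no files matched
}

# Usage (hypothetical directory):
#   replace_epithet violaceae violacea ./analyses
```

Because the match is a plain, case-sensitive substring, a family name such as Violaceae is left untouched, and names like `Viola_violaceae` are still caught. Running `grep -rl violaceae .` first is a cheap way to review which files will change.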

Category: bioinformatics, one-liners

Quick Illumina read statistics in Bash

Tuesday, 6 October 2020, by Michael Grünstäudl

Recently, a Master’s student of mine asked me to recalculate some summary statistics of an Illumina sequencing run. Among the desired statistics were (a) the total number of read bases (in bp), (b) the total number of reads, (c) the GC content (in %), and (d) the AT content (in %). Since the average file […]
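
A minimal awk sketch of the four statistics, assuming an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines). The function name `fastq_stats` is hypothetical, and here the GC and AT percentages are computed relative to all read bases, so they need not sum to 100 % when reads contain N; the post itself may define them differently.

```shell
# Hypothetical helper: summary statistics for 4-line-per-record FASTQ input
# (files given as arguments, or stdin). GC/AT percentages are relative to
# all read bases, so N and other ambiguity codes count in the denominator.
fastq_stats() {
  awk '
    NR % 4 == 2 {                      # line 2 of each record is the sequence
      reads++
      bases += length($0)
      s = $0
      gc += gsub(/[GgCc]/, "", s)      # gsub returns the number of matches
      at += gsub(/[AaTt]/, "", s)
    }
    END {
      printf "total_bases\t%.0f\n", bases   # %.0f avoids 32-bit %d overflow
      printf "total_reads\t%.0f\n", reads
      if (bases > 0) {
        printf "GC_percent\t%.2f\n", 100 * gc / bases
        printf "AT_percent\t%.2f\n", 100 * at / bases
      }
    }
  ' "$@"
}
```

Gzipped files can be streamed in, e.g. `zcat reads_R1.fastq.gz | fastq_stats` (file name hypothetical).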

Category: bioinformatics, one-liners

Automatically renaming contigs of assembly results

Wednesday, 26 August 2020, by Michael Grünstäudl

The genome assembly process often generates FASTA-formatted contig files in which the contigs have cryptic sequence names. With a few Bash commands, one can automatically rename these contigs based on the name of the file that contains them. If your contig file contains only a single contig:
for i in *__contig.fasta; do VAR=${i%__contig.fasta*}; sed […]
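
The sed command in the excerpt is truncated, so the following is a separate, hypothetical sketch rather than the author's code. It uses a two-pass awk script: the first pass counts the contigs in a file, the second rewrites every header to the file-name prefix (`SampleA`) for single-contig files, or to numbered names (`SampleB_1`, `SampleB_2`, …) otherwise. The function name and the numbering scheme are assumptions; files are modified in place.

```shell
# Hypothetical helper: rename contig headers after the file name prefix,
# e.g. SampleA__contig.fasta -> ">SampleA" (one contig) or
# ">SampleA_1", ">SampleA_2", ... (several contigs). Edits files in place.
rename_contigs() {
  local f prefix tmp
  for f in "$@"; do
    prefix=${f%__contig.fasta}         # strip the file suffix ...
    prefix=${prefix##*/}               # ... and any leading directory
    tmp=$(mktemp) || return 1
    # Pass 1 (NR == FNR) counts headers; pass 2 rewrites them.
    awk -v p="$prefix" '
      NR == FNR { if (/^>/) n++; next }
      /^>/      { c++; print ">" (n == 1 ? p : p "_" c); next }
                { print }
    ' "$f" "$f" > "$tmp" && cat "$tmp" > "$f"   # cat keeps original permissions
    rm -f "$tmp"
  done
}

# Usage: rename_contigs *__contig.fasta
```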

Category: bioinformatics, one-liners

Setting burn-in and combining posterior tree distributions using awk and sed

Thursday, 7 February 2019, by Michael Grünstäudl

Efficiency on the UNIX shell. I often find myself manually removing a set of phylogenetic trees from a posterior tree distribution in order to set a burn-in, and then combining the post-burn-in trees of the individual runs. This can be done very efficiently using awk on a UNIX shell:
inf1=Mrbayes_test.run1.t
inf2=Mrbayes_test.run2.t
tmpf1=${inf1%.t*}_postBurnin.tre
tmpf2=${inf2%.t*}_postBurnin.tre
outf=${inf1%.run1.t*}_combined_postBurnin.tre […]
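
As the rest of the post is cut off, here is a minimal sketch of the same idea, assuming MrBayes `.t` files in which each sampled tree sits on its own line starting with `tree`, and a burn-in given as a fraction of the sampled trees. The function names `remove_burnin` and `combine_runs` are hypothetical; the combined file reuses the header (including the translate block) of the first run, which is only valid if all runs share the same translate table, as they should within a single MrBayes analysis.

```shell
# Hypothetical helper: print a MrBayes .t file with the first FRAC
# (e.g. 0.25) of its tree lines removed; all other lines are kept.
remove_burnin() {
  local infile=$1 frac=$2
  awk -v frac="$frac" '
    NR == FNR    { if ($1 == "tree") ntree++; next }   # pass 1: count trees
    FNR == 1     { burn = int(ntree * frac) }          # pass 2 starts here
    $1 == "tree" { if (++seen <= burn) next }          # drop burn-in trees
                 { print }
  ' "$infile" "$infile"
}

# Hypothetical helper: combine post-burn-in trees of several runs into one
# NEXUS tree file. Usage: combine_runs FRAC run1.t run2.t ... > out.tre
combine_runs() {
  local frac=$1 f
  shift
  # Header (everything before the first tree line) from the first run only
  awk '$1 == "tree" { exit } { print }' "$1"
  for f in "$@"; do
    remove_burnin "$f" "$frac" | awk '$1 == "tree"'   # keep tree lines only
  done
  echo "end;"
}
```

With the file names from the excerpt: `combine_runs 0.25 Mrbayes_test.run1.t Mrbayes_test.run2.t > Mrbayes_test_combined_postBurnin.tre`.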

Category: bioinformatics, one-liners

Quick info parsing from GenBank accessions

Thursday, 1 November 2018, by Michael Grünstäudl

Taking the essence. Have you ever found yourself browsing through individual sequence records of the NCBI GenBank database and wishing that you could extract only the metadata of a record (e.g., authors, publication status, taxonomy), but not its feature table or the sequence itself? With the help of Entrez Direct and […]
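
The remainder of the post is truncated; as a hedged sketch, one way to get only the metadata is to fetch the flat file with Entrez Direct's `efetch` and cut each record at its `FEATURES` line. The awk filter `genbank_header` is hypothetical; the `efetch` call in the usage comment requires network access and an installed Entrez Direct.

```shell
# Hypothetical helper: keep only the header (metadata) block of GenBank
# flat-file records, dropping everything from the FEATURES line up to the
# "//" record terminator. Works on files or stdin, one or many records.
genbank_header() {
  awk '
    /^FEATURES/ { skip = 1 }                # feature table and sequence follow
    /^\/\//     { skip = 0; print; next }   # end of record: resume printing
    !skip
  ' "$@"
}

# Usage with Entrez Direct (requires network access):
#   efetch -db nucleotide -id JX080424 -format gb | genbank_header
```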

Category: bioinformatics, one-liners

One-liner: Interleaved to deinterleaved FASTA

Monday, 7 May 2018, by Michael Grünstäudl

Quick, de-interleave it! There are one-liners that never get old. This is one of them:
$ perl -MBio::SeqIO -e 'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta"); while (my $seq = $seqin->next_seq) { print ">", $seq->id, "\n", $seq->seq, "\n"; }' < interleaved.fasta > deinterleaved.fasta
Update 28-Jan-2019: Over time, I […]
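
If BioPerl is not installed, a plain awk sketch does the same job. This is an alternative to, not a reproduction of, the author's updated command (which is truncated); note that it keeps the full header line, whereas the Perl one-liner prints only `$seq->id` and so drops any description. The function name `deinterleave_fasta` is hypothetical.

```shell
# Hypothetical helper: join wrapped (multi-line) FASTA sequences so that
# each sequence occupies a single line. Reads files or stdin.
deinterleave_fasta() {
  awk '
         { sub(/\r$/, "") }           # tolerate Windows line endings
    /^>/ { if (seq != "") print seq   # flush the previous sequence
           print; seq = ""; next }    # print the header unchanged
         { seq = seq $0 }             # concatenate wrapped sequence lines
    END  { if (seq != "") print seq }
  ' "$@"
}

# Usage: deinterleave_fasta interleaved.fasta > deinterleaved.fasta
```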

Category: bioinformatics, one-liners

Few-liner: Batch download of DNA sequences from NCBI

Tuesday, 24 April 2018, by Michael Grünstäudl

The wonders of Entrez. Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were stored in the file input.txt:
$ cat input.txt
Liriope_muscari_USACult,JX080424
Dracaena_adamii_IVORYCOAST,JX080436
…
Here is how I did it:
$ INF=input.txt
$ for line in $(cat $INF); do SEQNAME=$(echo "$line" […]
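
Since the loop above is truncated, the following is only a sketch of how such a batch download might look with Entrez Direct, assuming `efetch` is on the `PATH` and that `input.txt` has the `name,accession` layout shown. The function name `batch_download`, the output directory, and the choice to name each file and FASTA header after the first column are assumptions.

```shell
# Hypothetical helper: download every accession listed in a
# "name,accession" file as FASTA via Entrez Direct's efetch, writing
# OUTDIR/name.fasta with the header replaced by the given name.
batch_download() {
  local infile=$1 outdir=${2:-.}
  local name acc
  mkdir -p "$outdir" || return 1
  # The "|| [ -n ... ]" test also processes a final line lacking a newline
  while IFS=, read -r name acc || [ -n "$name" ]; do
    acc=${acc%$'\r'}                   # tolerate Windows line endings
    [ -n "$acc" ] || continue          # skip blank or malformed lines
    # </dev/null keeps efetch from reading the loop's input file on stdin
    efetch -db nucleotide -id "$acc" -format fasta </dev/null |
      sed "1s/^>.*/>${name}/" > "$outdir/${name}.fasta"
    sleep 1                            # stay well below NCBI request limits
  done < "$infile"
}

# Usage: batch_download input.txt sequences
```

The `</dev/null` redirect is a precaution: Entrez Direct tools can read identifiers from standard input, and inside a `while read` loop that input is the list file itself.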

Category: bioinformatics, one-liners

Bioinformatic spring cleaning – Part II

Thursday, 5 April 2018, by Michael Grünstäudl

An improved few-liner to keep the data compressed. If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files larger than 1 GB, the following few-liner is for you:
for file in $(LANG=C find . -size +1G -type f -print); do if [[ ! $file == *.gzip ]]; […]
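
Looping over the output of `$(find …)` splits file names at whitespace. A hedged alternative sketch lets `find` select and compress the files directly; it assumes gzip's default `.gz` suffix when skipping already compressed files. The function name `compress_large` and its optional size argument are hypothetical; `-size +1G` relies on the `G` suffix supported by GNU and BSD find.

```shell
# Hypothetical helper: gzip every regular file below DIR larger than SIZE
# (default +1G, i.e. more than 1 GiB). gzip replaces each file by file.gz.
compress_large() {
  local dir=${1:-.} size=${2:-+1G}
  # ! -name '*.gz' skips files that gzip has already produced;
  # -exec ... {} + passes many names per gzip call, whitespace-safe
  find "$dir" -type f -size "$size" ! -name '*.gz' -exec gzip -- {} +
}

# Usage: compress_large .          # or e.g. compress_large data +500M
```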

Category: bioinformatics, one-liners

Bioinformatic spring cleaning – Part I

Friday, 30 March 2018, by Michael Grünstäudl

A short one-liner to keep the data compressed. One of the Bash one-liners that I use after every successful project, yet never remember when needed, handles the simple task of looping through your folders, tar-gzipping them, and then removing the original folders:
for i in $(ls -d */); do tar czf ${i%%/}.tar.gz $i && […]
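
`$(ls -d */)` breaks on folder names containing spaces. A minimal sketch of a more defensive variant iterates over the glob directly, quotes every expansion, and deletes a folder only if `tar` succeeded. The function name `archive_folders` is hypothetical; it skips symbolic links to directories and, like the original, irreversibly removes the source folders, so a trial run on a copy is advisable.

```shell
# Hypothetical helper: turn every subfolder of PARENT into folder.tar.gz
# and delete the folder only after tar has finished successfully.
archive_folders() {
  local parent=${1:-.}
  (
    cd "$parent" || exit 1
    for d in */; do
      [ -d "$d" ] || continue          # glob matched nothing
      [ -L "${d%/}" ] && continue      # skip symlinks to directories
      d=${d%/}                         # strip the trailing slash
      tar czf "$d.tar.gz" "$d" && rm -rf -- "$d"
    done
  )
}

# Usage: archive_folders /path/to/finished_project
```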

Category: bioinformatics, one-liners

Alignment Phy2Nex few-liner

Friday, 27 October 2017, by Michael Grünstäudl

Alignment file format conversion for the efficient – Part II. Today, I needed to convert a series of alignments stored in PHYLIP format into the common NEXUS format. The output DNA alignment needed to be in sequential (i.e., non-interleaved) format. In February 2017, I had already written a few-liner to conduct […]
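
The few-liner itself is cut off here, so the following is a separate minimal awk sketch, assuming a sequential relaxed PHYLIP file (header line `ntax nchar`, then one taxon per line with a whitespace-free name followed by its sequence). Strict PHYLIP with fixed 10-character names and interleaved input are not handled, and taxon names are written unquoted; the function name `phy2nex` is hypothetical.

```shell
# Hypothetical helper: convert a sequential relaxed PHYLIP DNA alignment
# into a sequential (non-interleaved) NEXUS file, written to stdout.
phy2nex() {
  awk '
    NR == 1 {                          # header line: ntax nchar
      print "#NEXUS"
      print ""
      print "BEGIN DATA;"
      printf "  DIMENSIONS NTAX=%d NCHAR=%d;\n", $1, $2
      print "  FORMAT DATATYPE=DNA MISSING=? GAP=-;"
      print "  MATRIX"
      next
    }
    NF == 0 { next }                   # skip blank lines
    {
      seq = ""
      for (i = 2; i <= NF; i++) seq = seq $i   # join space-separated blocks
      printf "    %s %s\n", $1, seq
    }
    END {
      print "  ;"
      print "END;"
    }
  ' "$1"
}

# Usage: for f in *.phy; do phy2nex "$f" > "${f%.phy}.nex"; done
```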

Category: bioinformatics, one-liners

Freie Universität Berlin