Alumnus of Freie Universität Berlin – Michael Grünstäudl, PhD

Visualizing plant collection sites on a map

Monday, 11 April 2022 by Michael Grünstäudl

Image metadata can be both scary and useful

Assume that you are a field botanist on an expedition and that you take photographs of every collection site that you visit. The metadata that is routinely stored as part of your photograph can be used to visualize your collection sites on a map. For example, the […]

Read more...

Category: audiovisual, bioinformatics | 0 comments »
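
GPS positions in image metadata are commonly stored as degrees, minutes and seconds plus a hemisphere letter, whereas most mapping tools expect signed decimal degrees. The following is a minimal sketch of that conversion step only, assuming the coordinates have already been read out of the photographs with an EXIF tool of your choice. The function name `dms_to_decimal` and the example values are illustrative and not taken from the post.

```shell
# Convert degrees, minutes, seconds and a hemisphere letter (N/S/E/W)
# into signed decimal degrees. Southern and western positions are negative.
# Hypothetical helper for illustration; not part of the original post.
dms_to_decimal() {
    LC_ALL=C awk -v d="$1" -v m="$2" -v s="$3" -v h="$4" 'BEGIN {
        dec = d + m / 60 + s / 3600
        if (h == "S" || h == "W") dec = -dec
        printf "%.6f\n", dec
    }'
}

# Example: dms_to_decimal 52 27 30 N   prints 52.458333
```

Decimal degrees written to a simple table of site name, latitude and longitude can then be loaded by most mapping or GIS software.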

Life can be so easy if you speak grep

Thursday, 4 March 2021 by Michael Grünstäudl

Habla usted grep?

In the phylogenetic analyses of a manuscript draft, I accidentally named a species with the specific epithet “violaceae” where it should have been “violacea”. I had consistently used the wrong epithet for years, and countless subdirectories of subdirectories of subdirectories now contain analysis files that include the incorrect name. How can this […]

Read more...

Category: bioinformatics, one-liners | 0 comments »
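
One way to approach the problem described in this teaser is to let grep list every file that contains the misspelled epithet and hand that list to sed for an in-place replacement. This is a sketch under assumptions, not necessarily the command used in the post: it relies on GNU grep, GNU xargs and GNU sed (for `-Z`, `-0 -r` and `-i`), and the function name `fix_epithet` is made up for illustration.

```shell
# Replace "violaceae" with "violacea" in all files below a directory.
# Assumes GNU grep, xargs and sed; hypothetical helper for illustration.
# The match is case-sensitive, so the family name "Violaceae" is left alone.
fix_epithet() {
    dir=$1
    # -r: recurse, -l: print file names only, -Z: separate names with NUL
    grep -rlZ -- 'violaceae' "$dir" |
        xargs -0 -r sed -i 's/violaceae/violacea/g'
}
```

Running `grep -rl 'violaceae' .` on its own first shows which files would be touched, which is a sensible check before editing years of analysis files in place.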

Buena vista con mVISTA

Monday, 12 October 2020 by Michael Grünstäudl

For an upcoming publication, a doctoral student and I want to visualize the sequence variability among several plastid genomes via the tool mVISTA. This tool is often employed in plastid phylogenomic studies and is generally simple to use. However, if a user wishes to input custom annotations, it can be quite tricky to generate the correct […]

Read more...

Category: bioinformatics | 7 comments »

Quick Illumina read statistics in Bash

Tuesday, 6 October 2020 by Michael Grünstäudl

Recently, a Master’s student of mine asked me to re-calculate some data statistics of an Illumina sequencing run. Among the desired statistics were (a) the total number of read bases (in bp), (b) the total number of reads, (c) the GC content (in %), and (d) the AT content (in %). Since the average file […]

Read more...

Category: bioinformatics, one-liners | 0 comments »
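
The four statistics listed in this teaser can be computed in a single pass over a FASTQ file with awk. The sketch below is one possible way to do it and not necessarily the post's solution. It assumes uncompressed FASTQ with exactly four lines per read (no wrapped sequences); the function name `fastq_stats` and the output labels are illustrative.

```shell
# Print total bases, total reads, GC content and AT content of FASTQ input.
# Assumes four lines per record; hypothetical helper for illustration.
fastq_stats() {
    LC_ALL=C awk '
        NR % 4 == 2 {                        # the sequence line of each record
            reads++
            bases += length($0)
            seq = toupper($0)
            gc += gsub(/[GC]/, "", seq)      # gsub returns the number of matches
            at += gsub(/[AT]/, "", seq)      # N and other symbols count in neither
        }
        END {
            printf "total_bases\t%.0f\n", bases
            printf "total_reads\t%.0f\n", reads
            printf "gc_percent\t%.2f\n", (bases ? 100 * gc / bases : 0)
            printf "at_percent\t%.2f\n", (bases ? 100 * at / bases : 0)
        }
    ' "$@"
}
```

For compressed files, the function can read from a pipe, for example `gzip -dc reads.fastq.gz | fastq_stats`, where `reads.fastq.gz` is a placeholder file name.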

Automatically renaming contigs of assembly results

Wednesday, 26 August 2020 by Michael Grünstäudl

The genome assembly process often generates FASTA-formatted contig files, in which the contigs have cryptic sequence names. By using specific Bash commands, one can automatically rename these contigs based on the name of the file they are contained in. If your contig file contains only a single contig:

for i in *__contig.fasta; do VAR=${i%__contig.fasta*}; sed […]

Read more...

Category: bioinformatics, one-liners | 0 comments »
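
The sed command in the teaser is cut off, so the following is only a sketch of how the single-contig case could be handled, not a reconstruction of the post's command. It assumes each `*__contig.fasta` file holds exactly one contig whose header is the first line, and that the file stem contains no characters that are special to sed (such as `&` or `\`). The function name and the `.renamed.fasta` suffix are made up for illustration.

```shell
# For every *__contig.fasta file in a directory, write a copy in which the
# header of the single contig is replaced by the file stem.
# Hypothetical helper; writes new files instead of editing in place.
rename_single_contig() {
    for i in "$1"/*__contig.fasta; do
        [ -e "$i" ] || continue              # no matching files: nothing to do
        base=${i##*/}                        # strip the directory part
        stem=${base%__contig.fasta}          # e.g. SampleA__contig.fasta -> SampleA
        sed "1s/^>.*/>${stem}/" "$i" > "${i%.fasta}.renamed.fasta"
    done
}
```

Writing to a new file keeps the original assembly output untouched, so the cryptic names remain available if they are needed later.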

Massive plastid genome sequencing coincides with incomplete annotations of the inverted repeats

Thursday, 21 November 2019 by Michael Grünstäudl

High numbers do not equate with high quality

Over the past 10 years, the number of complete plastid genome sequences available on NCBI GenBank has skyrocketed, especially for flowering plants. Whereas in December 2009, approximately 120 complete plastid genomes of flowering plants were present on GenBank, there are 5,838 complete plastid genomes of flowering plants available […]

Read more...

Category: bioinformatics | 0 comments »

Improved sorting of numbered DNA sequences

Thursday, 12 September 2019 by Michael Grünstäudl

Keeping things orderly

Like many other molecular phylogeneticists, I often work with massive numbers of FASTA-formatted DNA sequences. Occasionally, the names of these sequences are numbered in a simplistic fashion (i.e., 1,2,…,48,49), which has the unfortunate side-effect of messing up the intended order of the sequences when sorted numerically, as the sequences 10, 11, …, […]

Read more...

Category: bioinformatics | 0 comments »
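
A common remedy for the ordering problem described here is to pad the numbers with leading zeros so that every number has the same width. The sketch below illustrates that idea and may differ from the approach taken in the post. It assumes that each FASTA header ends in the number (for example `>seq_7`); the function name `pad_fasta_numbers` and the header layout are hypothetical.

```shell
# Zero-pad the number at the end of each FASTA header to a fixed width.
# Usage: pad_fasta_numbers FILE [WIDTH]   (WIDTH defaults to 2)
# Hypothetical helper; assumes headers end in the sequence number.
pad_fasta_numbers() {
    width=${2:-2}
    LC_ALL=C awk -v w="$width" '
        BEGIN { fmt = "%s%0" w "d\n" }       # e.g. "%s%02d\n" for width 2
        /^>/ && match($0, /[0-9]+$/) {
            prefix = substr($0, 1, RSTART - 1)
            num = substr($0, RSTART)
            printf fmt, prefix, num
            next
        }
        { print }                            # sequence lines pass through unchanged
    ' "$1"
}
```

With a width of 2, `>seq_1` becomes `>seq_01` while `>seq_10` stays as it is, so the two names sort in the intended order regardless of the sorting method.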

On the issue of file formats during DNA sequence submissions to GenBank

Tuesday, 20 August 2019 by Michael Grünstäudl

Ramblings on an important topic

A series of software tools exists that allow users to submit DNA sequences to NCBI GenBank, but file conversion represents a recurring challenge for those submissions. As with DNA sequence submissions to ENA, GenBank provides a wide range of options to upload annotated DNA sequences in a custom […]

Read more...

Category: bioinformatics, writing | 0 comments »

Extracting mapped R1 and R2 reads from SAM file

Wednesday, 5 June 2019 by Michael Grünstäudl

Extracting those that mapped

Recently, I needed to extract only those reads from a Sequence Alignment/Map (SAM) file that actually mapped to the reference genome, while maintaining the separation into the paired-end read design. By using a combination of samtools, bedtools and awk, this can be done very efficiently:

INFL=MySamples.sam STEM=${INFL%.sam*} […]

Read more...

Category: bioinformatics | 0 comments »
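
The post's pipeline combines samtools, bedtools and awk, but its commands are cut off in this teaser. As a plainly different, simplified illustration of the underlying logic, the sketch below uses awk alone on an uncompressed SAM file and only writes the names of mapped reads, separated into R1 and R2. It relies on the FLAG bits defined in the SAM specification (0x4 unmapped, 0x40 first in pair, 0x80 second in pair, 0x100 secondary, 0x800 supplementary). The function name and output file names are made up for illustration.

```shell
# Write the names of mapped reads to two lists, one for R1 and one for R2.
# awk-only sketch; not the samtools/bedtools pipeline of the original post.
split_mapped_read_names() {
    sam=$1
    stem=${sam%.sam}
    r1="${stem}.mapped.R1.names"
    r2="${stem}.mapped.R2.names"
    : > "$r1"; : > "$r2"                           # create or empty both lists
    LC_ALL=C awk -F'\t' -v r1="$r1" -v r2="$r2" '
        /^@/ { next }                              # skip header lines
        {
            flag = $2
            if (int(flag / 4) % 2 == 1) next       # 0x4: read is unmapped
            if (int(flag / 256) % 2 == 1) next     # 0x100: secondary alignment
            if (int(flag / 2048) % 2 == 1) next    # 0x800: supplementary alignment
            if (int(flag / 64) % 2 == 1) print $1 > r1        # 0x40: first in pair
            else if (int(flag / 128) % 2 == 1) print $1 > r2  # 0x80: second in pair
        }
    ' "$sam"
}
```

The resulting name lists can then be used to pull the corresponding records from the original R1 and R2 FASTQ files, which avoids having to undo the reverse-complementing that SAM applies to reads mapped to the reverse strand.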

Setting burn-in and combining posterior tree distributions using awk and sed

Thursday, 7 February 2019 by Michael Grünstäudl

Efficiency on the UNIX shell

I often find myself manually removing a set of phylogenetic trees from a posterior tree distribution in order to set a burn-in and then combining the post-burnin trees of the individual runs. This action can be done very efficiently using awk on a UNIX shell:

inf1=Mrbayes_test.run1.t inf2=Mrbayes_test.run2.t tmpf1=${inf1%.t*}_postBurnin.tre tmpf2=${inf2%.t*}_postBurnin.tre outf=${inf1%.run1.t*}_combined_postBurnin.tre […]

Read more...

Category: bioinformatics, one-liners | 0 comments »
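
Since the awk commands themselves are cut off in this teaser, the following is a hedged sketch of how such a burn-in step could look, not the post's actual code. It assumes two NEXUS tree files with a single trees block each, in which every sampled tree sits on its own line starting with `tree` and the block is closed by `end;`. The function name `combine_post_burnin` and its argument order are illustrative.

```shell
# Drop the first BURNIN trees of each run and combine the rest in one file.
# Usage: combine_post_burnin BURNIN RUN1.t RUN2.t OUTFILE
# Hypothetical helper; assumes one tree per line and a single trees block.
combine_post_burnin() {
    burnin=$1; inf1=$2; inf2=$3; outf=$4
    {
        # Run 1: keep header and translate block, skip the burn-in trees.
        LC_ALL=C awk -v b="$burnin" '
            tolower($0) ~ /^[ \t]*end;/      { next }
            tolower($0) ~ /^[ \t]*tree[ \t]/ { if (++n > b) print; next }
            { print }
        ' "$inf1"
        # Run 2: keep only the trees after the burn-in.
        LC_ALL=C awk -v b="$burnin" '
            tolower($0) ~ /^[ \t]*tree[ \t]/ { if (++n > b) print }
        ' "$inf2"
        echo "end;"                          # close the combined trees block
    } > "$outf"
}
```

Note that tree names from the two runs are kept as they are, so the combined file can contain the same tree name twice; whether that matters depends on the downstream program.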
