Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

Texas mountain laurel in the hill country

Mountains in Texas?

Dermatophyllum secundiflorum (Fabaceae), known to most Texans as the ‘Texas mountain laurel’, is a small, evergreen tree that is planted in various regions of Texas and northern Mexico. Its fragrant purple flowers, which are aggregated into long racemes, and its comparatively high drought tolerance make it a popular ornamental. Springtime visitors to central Texas (e.g., Austin or San Antonio) have almost certainly seen this plant in and around city parks and near public buildings. While the reddish seeds are highly poisonous, the flowers have a peculiar smell, which (to me) is reminiscent of grape soda.

Michael Gruenstaeudl inspecting a raceme of Dermatophyllum secundiflorum


Close-up of a raceme of Dermatophyllum secundiflorum

Despite having lived in Texas for many years, I have never found out the origin of this plant’s common name. In fact, ‘Texas mountain laurel’ is a rather unusual name for a plant of central Texas, an area that may be hilly (and is thus referred to as the ‘Hill Country’) but is not exactly blessed with many mountains. Nonetheless, Dermatophyllum secundiflorum is a beautiful and, in springtime, fragrant plant.

Academic freedom is a highly valuable and precious commodity

“A good sign as to whether there is free speech is, is someone you don’t like allowed to say something you don’t like? If that is the case, we have free speech.” Elon Musk (2022)

Given recent events, I strongly support initiatives that speak out for freedom of speech in general and academic freedom in particular, such as the Netzwerk Wissenschaftsfreiheit. The phenomenon of ‘Cancel Culture’ has become a serious problem in academia. It is intimidating faculty, students, and administrative staff alike and unbecoming of a liberal society. My own encounter with ‘Cancel Culture’ is documented here: https://www.netzwerk-wissenschaftsfreiheit.de/presse/


Visualizing plant collection sites on a map

Image metadata can be both scary and useful

Assume that you are a field botanist on an expedition and that you take photographs of every collection site that you visit. The metadata that is routinely stored as part of your photograph can be used to visualize your collection sites on a map. For example, the following photograph that I took while collecting plants on a bike trip has the following GPS metadata:

Field trip by bike near Neuruppin

GPS Latitude Ref           : North
GPS Longitude Ref          : East
GPS Altitude               : 123 m Above Sea Level
GPS Date/Time              : 2019:06:21 14:57:30Z
GPS Position               : 53 deg 8' 56.52" N, 13 deg 14' 18.04" E
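
The GPS Position field above is given in degrees, minutes, and seconds, whereas plotting coordinates on a map generally requires decimal degrees. The following is a minimal sketch of that conversion in Bash and awk. The function name dms_to_decimal is hypothetical (it is not part of any metadata tool), and the sketch assumes that the four values are passed as separate arguments.

```shell
#!/usr/bin/env bash
# Convert degrees, minutes, seconds and a hemisphere letter to decimal degrees.
# dms_to_decimal is a hypothetical helper written for this illustration.
dms_to_decimal() {
    awk -v d="$1" -v m="$2" -v s="$3" -v ref="$4" 'BEGIN {
        dec = d + m / 60 + s / 3600
        if (ref == "S" || ref == "W") dec = -dec   # south and west are negative
        printf "%.6f\n", dec
    }'
}

dms_to_decimal 53 8 56.52 N    # latitude of the photograph above:  53.149033
dms_to_decimal 13 14 18.04 E   # longitude of the photograph above: 13.238344
```

The arithmetic is simply degrees + minutes/60 + seconds/3600, with a negative sign for the southern and western hemispheres.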

A visualization via picture metadata is quite easy to achieve with R.

First, you use the R package exiftoolr to parse and extract the GPS metadata that may be stored as part of your field trip photographs. Notice that I use the function exif_read() to parse the metadata from all images in the image folder and that I remove all cases where no metadata was extracted from the images via function na.omit().

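library(exiftoolr)  # provides exif_read()
# Read the file name and GPS tags from all images in the folder
inData = exif_read(path="/path_to_image_folder/",
   tags=c("FileName", "GPSDateTime","GPSLatitude", "GPSLongitude"))
inData = as.data.frame(inData)
inData$SourceFile <- NULL
# Keep only the images that carry both coordinates
locs = na.omit(inData[,c("GPSLatitude","GPSLongitude")])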

Second, you use the R package ggmap to download a suitable map and to plot the latitude and longitude information that you extracted from the image metadata onto that map.

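library(ggmap)  # provides ggmap(); recent releases may offer get_stadiamap() in place of get_stamenmap()
# Bounding box: range of the collection sites plus a margin on each side
lowerleftlon = min(locs[,2])-6*sd(locs[,2])
lowerleftlat = min(locs[,1])-2*sd(locs[,1])
upperrightlon = max(locs[,2])+6*sd(locs[,2])
upperrightlat = max(locs[,1])+2*sd(locs[,1])
bw_map = get_stamenmap(
    bbox = c(left = lowerleftlon,
             bottom = lowerleftlat,
             right = upperrightlon,
             top = upperrightlat),
    zoom = 8,
    maptype = "terrain",
    color = "bw")
# Plot each collection site as a point on the downloaded map
ggmap(bw_map) +
  geom_point(data = locs,
             aes(x = GPSLongitude, y = GPSLatitude))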

The above procedure produced the following map for a two-day field trip conducted in 2019:

Map of collection sites

The petal color of Texas bluebonnets

Not always blue.

Lupinus texensis (Fabaceae), commonly known as the Texas bluebonnet, is the official state flower (or rather, state plant) of Texas. Given its vernacular name, many people would expect the flower color (or more precisely, the petal color) of Lupinus texensis to be blue. More often than not, the petal color of that beautiful legume is indeed blue.

Texas bluebonnet (Lupinus texensis) – blue variety in foreground

However, populations with mutations in petal color can occasionally be found, such as the one I observed in the city of Temple, TX, in March 2022. Here, the petal color was either bluish or pinkish-white. Very likely, the pinkish-white variety of Lupinus texensis resulted from selective breeding and was either deliberately planted there or escaped into the wild from a nearby garden.

Texas bluebonnet (Lupinus texensis) – pink variety in foreground


New Paper – Plastid phylogenomic study on South American sunflowers

DNA sequence alignment matters.

Together with a doctoral student and two other co-authors, I just published a new paper on a lineage of South American sunflowers. We demonstrate, among other aspects, that a correct DNA sequence alignment remains critical in plastid phylogenomic studies, even if the amount of sequence data far exceeds that of classic molecular phylogenetic studies.

New Paper – A Python package to evaluate archived plastid genomes

Automated database assessment.

Together with an undergraduate co-author, I just published a new paper on a Python package that automates the survey of thousands of plastid genomes archived on NCBI Nucleotide. Using our new software, we demonstrate – among other aspects – that almost half of all plastid genomes archived on NCBI Nucleotide lack sequence annotations for their inverted repeat regions.

Talk and workshop at Botany 2021

Summer time is conference time.

As part of the joint conference (“Botany 2021 – Virtual!”) of several North American botanical and mycological societies, I am giving two scientific presentations: a regular talk in the topic section on comparative genomics and a workshop presentation.

The workshop has been recorded and can be viewed online here (login to the conference website necessary).

Michael Gruenstaeudl at Botany 2021 – Virtual!

Life can be so easy if you speak grep

Habla usted grep?

In the phylogenetic analyses of a manuscript draft, I accidentally named a species with the specific epithet “violaceae” where it should have been “violacea”. I had consistently used the wrong epithet for years, and countless subdirectories of subdirectories of subdirectories now contain analysis files that include the incorrect name.

How can this snafu be rectified in a reasonable time?

Recursive grep to the rescue …

grep -rl violaceae . | xargs sed -i 's/violaceae/violacea/g'

… and the world is fine again.
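
One caveat of the one-liner is that plain xargs splits file names at whitespace, so paths that contain spaces would be passed to sed in pieces. The following is a hedged sketch of a more defensive variant. It assumes GNU grep, xargs, and sed; the function name fix_epithet as well as the file and genus names in the demonstration are made up for illustration.

```shell
#!/usr/bin/env bash
# fix_epithet is a hypothetical helper; it assumes GNU grep, xargs, and sed.
fix_epithet() {
    # -Z and -0 pass NUL-terminated file names, so paths with spaces survive;
    # -r (GNU xargs) skips sed entirely when grep finds no matching file.
    grep -rlZ 'violaceae' "$1" | xargs -0 -r sed -i 's/violaceae/violacea/g'
}

# Demonstration in a throw-away directory (file and genus names are made up)
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/run 1/sub"
echo "Genus_violaceae_voucher1" > "$demo_dir/run 1/sub/tree file.nex"

grep -rl 'violaceae' "$demo_dir"          # dry run: list the files that would change
fix_epithet "$demo_dir"                   # apply the replacement in place
cat "$demo_dir/run 1/sub/tree file.nex"   # prints: Genus_violacea_voucher1
```

Running the listing step first shows which files will be touched before anything is rewritten.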

Buena vista con mVISTA

For an upcoming publication, a doctoral student and I want to visualize the sequence variability among several plastid genomes via the tool mVISTA. This tool is often employed in plastid phylogenomic studies and is generally simple to use. However, if a user wishes to input custom annotations, generating the correct input for the software can be quite tricky. Hence, I automated the generation of the input files as described below.

An effective visualization of plastid genomes in mVISTA requires the x genomes that the user wants to visualize (“input genomes” hereafter; e.g., x=2) as well as a reference genome. For simplicity, a user could employ the first input genome (x1) as the reference genome. The input genomes should be in FASTA format, the reference genome in GenBank format.

To generate custom annotations based on the reference genome, I employ the following Bash code, which (a) converts the reference genome from GenBank format to a cleaned GFF3 format (and incidentally also saves the genome in FASTA format), and (b) converts the cleaned GFF3 file to the mVISTA input.

INF=NC_000932.gb # Just an example
## Converting input file from GenBank format to a cleaned GFF3 format
# Note: This step also generates a FASTA file
grep -vE "codon_start|db_xref|exception" $INF > ${INF}2
to-gff --getfasta ${INF}2 ${INF%.gb*}.gff
grep "gene=" ${INF%.gb*}.gff | \
    awk -F';' '{print $1}' | \
    grep -vE "remark|intron|misc_feature|repeat_region" > ${INF%.gb*}.gff.clean
rm ${INF}2 ${INF%.gb*}.gff
## Converting input file from GFF3 format to VISTA format
grep -v "^#" ${INF%.gb*}.gff.clean | \
    grep -v "rps12" | \
    awk '{if ($3 ~ /gene/) {print $7" "$4" "$5" "$3" "$9} else {print $7" "$4" "$5" "$3} }' | \
    awk '{if ($4 ~ /gene/) {gsub(/\+/, ">", $1); gsub(/\-/, "<", $1); print $0} else {$1=""; print $0} }' | \
    sed 's/gene=//' | \
    sed 's/gene/agene/' | \
    sort -n -k2 | \
    sed 's/agene/gene/' | \
    awk '{$1=$1}1' | \
    sed 's/CDS/exon/' | \
    sed 's/tRNA/utr/' | \
    sed 's/rRNA/utr/' > ${INF%.gb*}.mvista

mVISTA example
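
To see what the GFF3-to-VISTA conversion does, it can help to run it on a tiny input. The sketch below wraps the second pipeline from the code above, unchanged, in a function that reads from standard input. The function name gff_to_vista is hypothetical, and the two tab-separated GFF3 records are made up for illustration (their coordinates are not taken from a real annotation).

```shell
#!/usr/bin/env bash
export LC_ALL=C   # keep the sort order independent of the user's locale

# gff_to_vista is a hypothetical wrapper around the GFF3-to-VISTA pipeline;
# it reads cleaned GFF3 lines from stdin and writes the mVISTA lines to stdout.
gff_to_vista() {
    grep -v "^#" | \
    grep -v "rps12" | \
    awk '{if ($3 ~ /gene/) {print $7" "$4" "$5" "$3" "$9} else {print $7" "$4" "$5" "$3} }' | \
    awk '{if ($4 ~ /gene/) {gsub(/\+/, ">", $1); gsub(/\-/, "<", $1); print $0} else {$1=""; print $0} }' | \
    sed 's/gene=//' | \
    sed 's/gene/agene/' | \
    sort -n -k2 | \
    sed 's/agene/gene/' | \
    awk '{$1=$1}1' | \
    sed 's/CDS/exon/' | \
    sed 's/tRNA/utr/' | \
    sed 's/rRNA/utr/'
}

# Two made-up records: a gene on the minus strand and its coding sequence
printf 'chr\t.\tgene\t383\t1444\t.\t-\t.\tgene=psbA\nchr\t.\tCDS\t383\t1444\t.\t-\t0\tgene=psbA\n' | \
    gff_to_vista
```

For this toy input, the pipeline prints a gene line with the strand symbol ("< 383 1444 gene psbA") followed by an exon line without it ("383 1444 exon").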

Quick Illumina read statistics in Bash

Recently, a Master’s student of mine asked me to re-calculate some data statistics of an Illumina sequencing run. Among the desired statistics were (a) the total number of read bases (in bp), (b) the total number of reads, (c) the GC content (in %), and (d) the AT content (in %).

Since the average file size of this run was almost a dozen GB per sample, I wrote some Bash code that can calculate these data statistics within minutes.


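# Assumes that $SAMPLE holds the sample prefix of the FASTQ files

#Total read bases (bp):
# (newlines are removed before counting so that only bases are counted)
AllBases=$(grep -A1 "^@" --no-group-separator ${SAMPLE}_*.fastq | \
  grep -v "@" | sed "s|$SAMPLE\_.\.fastq-||g" | tr -d '\n' | wc -m)
echo "$AllBases"

#Total reads:
# (four lines per read; counting "^@" would also match quality lines that start with "@")
echo $(( $(cat ${SAMPLE}_*.fastq | wc -l) / 4 ))

#GC content (%):
GandCs=$(grep -A1 "^@" --no-group-separator ${SAMPLE}_*.fastq | \
  grep -v "@" | sed "s|$SAMPLE\_.\.fastq-||g" | tr -cd GC | wc -m)
echo "scale=4;  100 * $GandCs / $AllBases" | bc

#AT content (%):
AandTs=$(grep -A1 "^@" --no-group-separator ${SAMPLE}_*.fastq | \
  grep -v "@" | sed "s|$SAMPLE\_.\.fastq-||g" | tr -cd AT | wc -m)
echo "scale=4;  100 * $AandTs / $AllBases" | bc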
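
FASTQ quality strings may themselves begin with "@", so matching on "^@" alone can misidentify record boundaries. An alternative that relies on the four-line record structure and reads each file only once is sketched below. The function name fastq_stats is hypothetical, and the sketch assumes uncompressed FASTQ files with exactly four lines per read (no wrapped sequences).

```shell
#!/usr/bin/env bash
# fastq_stats is a hypothetical helper. It assumes uncompressed FASTQ files
# with exactly four lines per record, so every second line of four is a sequence.
fastq_stats() {
    awk 'NR % 4 == 2 {
             reads++
             bases += length($0)
             seq = toupper($0)
             gc += gsub(/[GC]/, "", seq)   # gsub returns the number of replacements
             at += gsub(/[AT]/, "", seq)
         }
         END {
             if (bases == 0) { print "no reads found"; exit 1 }
             printf "reads\t%d\nbases\t%d\nGC%%\t%.2f\nAT%%\t%.2f\n",
                    reads, bases, 100 * gc / bases, 100 * at / bases
         }' "$@"
}

# Toy input: two reads; the first quality string deliberately starts with "@"
demo_fq=$(mktemp)
printf '@read1\nACGT\n+\n@III\n@read2\nGGGCAT\n+\nIIIIII\n' > "$demo_fq"
fastq_stats "$demo_fq"
```

For the toy file, this reports 2 reads, 10 bases, a GC content of 60.00 % and an AT content of 40.00 %. On real data, the call would be fastq_stats ${SAMPLE}_*.fastq.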