Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

Massive plastid genome sequencing coincides with incomplete annotations of the inverted repeats

High numbers do not equate with high quality

Over the past 10 years, the number of complete plastid genome sequences available on NCBI GenBank has skyrocketed, especially for flowering plants. In December 2009, approximately 120 complete plastid genomes of flowering plants were present on GenBank; at the end of November 2019, there are 5,838. That is an increase of 4,765% in 10 years (or roughly a quinquaginta-nupling of the flowering plant plastid genomes available since 2009 [yes, I had Latin in school]).
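For readers who want to check the arithmetic, here is a minimal sketch in Python. The function name `percent_increase` is just an illustrative choice; the two counts are the ones quoted above.

```python
def percent_increase(old: int, new: int) -> float:
    """Relative growth from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100


genomes_2009 = 120    # approx. count of complete plastid genomes, December 2009
genomes_2019 = 5838   # count at the end of November 2019

growth = percent_increase(genomes_2009, genomes_2019)
fold = genomes_2019 / genomes_2009

# Rounded to whole numbers: an increase of 4,765%, i.e. roughly 49-fold
print(f"{growth:,.0f}% increase, about {fold:.0f}-fold")
```

Note that a 4,765% increase corresponds to a final count of about 49 times the starting count, which is why "fifty-fold" is a fair rounding.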

However, it appears that the annotation quality of many of the genomes submitted to GenBank over the past decade has not kept pace with this increase. The inverted repeat (IR) regions are integral parts of virtually every flowering plant plastome, and their presence in the sequence annotations is typically a good proxy for annotation quality. In December 2009, approximately 95 (or approx. 80%) of the available plastid genomes of flowering plants did not contain unambiguous annotations for the IRs. At the end of November 2019, that number has increased to 3,190 (or approx. 55%): a smaller share than in 2009, but a far larger absolute number of genomes.

Example of IR annotation (highlighted in grey) in GenBank flat file of a complete plastid genome

In short, despite a dramatic increase in complete plastid genomes of flowering plants available on GenBank, more than half of those genomes still lack appropriate sequence annotations of the IR regions. This is quite concerning, not least because IR annotations are very simple to include in a genome file (see figure). Clearly, improved strategies for annotating (or correcting) complete plastid genomes are necessary.
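To illustrate how little is needed, here is a rough sketch in Python of what an IR annotation can look like in a GenBank flat file and how one might screen a record for it. Everything in the example record is hypothetical (organism, coordinates, gene position), and the screening rule is my own simplifying assumption for this sketch, not the procedure used to obtain the numbers above: it flags `repeat_region` or `misc_feature` entries whose qualifiers mention an inverted repeat.

```python
import re

# Hypothetical, heavily abbreviated feature table; all coordinates are invented.
EXAMPLE_RECORD = """\
FEATURES             Location/Qualifiers
     source          1..155000
                     /organism="Hypothetical plant"
                     /organelle="plastid:chloroplast"
     gene            complement(100..1161)
                     /gene="psbA"
     repeat_region   85001..111000
                     /note="inverted repeat B"
                     /rpt_type=inverted
     repeat_region   complement(129001..155000)
                     /note="inverted repeat A"
                     /rpt_type=inverted
ORIGIN
"""

# Feature keys start after exactly five spaces; qualifier lines are indented deeper.
FEATURE_START = re.compile(r"^ {5}(\S+)\s+(\S+)")
# Heuristic (an assumption of this sketch): wording that suggests an IR annotation.
IR_HINT = re.compile(r"inverted\s+repeat|rpt_type=inverted|\bIR[AB]\b", re.IGNORECASE)


def find_ir_features(flatfile_text: str) -> list[dict]:
    """Return the features of a flat-file feature table that look like IR annotations."""
    features = []
    current = None
    in_table = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # next top-level section (e.g. ORIGIN) ends the feature table
        match = FEATURE_START.match(line)
        if match:
            current = {"key": match.group(1), "location": match.group(2), "qualifiers": []}
            features.append(current)
        elif current is not None and line.strip().startswith("/"):
            # Continuation lines of multi-line qualifiers are ignored in this sketch.
            current["qualifiers"].append(line.strip())
    return [
        feature
        for feature in features
        if feature["key"] in ("repeat_region", "misc_feature")
        and any(IR_HINT.search(qualifier) for qualifier in feature["qualifiers"])
    ]


for feature in find_ir_features(EXAMPLE_RECORD):
    print(feature["key"], feature["location"])
```

A record for which `find_ir_features` returns an empty list would count as lacking IR annotations under this simple rule. For real data, a dedicated GenBank parser would be more robust than line-based matching, since locations and qualifiers can span several lines.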

This post was published on Thursday, 21 November 2019 at 13:30 by Michael Grünstäudl and filed under bioinformatics.
